A curated list of papers, code and resources pertaining to image composition

Overview

Awesome Image Composition Awesome

A curated list of resources including papers, datasets, and relevant links pertaining to image composition.

Contributing

Contributions are welcome. If you wish to contribute, feel free to send a pull request. If you have suggestions for new sections to be included, please raise an issue and discuss before sending a pull request.

Table of Contents

Surveys

  • Li Niu, Wenyan Cong, Liu Liu, Yan Hong, Bo Zhang, Jing Liang, Liqing Zhang: "Making Images Real Again: A Comprehensive Survey on Deep Image Composition." arXiv preprint arXiv:2106.14490 (2021). [arXiv]

Papers

Image blending

  • Huikai Wu, Shuai Zheng, Junge Zhang, Kaiqi Huang: "GP-GAN: Towards Realistic High-Resolution Image Blending." ACM MM (2019) [arXiv] [code]
  • Lingzhi Zhang, Tarmily Wen, Jianbo Shi: "Deep Image Blending." WACV (2020) [pdf] [arXiv] [code]

Image harmonization

  • Jun Ling, Han Xue, Li Song, Rong Xie, Xiao Gu: "Region-Aware Adaptive Instance Normalization for Image Harmonization." CVPR (2021) [pdf] [supp] [arXiv] [code].
  • Zonghui Guo, Haiyong Zheng, Yufeng Jiang, Zhaorui Gu, Bing Zheng: "Intrinsic Image Harmonization." CVPR (2021) [pdf] [supp] [code].
  • Wenyan Cong, Li Niu, Jianfu Zhang, Jing Liang, Liqing Zhang: "BargainNet: Background-Guided Domain Translation for Image Harmonization." ICME (2021) [arXiv] [code].
  • Konstantin Sofiiuk, Polina Popenova, Anton Konushin: "Foreground-aware Semantic Representations for Image Harmonization." WACV (2021) [pdf] [supp] [arXiv] [code]
  • Guoqing Hao, Satoshi Iizuka, Kazuhiro Fukui: "Image Harmonization with Attention-based Deep Feature Modulation." BMVC (2020) [pdf] [supp] [code]
  • Wenyan Cong, Jianfu Zhang, Li Niu, Liu Liu, Zhixin Ling, Weiyuan Li, Liqing Zhang: "DoveNet: Deep Image Harmonization via Domain Verification." CVPR (2020) [pdf] [supp] [arXiv] [code].
  • Xiaodong Cun, Chi-Man Pun: "Improving the Harmony of the Composite Image by Spatial-Separated Attention Module." IEEE Trans. Image Process. 29: 4759-4771 (2020) [pdf] [arXiv] [code]
  • Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu, Ming-Hsuan Yang: "Deep Image Harmonization." CVPR (2017) [pdf] [supp] [arXiv] [code]

Shadow generation

  • Daquan Liu, Chengjiang Long, Hongpan Zhang, Hanning Yu, Xinzhi Dong, Chunxia Xiao: "ARshadowGAN: Shadow generative adversarial network for augmented reality in single light scenes." CVPR (2020) [pdf] [code].

  • Shuyang Zhang, Runze Liang, Miao Wang: "ShadowGAN: Shadow synthesis for virtual objects with conditional adversarial networks." Computational Visual Media (2019) [pdf].

  • Fangneng Zhan, Shijian Lu, Changgong Zhang, Feiying Ma, Xuansong Xie: "Adversarial Image Composition with Auxiliary Illumination." ACCV (2020) [pdf].

Object placement and spatial transformation

  • Lingzhi Zhang, Tarmily Wen, Jie Min, Jiancong Wang, David Han, Jianbo Shi: "Learning Object Placement by Inpainting for Compositional Data Augmentation" ECCV (2020) [pdf]

  • Samaneh Azadi, Deepak Pathak, Sayna Ebrahimi, Trevor Darrell: "Compositional GAN: Learning Image-Conditional Binary Composition" International Journal of Computer Vision (2020) [arXiv] [code]

  • Song-Hai Zhang, Zhengping Zhou, Bin Liu, Xi Dong, Peter Hall: "What and Where: A Context-based Recommendation System for Object Insertion" Computational Visual Media (2020) [arXiv]

  • Shashank Tripathi, Siddhartha Chandra, Amit Agrawal, Ambrish Tyagi, James M. Rehg, Visesh Chari: "Learning to Generate Synthetic Data via Compositing" CVPR (2019) [arXiv]

  • Haoshu Fang, Jianhua Sun, Runzhong Wang, Minghao Gou, Yonglu Li, Cewu Lu: "InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting" ICCV (2019) [arXiv] [code]

  • Chen-Hsuan Lin, Ersin Yumer, Oliver Wang, Eli Shechtman, Simon Lucey: "ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing" CVPR (2018) [arXiv] [code]

  • Donghoon Lee, Sifei Liu, Jinwei Gu, Ming-Yu Liu, Ming-Hsuan Yang, Jan Kautz: "Context-Aware Synthesis and Placement of Object Instances" NeurIPS (2018) [arXiv] [code]

  • Fuwen Tan, Crispin Bernier, Benjamin Cohen, Vicente Ordonez, Connelly Barnes: "Where and Who? Automatic Semantic-Aware Person Composition" WACV (2018) [arXiv][code]

  • Tal Remez, Jonathan Huang, Matthew Brown: "learning to segment via cut-and-paste" ECCV (2018) [arXiv] [code]

Occlusion

  • Samaneh Azadi, Deepak Pathak, Sayna Ebrahimi, Trevor Darrell: "Compositional GAN: Learning Image-Conditional Binary Composition." IJCV (2020) [arXiv] [code]
  • Fangneng Zhan, Jiaxing Huang, Shijian Lu, "Hierarchy Composition GAN for High-fidelity Image Synthesis." Transactions on cybernetics (2021) [arXiv]

Datasets

  • iHarmony4 (image harmonization): It contains four subdatasets: HCOCO, HAdobe5k, HFlickr, Hday2night, with a total of 73,146 pairs of unharmonized images and harmonized images. [pdf] [link]
  • GMSDataset (image harmonization): It contains 183 images with image resolution of 1940*1440. It consists of 16 different objects and for each object, one source image and 11 target images in different background scenes and illumination conditions are captured. [pdf] [link] (access code: ekn2)
  • HVIDIT (image harmonization): A dataset built upon VIDIT (Virtual Image Dataset for Illumination Transfer) dataset for image harmonization. It contains 3007 images of 276 scenes for training and 329 images of 24 scenes for testing. [pdf] [link]
  • RHHarmony (image harmonization): A rendered image harmonization dataset, which contains 15000 ground-truth rendered images and has the potential to generate 135000 composite rendered images. [pdf] [link]
  • Shadow-AR (shadow generation): It contains 3,000 quintuples, Each quintuple consists of 5 images 640×480 resolution: a synthetic image without the virtual object shadow and its corresponding image containing the virtual object shadow, a mask of the virtual object, a labeled real-world shadow matting and its corresponding labeled occluder. [pdf] [link]
  • DESOBA (shadow generation): It contains 840 training images with totally 2,999 object-shadow pairs and 160 test images with totally 624 object-shadow pairs. [pdf] [link]
  • OPA (object placement): It contains 62,074 training images and 11,396 test images, in which the foregrounds/backgrounds in training set and test set have no overlap. The training (resp., test) set contains 21,351 (resp.,3,566) positive samples and 40,724 (resp., 7,830) negative samples. [pdf] [link]

Other resources

Owner
BCMI
Center for Brain-Like Computing and Machine Intelligence, Shanghai Jiao Tong University.
BCMI
A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.

Attention-based OCR Visual attention-based OCR model for image recognition with additional tools for creating TFRecords datasets and exporting the tra

Ed Medvedev 933 Dec 29, 2022
Usando o Amazon Textract como OCR para Extração de Dados no DynamoDB

dio-live-textract2 Repositório de código para o live coding do dia 05/10/2021 sobre extração de dados estruturados e gravação em banco de dados a part

hugoportela 0 Jan 19, 2022
Kornia is a open source differentiable computer vision library for PyTorch.

Open Source Differentiable Computer Vision Library

kornia 7.6k Jan 06, 2023
A set of workflows for corpus building through OCR, post-correction and normalisation

PICCL: Philosophical Integrator of Computational and Corpus Libraries PICCL offers a workflow for corpus building and builds on a variety of tools. Th

Language Machines 41 Dec 27, 2022
Ocular is a state-of-the-art historical OCR system.

Ocular Ocular is a state-of-the-art historical OCR system. Its primary features are: Unsupervised learning of unknown fonts: requires only document im

228 Dec 30, 2022
原神风花节自动弹琴辅助

GenshinAutoPlayBalladsofBreeze 原神风花节自动弹琴辅助(已适配1920*1080分辨率) 本程序基于opencv图像识别技术,不存在任何封号。 因为正确率取决于你的cpu性能,10900k都不一定全对。 由于图像识别存在误差,根本无法确定出错时间。更不用说被检测到了。

晓轩 20 Oct 27, 2022
The code of "Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes"

Mask TextSpotter A Pytorch implementation of Mask TextSpotter along with its extension can be find here Introduction This is the official implementati

Pengyuan Lyu 261 Nov 21, 2022
Geometric Augmentation for Text Image

Text Image Augmentation A general geometric augmentation tool for text images in the CVPR 2020 paper "Learn to Augment: Joint Data Augmentation and Ne

Canjie Luo 440 Jan 05, 2023
Deskewing images with slanted content

skew_correction De-skewing images with slanted content by finding the deviation using Canny Edge Detection. To Run: In python 3.6, from deskew import

13 Aug 27, 2022
Some Boring Research About Products Recognition 、Duplicate Img Detection、Img Stitch、OCR

Products Recognition 介绍 商品识别,围绕在复杂的商场零售场景中,识别出货架图像中的商品信息。主要组成部分: 重复图像检测。【更新进度 4/10】 图像拼接。【更新进度 0/10】 目标检测。【更新进度 0/10】 商品识别。【更新进度 1/10】 OCR。【更新进度 1/10】

zhenjieWang 18 Jan 27, 2022
One Metrics Library to Rule Them All!

onemetric Installation Install onemetric from PyPI (recommended): pip install onemetric Install onemetric from the GitHub source: git clone https://gi

Piotr Skalski 49 Jan 03, 2023
OCR powered screen-capture tool to capture information instead of images

NormCap OCR powered screen-capture tool to capture information instead of images. Links: Repo | PyPi | Releases | Changelog | FAQs Content: Quickstart

575 Dec 31, 2022
This project is basically to draw lines with your hand, using python, opencv, mediapipe.

Paint Opencv 📷 This project is basically to draw lines with your hand, using python, opencv, mediapipe. Screenshoots 📱 Tools ⚙️ Python Opencv Mediap

Williams Ismael Bobadilla Torres 3 Nov 17, 2021
Super Mario Game With Python

Super_Mario Hello all this is a simple python program which tries to use our body as a controller for the super mario game Here I have used media pipe

Adarsh Badagala 219 Nov 25, 2022
A tool for extracting text from scanned documents (via OCR), with user-defined post-processing.

The project is based on older versions of tesseract and other tools, and is now superseded by another project which allows for more granular control o

Maxim 32 Jul 24, 2022
A python programusing Tkinter graphics library to randomize questions and answers contained in text files

RaffleOfQuestions Um programa simples em python, utilizando a biblioteca gráfica Tkinter para randomizar perguntas e respostas contidas em arquivos de

Gabriel Ferreira Rodrigues 1 Dec 16, 2021
Text-to-Image generation

Generate vivid Images for Any (Chinese) text CogView is a pretrained (4B-param) transformer for text-to-image generation in general domain. Read our p

THUDM 1.3k Jan 05, 2023
Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.

hocr-tools About About the code Installation System-wide with pip System-wide from source virtualenv Available Programs hocr-check -- check the hOCR f

OCRopus 285 Dec 08, 2022
Code for CVPR2021 paper "Learning Salient Boundary Feature for Anchor-free Temporal Action Localization"

AFSD: Learning Salient Boundary Feature for Anchor-free Temporal Action Localization This is an official implementation in PyTorch of AFSD. Our paper

Tencent YouTu Research 146 Dec 24, 2022
A Python wrapper for the tesseract-ocr API

tesserocr A simple, Pillow-friendly, wrapper around the tesseract-ocr API for Optical Character Recognition (OCR). tesserocr integrates directly with

Fayez 1.7k Dec 31, 2022