CVPR 2021: "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE"

Overview

Diverse Structure Inpainting

ArXiv | Papar | Supplementary Material | BibTex

This repository is for the CVPR 2021 paper, "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE".

If our method is useful for your research, please consider citing.

Introduction

(Top) Input incomplete image, where the missing region is depicted in gray. (Middle) Visualization of the generated diverse structures. (Bottom) Output images of our method.

Places2 Results

Results on the Places2 validation set using the center-mask Places2 model.

CelebA-HQ Results

Results on one CelebA-HQ test image with different holes using the random-mask CelebA-HQ model.

Installation

This code was tested with TensorFlow 1.12.0 (later versions may work, excluding 2.x), CUDA 9.0, Python 3.6 and Ubuntu 16.04

Clone this repository:

git clone https://github.com/USTC-JialunPeng/Diverse-Structure-Inpainting.git

Datasets

  • CelebA-HQ: the high-resolution face images from Growing GANs. 24183 images for training, 2993 images for validation and 2824 images for testing.
  • Places2: the challenge data from 365 scene categories. 8 Million images for training, 36K images for validation and 328K images for testing.
  • ImageNet: the data from 1000 natural categories. 1 Million images for training and 50K images for validation.

Training

  • Collect the dataset. For CelebA-HQ, we collect the 1024x1024 version. For Places2 and ImageNet, we collect the original version.
  • Prepare the file list. Collect the path of each image and make a file, where each line is a path (end with a carriage return except the last line).
  • Modify checkpoints_dir, dataset, train_flist and valid_flist arguments in train_vqvae.py, train_structure_generator.py and train_texture_generator.py.
  • Modify data/data_loader.py according to the dataset. For CelebA-HQ, we resize each image to 266x266 and randomly crop a 256x256. For Places2 and ImageNet, we randomly crop a 256x256
  • Run python train_vqvae.py to train VQ-VAE.
  • Modify vqvae_network_dir argument in train_structure_generator.py and train_texture_generator.py based on the path of pre-trained VQ-VAE.
  • Modify the mask setting arguments in train_structure_generator.py and train_texture_generator.py to choose center mask or random mask.
  • Run python train_structure_generator.py to train the structure generator.
  • Run python train_texture_generator.py to train the texture generator.
  • Modify structure_generator_dir and texture_generator_dir arguments in save_full_model.py based on the paths of pre-trained structure generator and texture generator.
  • Run python save_full_model.py to save the whole model.

Testing

  • Collect the testing set. For CelebA-HQ, we resize each image to 256x256. For Places2 and ImageNet, we crop a center 256x256.
  • Collect the corresponding mask set (2D grayscale, 0 indicates the known region, 255 indicates the missing region).
  • Prepare the img file list and the mask file list as training.
  • Modify checkpoints_dir, dataset, img_flist and mask_flist arguments in test.py.
  • Download the pre-trained model and put model.ckpt.meta, model.ckpt.index, model.ckpt.data-00000-of-00001 and checkpoint under model_logs/ directory.
  • Run python test.py

Pre-trained Models

Download the pre-trained models using the following links and put them under model_logs/ directory.

The center_mask models are trained with images of 256x256 resolution with center 128x128 holes. The random_mask models are trained with random regular and irregular holes.

Inference Time

One advantage of GAN-based and VAE-based methods is their fast inference speed. We measure that Mutual Encoder-Decoder with Feature Equalizations runs at 0.2 second per image on a single NVIDIA 1080 Ti GPU for images of resolution 256×256. In contrast, our model runs at 45 seconds per image. Naively sampling our autoregressive network is the major source of computational time. Fortunately, this time can be reduced by an order of magnitude using an incremental sampling technique which caches and reuses intermediate states of the network. Consider using this technique for faster inference.

Embracing Single Stride 3D Object Detector with Sparse Transformer

SST: Single-stride Sparse Transformer This is the official implementation of paper: Embracing Single Stride 3D Object Detector with Sparse Transformer

TuSimple 385 Dec 28, 2022
Predict multi paths to a moving person depending on his trajectory history.

Multi-future Trajectory Prediction The project is about using the Multiverse model to make possible multible-future trajectory prediction for a seen p

Said Gamal 1 Jan 18, 2022
A PyTorch based deep learning library for drug pair scoring.

Documentation | External Resources | Datasets | Examples ChemicalX is a deep learning library for drug-drug interaction, polypharmacy side effect and

AstraZeneca 597 Dec 30, 2022
Python package for dynamic system estimation of time series

PyDSE Toolset for Dynamic System Estimation for time series inspired by DSE. It is in a beta state and only includes ARMA models right now. Documentat

Blue Yonder GmbH 40 Oct 07, 2022
Object Depth via Motion and Detection Dataset

ODMD Dataset ODMD is the first dataset for learning Object Depth via Motion and Detection. ODMD training data are configurable and extensible, with ea

Brent Griffin 172 Dec 21, 2022
Instance Semantic Segmentation List

Instance Semantic Segmentation List This repository contains lists of state-or-art instance semantic segmentation works. Papers and resources are list

bighead 87 Mar 06, 2022
Utilities and information for the signals.numer.ai tournament

dsignals Utilities and information for the signals.numer.ai tournament using eodhistoricaldata.com eodhistoricaldata.com provides excellent historical

Degerhan Usluel 23 Dec 18, 2022
Using this you can control your PC/Laptop volume by Hand Gestures (pinch-in, pinch-out) created with Python.

Hand Gesture Volume Controller Using this you can control your PC/Laptop volume by Hand Gestures (pinch-in, pinch-out). Code Firstly I have created a

Tejas Prajapati 16 Sep 11, 2021
Neural Motion Learner With Python

Neural Motion Learner Introduction This work is to extract skeletal structure from volumetric observations and to learn motion dynamics from the detec

Jinseok Bae 14 Nov 28, 2022
The author's officially unofficial PyTorch BigGAN implementation.

BigGAN-PyTorch The author's officially unofficial PyTorch BigGAN implementation. This repo contains code for 4-8 GPU training of BigGANs from Large Sc

Andy Brock 2.6k Jan 02, 2023
Feature extraction made simple with torchextractor

torchextractor: PyTorch Intermediate Feature Extraction Introduction Too many times some model definitions get remorselessly copy-pasted just because

Antoine Broyelle 89 Oct 31, 2022
NeRD: Neural Reflectance Decomposition from Image Collections

NeRD: Neural Reflectance Decomposition from Image Collections Project Page | Video | Paper | Dataset Implementation for NeRD. A novel method which dec

Computergraphics (University of Tübingen) 195 Dec 29, 2022
Standalone pre-training recipe with JAX+Flax

Sabertooth Sabertooth is standalone pre-training recipe based on JAX+Flax, with data pipelines implemented in Rust. It runs on CPU, GPU, and/or TPU, b

Nikita Kitaev 26 Nov 28, 2022
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge This is an implementation of the paper,

Mutian He 19 Oct 14, 2022
This is a re-implementation of TransGAN: Two Pure Transformers Can Make One Strong GAN (CVPR 2021) in PyTorch.

TransGAN: Two Transformers Can Make One Strong GAN [YouTube Video] Paper Authors: Yifan Jiang, Shiyu Chang, Zhangyang Wang CVPR 2021 This is re-implem

Ahmet Sarigun 79 Jan 05, 2023
Google AI Open Images - Object Detection Track: Open Solution

Google AI Open Images - Object Detection Track: Open Solution This is an open solution to the Google AI Open Images - Object Detection Track 😃 More c

minerva.ml 46 Jun 22, 2022
RoIAlign & crop_and_resize for PyTorch

RoIAlign for PyTorch This is a PyTorch version of RoIAlign. This implementation is based on crop_and_resize and supports both forward and backward on

Long Chen 530 Jan 07, 2023
Language model Prompt And Query Archive

LPAQA: Language model Prompt And Query Archive This repository contains data and code for the paper How Can We Know What Language Models Know? Install

127 Dec 20, 2022
UnsupervisedR&R: Unsupervised Pointcloud Registration via Differentiable Rendering

UnsupervisedR&R: Unsupervised Pointcloud Registration via Differentiable Rendering This repository holds all the code and data for our recent work on

Mohamed El Banani 118 Dec 06, 2022
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

Microsoft 14.5k Jan 08, 2023