CVPR 2021: "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE"

Last update: Nov 04, 2022

Overview

Diverse Structure Inpainting

ArXiv | Paper | Supplementary Material | BibTex

This repository is for the CVPR 2021 paper, "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE".

If our method is useful for your research, please consider citing.

Introduction

(Top) Input incomplete image, where the missing region is depicted in gray. (Middle) Visualization of the generated diverse structures. (Bottom) Output images of our method.

Places2 Results

Results on the Places2 validation set using the center-mask Places2 model.

CelebA-HQ Results

Results on one CelebA-HQ test image with different holes using the random-mask CelebA-HQ model.

Installation

This code was tested with TensorFlow 1.12.0 (later 1.x versions may work, but 2.x does not), CUDA 9.0, Python 3.6 and Ubuntu 16.04.

Clone this repository:

git clone https://github.com/USTC-JialunPeng/Diverse-Structure-Inpainting.git

Datasets

CelebA-HQ: the high-resolution face images from Progressive Growing of GANs. 24183 images for training, 2993 images for validation and 2824 images for testing.
Places2: the challenge data from 365 scene categories. 8 million images for training, 36K images for validation and 328K images for testing.
ImageNet: the data from 1000 natural categories. 1 million images for training and 50K images for validation.

Training

Collect the dataset. For CelebA-HQ, we collect the 1024x1024 version. For Places2 and ImageNet, we collect the original version.
Prepare the file list. Collect the path of each image into a text file with one path per line (every line ends with a line break except the last one).
Modify the checkpoints_dir, dataset, train_flist and valid_flist arguments in train_vqvae.py, train_structure_generator.py and train_texture_generator.py.
Modify data/data_loader.py according to the dataset. For CelebA-HQ, we resize each image to 266x266 and randomly crop a 256x256 patch. For Places2 and ImageNet, we randomly crop a 256x256 patch.
Run python train_vqvae.py to train VQ-VAE.
Modify vqvae_network_dir argument in train_structure_generator.py and train_texture_generator.py based on the path of pre-trained VQ-VAE.
Modify the mask setting arguments in train_structure_generator.py and train_texture_generator.py to choose center mask or random mask.
Run python train_structure_generator.py to train the structure generator.
Run python train_texture_generator.py to train the texture generator.
Modify structure_generator_dir and texture_generator_dir arguments in save_full_model.py based on the paths of pre-trained structure generator and texture generator.
Run python save_full_model.py to save the whole model.
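
The file list described in the steps above can be produced with a few lines of Python. This is only a minimal sketch: the helper name write_flist, the set of accepted extensions and the example paths are assumptions for illustration and are not part of this repository.

```python
import os

# Assumption: which file extensions count as images; adjust for your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def write_flist(image_dir, flist_path):
    """Write one image path per line, with no line break after the last path."""
    paths = []
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                paths.append(os.path.join(root, name))
    paths.sort()  # deterministic order across runs
    with open(flist_path, "w") as f:
        f.write("\n".join(paths))  # join() leaves the last line unterminated
    return paths
```

For example, write_flist("data/celeba_hq/train", "train.flist") (hypothetical paths) would create a list that can be passed as train_flist.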

Testing

Collect the testing set. For CelebA-HQ, we resize each image to 256x256. For Places2 and ImageNet, we crop the center 256x256 region.
Collect the corresponding mask set (2D grayscale, 0 indicates the known region, 255 indicates the missing region).
Prepare the img file list and the mask file list in the same way as for training.
Modify checkpoints_dir, dataset, img_flist and mask_flist arguments in test.py.
Download the pre-trained model and put model.ckpt.meta, model.ckpt.index, model.ckpt.data-00000-of-00001 and checkpoint under model_logs/ directory.
Run python test.py
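
For reference, a mask in the format described above (2D grayscale, 0 for the known region, 255 for the missing region) can be built with NumPy. The sketch below creates a center hole of 128x128 in a 256x256 mask; the function name center_mask is hypothetical, and saving the array to an image file (for instance with an imaging library of your choice) is left out.

```python
import numpy as np


def center_mask(height=256, width=256, hole_h=128, hole_w=128):
    """Return a uint8 mask: 0 marks known pixels, 255 marks the missing hole."""
    mask = np.zeros((height, width), dtype=np.uint8)
    top = (height - hole_h) // 2
    left = (width - hole_w) // 2
    mask[top:top + hole_h, left:left + hole_w] = 255
    return mask


mask = center_mask()
```

With the default arguments, the hole covers rows and columns 64 to 191 inclusive.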

Pre-trained Models

Download the pre-trained models using the following links and put them under model_logs/ directory.

center_mask model: CelebA-HQ_center | Places2_center | ImageNet_center
random_mask model: CelebA-HQ_random | Places2_random | ImageNet_random

The center_mask models are trained with images of 256x256 resolution with center 128x128 holes. The random_mask models are trained with random regular and irregular holes.

Inference Time

One advantage of GAN-based and VAE-based methods is their fast inference speed. We measure that Mutual Encoder-Decoder with Feature Equalizations takes 0.2 seconds per image on a single NVIDIA 1080 Ti GPU for images of resolution 256×256. In contrast, our model takes 45 seconds per image. Naively sampling our autoregressive network is the major source of computational time. Fortunately, this time can be reduced by an order of magnitude using an incremental sampling technique which caches and reuses intermediate states of the network. Consider using this technique for faster inference.
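
The idea behind incremental sampling can be illustrated with a toy autoregressive sampler. The sketch below is not the network from this repository: the model, the names next_token, sample_naive and sample_incremental, and the constant NUM_CLASSES are all made up for illustration. It only shows why caching a state that summarizes the prefix removes the repeated pass over everything generated so far.

```python
import random

NUM_CLASSES = 4  # hypothetical vocabulary size for this toy model


def next_token(state, rng):
    """Sample the next token from a toy distribution that depends only on `state`."""
    weights = [1 + (state + k) % NUM_CLASSES for k in range(NUM_CLASSES)]
    return rng.choices(range(NUM_CLASSES), weights=weights)[0]


def sample_naive(length, seed=0):
    """Recompute the prefix summary from scratch at every step (quadratic work)."""
    rng = random.Random(seed)
    tokens, work = [], 0
    for _ in range(length):
        state = 0
        for t in tokens:  # full pass over the prefix at every step
            state += t
            work += 1
        tokens.append(next_token(state, rng))
    return tokens, work


def sample_incremental(length, seed=0):
    """Keep the summary cached and update it with the newest token (linear work)."""
    rng = random.Random(seed)
    tokens, work, state = [], 0, 0
    for _ in range(length):
        token = next_token(state, rng)
        tokens.append(token)
        state += token  # reuse the cached state instead of rescanning the prefix
        work += 1
    return tokens, work


naive_tokens, naive_work = sample_naive(32)
fast_tokens, fast_work = sample_incremental(32)
```

Both samplers draw the same sequence for the same seed; only the amount of recomputation differs. In a real convolutional autoregressive network the cached state would be intermediate layer activations rather than a running sum.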
