[ICLR 2021, Spotlight] Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

Last update: Jan 02, 2023

Overview

Large Scale Image Completion via Co-Modulated Generative Adversarial Networks, ICLR 2021 (Spotlight)

Demo | Paper

[NEW!] Time to play with our interactive web demo!

Numerous task-specific variants of conditional generative adversarial networks have been developed for image completion. Yet, a serious limitation remains that all existing algorithms tend to fail when handling large-scale missing regions. To overcome this challenge, we propose a generic new approach that bridges the gap between image-conditional and recent modulated unconditional generative architectures via co-modulation of both conditional and stochastic style representations. Also, due to the lack of good quantitative metrics for image completion, we propose the new Paired/Unpaired Inception Discriminative Score (P-IDS/U-IDS), which robustly measures the perceptual fidelity of inpainted images compared to real images via linear separability in a feature space. Experiments demonstrate superior performance in terms of both quality and diversity over state-of-the-art methods in free-form image completion and easy generalization to image-to-image translation.

Large Scale Image Completion via Co-Modulated Generative Adversarial Networks
Shengyu Zhao, Jonathan Cui, Yilun Sheng, Yue Dong, Xiao Liang, Eric I Chang, Yan Xu
Tsinghua University and Microsoft Research
arXiv | OpenReview

Overview

This repo is implemented upon and has the same dependencies as the official StyleGAN2 repo. We also provide a Dockerfile for Docker users. This repo currently supports:

Large scale image completion experiments on FFHQ and Places2
Image-to-image translation experiments on edges to photos and COCO-Stuff
Evaluation code of Paired/Unpaired Inception Discriminative Score (P-IDS/U-IDS)

Datasets

FFHQ dataset (in TFRecords format) can be downloaded following the StyleGAN2 repo.
Places2 dataset can be downloaded in this website (Places365-Challenge 2016 high-resolution images, training set and validation set). The raw images should be converted into TFRecords using dataset_tools/create_places2.py.

Training

The following script is for training on FFHQ. It will splits 10k images for validation. We recommend using 8 NVIDIA Tesla V100 GPUs for training. Training at 512x512 resolution takes about 1 week.

python run_training.py --data-dir=DATA_DIR --dataset=DATASET --metrics=ids10k --num-gpus=8

The following script is for training on Places2, which has a validation set of 36500 images:

python run_training.py --data-dir=DATA_DIR --dataset=DATASET --metrics=ids36k5 --total-kimg 50000 --num-gpus=8

Evaluation

The following script is for evaluation:

python run_metrics.py --data-dir=DATA_DIR --dataset=DATASET --network=CHECKPOINT_FILE(S) --metrics=METRIC(S) --num-gpus=1

Commonly used metrics are ids10k and ids36k5 (for FFHQ and Places2 respectively), which will compute P-IDS and U-IDS together with FID. By default, masks are generated randomly for evaluation, or you may append the metric name with -h0 ([0.0, 0.2]) to -h4 ([0.8, 1.0]) to specify the range of masked ratio.

Our pre-trained models are available on Google Drive. Below lists our provided pre-trained models:

Model name & URL	Description
co-mod-gan-ffhq-9-025000.pkl	Large scale image completion on FFHQ (512x512)
co-mod-gan-ffhq-10-025000.pkl	Large scale image completion on FFHQ (1024x1024)
co-mod-gan-places2-050000.pkl	Large scale image completion on Places2 (512x512)
co-mod-gan-coco-stuff-025000.pkl	Image-to-image translation on COCO-Stuff (labels to photos) (512x512)
co-mod-gan-edges2shoes-025000.pkl	Image-to-image translation on edges2shoes (256x256)
co-mod-gan-edges2handbags-025000.pkl	Image-to-image translation on edges2handbags (256x256)

Use the following script to run the interactive demo locally:

python run_demo.py -d DATA_DIR/DATASET -c CHECKPOINT_FILE(S)

Citation

If you find this code helpful, please cite our paper:

@inproceedings{zhao2021comodgan,
  title={Large Scale Image Completion via Co-Modulated Generative Adversarial Networks},
  author={Zhao, Shengyu and Cui, Jonathan and Sheng, Yilun and Dong, Yue and Liang, Xiao and Chang, Eric I and Xu, Yan},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2021}
}

[ICLR 2021, Spotlight] Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

Related tags

Overview

Large Scale Image Completion via Co-Modulated Generative Adversarial Networks, ICLR 2021 (Spotlight)

Demo | Paper

Overview

Datasets

Training

Evaluation

Citation

Owner

Shengyu Zhao

Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis

Pytorch implementation of the paper Improving Text-to-Image Synthesis Using Contrastive Learning

Includes PyTorch -> Keras model porting code for ConvNeXt family of models with fine-tuning and inference notebooks.

Does Oversizing Improve Prosumer Profitability in a Flexibility Market? - A Sensitivity Analysis using PV-battery System

[TIP 2021] SADRNet: Self-Aligned Dual Face Regression Networks for Robust 3D Dense Face Alignment and Reconstruction

MAVE: : A Product Dataset for Multi-source Attribute Value Extraction

An original implementation of "MetaICL Learning to Learn In Context" by Sewon Min, Mike Lewis, Luke Zettlemoyer and Hannaneh Hajishirzi

QR2Pass-project - A proof of concept for an alternative (passwordless) authentication system to a web server

Learning Modified Indicator Functions for Surface Reconstruction

Probabilistic Entity Representation Model for Reasoning over Knowledge Graphs

[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it

A benchmark dataset for emulating atmospheric radiative transfer in weather and climate models with machine learning (NeurIPS 2021 Datasets and Benchmarks Track)

Put blind watermark into a text with python

This is RFA-Toolbox, a simple and easy-to-use library that allows you to optimize your neural network architectures using receptive field analysis (RFA) and create graph visualizations of your architecture.

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Official pytorch implementation of Active Learning for deep object detection via probabilistic modeling (ICCV 2021)

Repository of best practices for deep learning in Julia, inspired by fastai

Twins: Revisiting the Design of Spatial Attention in Vision Transformers

Sketch-Based 3D Exploration with Stacked Generative Adversarial Networks