Implementation of Diverse Semantic Image Synthesis via Probability Distribution Modeling

Related tags

Deep LearningINADE
Overview

Diverse Semantic Image Synthesis via Probability Distribution Modeling (CVPR 2021)

Architecture

Paper

Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, [Bin Liu], Gang Hua, Nenghai Yu

Abstract

Semantic image synthesis, translating semantic layouts to photo-realistic images, is a one-to-many mapping problem. Though impressive progress has been recently made, diverse semantic synthesis that can efficiently produce semantic-level multimodal results, still remains a challenge. In this paper, we propose a novel diverse semantic image synthesis framework from the perspective of semantic class distributions, which naturally supports diverse generation at semantic or even instance level. We achieve this by modeling class-level conditional modulation parameters as continuous probability distributions instead of discrete values, and sampling per-instance modulation parameters through instance-adaptive stochastic sampling that is consistent across the network. Moreover, we propose prior noise remapping, through linear perturbation parameters encoded from paired references, to facilitate supervised training and exemplar-based instance style control at test time. Extensive experiments on multiple datasets show that our method can achieve superior diversity and comparable quality compared to state-of-the-art methods.

Installation

Clone this repo.

git clone https://github.com/tzt101/INADE.git
cd INADE/

This code requires PyTorch 1.6 and python 3+. Please install dependencies by

pip install -r requirements.txt

Dataset Preparation

The Cityscapes and ADE20K dataset can be downloaded and prepared following SPADE. The CelebAMask-HQ can be downloaded from CelebAMask-HQ, you need to to integrate the separated annotations into an image file (the format like other datasets, e.g. Cityscapes and ADE20K). The DeepFashion can be downloaded from SMIS, and the version with two persons can be downloaded from OneDrive.

To make or reid the instance map, you can use the following commands:

python make_instances.py --path [Path_to_dataset] --dataset [ade20k | cityscapes | celeba | deepfashion]

Generating Images Using Pretrained Model

Once the dataset is ready, the result images can be generated using pretrained models.

  1. Download the pretrained models from the OneDrive, save it in checkpoints/. The structure is as follows:
./checkpoints/
    inade_ade20k/
        best_net_G.pth
        best_net_IE.pth
    inade_celeba/
        best_net_G.pth
        best_net_IE.pth
    inade_cityscapes/
        best_net_G.pth
        best_net_IE.pth
    inade_deepfashion/
        best_net_G.pth
        best_net_IE.pth

The noise_nc is 64 for all pretrained models except which on deepfashion (set to 8). Because we find that it's enough for quality and diversity.

  1. Generate the images on the test dataset.
python test.py --name [model_name] --norm_mode inade --batchSize 1 --gpu_ids 0 --which_epoch best --dataset_mode [dataset] --dataroot [Path_to_dataset]

[model_name] is the directory name of the checkpoint file downloaded in Step 1, such as inade_ade20k and inade_cityscapes. [dataset] can be on of ade20k, celeba, cityscapes and deepfashion. [Path_to_dataset] is the path to the dataset. If you want to use encoder, you can add the another option --use_vae.

Training New Models

You can train your own model with the following command:

# To train CLADE and CLADE-ICPE.
python train.py --name [experiment_name] --dataset_mode [dataset] --norm_mode inade --use_vae --dataroot [Path_to_dataset]

If you want to test the model during the training step, please set --train_eval. By default, the model every 10 epoch will be test in terms of FID. Finally, the model with best FID score will be saved as best_net_G.pth.

Calculate FID

We provide the code to calculate the FID which is based on rpo. We have pre-calculated the distribution of real images (all images are resized to 256×256 except cityscapes is 512×256) in training set of each dataset and saved them in ./datasets/train_mu_si/. You can run the following command:

python fid_score.py [Path_to_real_image] [Path_to_fake_image] --batch-size 1 --gpu 0 --load_np_name [dataset] --resize_size [Size]

The provided [dataset] are: ade20k, celeba, cityscapes, coco and deepfashion. You can save the new dataset by replacing --load_np_name [dataset] with --save_np_name [dataset].

New Useful Options

The new options are as follows:

  • --use_amp: if specified, use AMP training mode.
  • --train_eval: if sepcified, evaluate the model during training.
  • --eval_dims: the default setting is 2048, Dimensionality of Inception features to use.
  • --eval_epoch_freq: the default setting is 10, frequency of calculate fid score at the end of epochs.

Code Structure

  • train.py, test.py: the entry point for training and testing.
  • trainers/pix2pix_trainer.py: harnesses and reports the progress of training.
  • models/pix2pix_model.py: creates the networks, and compute the losses
  • models/networks/: defines the architecture of all models
  • options/: creates option lists using argparse package. More individuals are dynamically added in other files as well. Please see the section below.
  • data/: defines the class for loading images and label maps.

Citation

If you use this code for your research, please cite our papers.

@inproceedings{tan2021diverse,
  title={Diverse Semantic Image Synthesis via Probability Distribution Modeling},
  author={Tan, Zhentao and Chai, Menglei and Chen, Dongdong and Liao, Jing and Chu, Qi and Liu, Bin and Hua, Gang and Yu, Nenghai},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={7962--7971},
  year={2021}
}

Acknowledgments

This code borrows heavily from SPADE.

Gems & Holiday Package Prediction

Predictive_Modelling Gems & Holiday Package Prediction This project is based on 2 cases studies : Gems Price Prediction and Holiday Package prediction

Avnika Mehta 1 Jan 27, 2022
Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.

EfficientZero (NeurIPS 2021) Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021. Thank you for you

Weirui Ye 671 Jan 03, 2023
(CVPR 2021) Lifting 2D StyleGAN for 3D-Aware Face Generation

Lifting 2D StyleGAN for 3D-Aware Face Generation Official implementation of paper "Lifting 2D StyleGAN for 3D-Aware Face Generation". Requirements You

Yichun Shi 66 Nov 29, 2022
RLHive: a framework designed to facilitate research in reinforcement learning.

RLHive is a framework designed to facilitate research in reinforcement learning. It provides the components necessary to run a full RL experiment, for both single agent and multi agent environments.

88 Jan 05, 2023
ANEA: Distant Supervision for Low-Resource Named Entity Recognition

ANEA: Distant Supervision for Low-Resource Named Entity Recognition ANEA is a tool to automatically annotate named entities in unlabeled text based on

Saarland University Spoken Language Systems Group 15 Mar 30, 2022
Official implementation of Protected Attribute Suppression System, ICCV 2021

Official implementation of Protected Attribute Suppression System, ICCV 2021

Prithviraj Dhar 6 Jan 01, 2023
GeneGAN: Learning Object Transfiguration and Attribute Subspace from Unpaired Data

GeneGAN: Learning Object Transfiguration and Attribute Subspace from Unpaired Data By Shuchang Zhou, Taihong Xiao, Yi Yang, Dieqiao Feng, Qinyao He, W

Taihong Xiao 141 Apr 16, 2021
v objective diffusion inference code for JAX.

v-diffusion-jax v objective diffusion inference code for JAX, by Katherine Crowson (@RiversHaveWings) and Chainbreakers AI (@jd_pressman). The models

Katherine Crowson 186 Dec 21, 2022
Yolo algorithm for detection + centroid tracker to track vehicles

Vehicle Tracking using Centroid tracker Algorithm used : Yolo algorithm for detection + centroid tracker to track vehicles Backend : opencv and python

6 Dec 21, 2022
DanceTrack: Multiple Object Tracking in Uniform Appearance and Diverse Motion

DanceTrack DanceTrack is a benchmark for tracking multiple objects in uniform appearance and diverse motion. DanceTrack provides box and identity anno

260 Dec 28, 2022
Collection of generative models in Pytorch version.

pytorch-generative-model-collections Original : [Tensorflow version] Pytorch implementation of various GANs. This repository was re-implemented with r

Hyeonwoo Kang 2.4k Dec 31, 2022
Multi-atlas segmentation (MAS) is a promising framework for medical image segmentation

Multi-atlas segmentation (MAS) is a promising framework for medical image segmentation. Generally, MAS methods register multiple atlases, i.e., medical images with corresponding labels, to a target i

NanYoMy 13 Oct 09, 2022
Fake videos detection by tracing the source using video hashing retrieval.

Vision Transformer Based Video Hashing Retrieval for Tracing the Source of Fake Videos 🎉️ 📜 Directory Introduction VTL Trace Samples and Acc of Hash

56 Dec 22, 2022
[NeurIPS 2020] Official repository for the project "Listening to Sound of Silence for Speech Denoising"

Listening to Sounds of Silence for Speech Denoising Introduction This is the repository of the "Listening to Sounds of Silence for Speech Denoising" p

Henry Xu 40 Dec 20, 2022
Utility code for use with PyXLL

pyxll-utils There is no need to use this package as of PyXLL 5. All features from this package are now provided by PyXLL. If you were using this packa

PyXLL 10 Dec 18, 2021
An example project demonstrating how the Autonomous Learning Library can be used to build new reinforcement learning agents.

About This repository shows how Autonomous Learning Library can be used to build new reinforcement learning agents. In particular, it contains a model

Chris Nota 5 Aug 30, 2022
Code for the paper “The Peril of Popular Deep Learning Uncertainty Estimation Methods”

Uncertainty Estimation Methods Code for the paper “The Peril of Popular Deep Learning Uncertainty Estimation Methods” Reference If you use this code,

EPFL Machine Learning and Optimization Laboratory 4 Apr 05, 2022
Language Models Can See: Plugging Visual Controls in Text Generation

Language Models Can See: Plugging Visual Controls in Text Generation Authors: Yixuan Su, Tian Lan, Yahui Liu, Fangyu Liu, Dani Yogatama, Yan Wang, Lin

Yixuan Su 195 Dec 22, 2022
MEND: Model Editing Networks using Gradient Decomposition

MEND: Model Editing Networks using Gradient Decomposition Setup Environment This codebase uses Python 3.7.9. Other versions may work as well. Create a

Eric Mitchell 141 Dec 02, 2022
Save-restricted-v-3 - Save restricted content Bot For telegram

Save restricted content Bot Contact: Telegram A stable telegram bot to get restr

DEVANSH 11 Dec 21, 2022