Semantic Image Synthesis with SPADE

Related tags

Deep LearningSPADE
Overview

License CC BY-NC-SA 4.0 Python 3.6

Semantic Image Synthesis with SPADE

GauGAN demo

New implementation available at imaginaire repository

We have a reimplementation of the SPADE method that is more performant. It is avaiable at Imaginaire

Project page | Paper | Online Interactive Demo of GauGAN | GTC 2019 demo | Youtube Demo of GauGAN

Semantic Image Synthesis with Spatially-Adaptive Normalization.
Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu.
In CVPR 2019 (Oral).

License

Copyright (C) 2019 NVIDIA Corporation.

All rights reserved. Licensed under the CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike 4.0 International)

The code is released for academic research use only. For commercial use or business inquiries, please contact [email protected].

For press and other inquiries, please contact Hector Marinez

Installation

Clone this repo.

git clone https://github.com/NVlabs/SPADE.git
cd SPADE/

This code requires PyTorch 1.0 and python 3+. Please install dependencies by

pip install -r requirements.txt

This code also requires the Synchronized-BatchNorm-PyTorch rep.

cd models/networks/
git clone https://github.com/vacancy/Synchronized-BatchNorm-PyTorch
cp -rf Synchronized-BatchNorm-PyTorch/sync_batchnorm .
cd ../../

To reproduce the results reported in the paper, you would need an NVIDIA DGX1 machine with 8 V100 GPUs.

Dataset Preparation

For COCO-Stuff, Cityscapes or ADE20K, the datasets must be downloaded beforehand. Please download them on the respective webpages. In the case of COCO-stuff, we put a few sample images in this code repo.

Preparing COCO-Stuff Dataset. The dataset can be downloaded here. In particular, you will need to download train2017.zip, val2017.zip, stuffthingmaps_trainval2017.zip, and annotations_trainval2017.zip. The images, labels, and instance maps should be arranged in the same directory structure as in datasets/coco_stuff/. In particular, we used an instance map that combines both the boundaries of "things instance map" and "stuff label map". To do this, we used a simple script datasets/coco_generate_instance_map.py. Please install pycocotools using pip install pycocotools and refer to the script to generate instance maps.

Preparing ADE20K Dataset. The dataset can be downloaded here, which is from MIT Scene Parsing BenchMark. After unzipping the datgaset, put the jpg image files ADEChallengeData2016/images/ and png label files ADEChallengeData2016/annotatoins/ in the same directory.

There are different modes to load images by specifying --preprocess_mode along with --load_size. --crop_size. There are options such as resize_and_crop, which resizes the images into square images of side length load_size and randomly crops to crop_size. scale_shortside_and_crop scales the image to have a short side of length load_size and crops to crop_size x crop_size square. To see all modes, please use python train.py --help and take a look at data/base_dataset.py. By default at the training phase, the images are randomly flipped horizontally. To prevent this use --no_flip.

Generating Images Using Pretrained Model

Once the dataset is ready, the result images can be generated using pretrained models.

  1. Download the tar of the pretrained models from the Google Drive Folder, save it in 'checkpoints/', and run

    cd checkpoints
    tar xvf checkpoints.tar.gz
    cd ../
    
  2. Generate images using the pretrained model.

    python test.py --name [type]_pretrained --dataset_mode [dataset] --dataroot [path_to_dataset]

    [type]_pretrained is the directory name of the checkpoint file downloaded in Step 1, which should be one of coco_pretrained, ade20k_pretrained, and cityscapes_pretrained. [dataset] can be one of coco, ade20k, and cityscapes, and [path_to_dataset], is the path to the dataset. If you are running on CPU mode, append --gpu_ids -1.

  3. The outputs images are stored at ./results/[type]_pretrained/ by default. You can view them using the autogenerated HTML file in the directory.

Generating Landscape Image using GauGAN

In the paper and the demo video, we showed GauGAN, our interactive app that generates realistic landscape images from the layout users draw. The model was trained on landscape images scraped from Flickr.com. We released an online demo that has the same features. Please visit https://www.nvidia.com/en-us/research/ai-playground/. The model weights are not released.

Training New Models

New models can be trained with the following commands.

  1. Prepare dataset. To train on the datasets shown in the paper, you can download the datasets and use --dataset_mode option, which will choose which subclass of BaseDataset is loaded. For custom datasets, the easiest way is to use ./data/custom_dataset.py by specifying the option --dataset_mode custom, along with --label_dir [path_to_labels] --image_dir [path_to_images]. You also need to specify options such as --label_nc for the number of label classes in the dataset, --contain_dontcare_label to specify whether it has an unknown label, or --no_instance to denote the dataset doesn't have instance maps.

  2. Train.

# To train on the Facades or COCO dataset, for example.
python train.py --name [experiment_name] --dataset_mode facades --dataroot [path_to_facades_dataset]
python train.py --name [experiment_name] --dataset_mode coco --dataroot [path_to_coco_dataset]

# To train on your own custom dataset
python train.py --name [experiment_name] --dataset_mode custom --label_dir [path_to_labels] -- image_dir [path_to_images] --label_nc [num_labels]

There are many options you can specify. Please use python train.py --help. The specified options are printed to the console. To specify the number of GPUs to utilize, use --gpu_ids. If you want to use the second and third GPUs for example, use --gpu_ids 1,2.

To log training, use --tf_log for Tensorboard. The logs are stored at [checkpoints_dir]/[name]/logs.

Testing

Testing is similar to testing pretrained models.

python test.py --name [name_of_experiment] --dataset_mode [dataset_mode] --dataroot [path_to_dataset]

Use --results_dir to specify the output directory. --how_many will specify the maximum number of images to generate. By default, it loads the latest checkpoint. It can be changed using --which_epoch.

Code Structure

  • train.py, test.py: the entry point for training and testing.
  • trainers/pix2pix_trainer.py: harnesses and reports the progress of training.
  • models/pix2pix_model.py: creates the networks, and compute the losses
  • models/networks/: defines the architecture of all models
  • options/: creates option lists using argparse package. More individuals are dynamically added in other files as well. Please see the section below.
  • data/: defines the class for loading images and label maps.

Options

This code repo contains many options. Some options belong to only one specific model, and some options have different default values depending on other options. To address this, the BaseOption class dynamically loads and sets options depending on what model, network, and datasets are used. This is done by calling the static method modify_commandline_options of various classes. It takes in theparser of argparse package and modifies the list of options. For example, since COCO-stuff dataset contains a special label "unknown", when COCO-stuff dataset is used, it sets --contain_dontcare_label automatically at data/coco_dataset.py. You can take a look at def gather_options() of options/base_options.py, or models/network/__init__.py to get a sense of how this works.

VAE-Style Training with an Encoder For Style Control and Multi-Modal Outputs

To train our model along with an image encoder to enable multi-modal outputs as in Figure 15 of the paper, please use --use_vae. The model will create netE in addition to netG and netD and train with KL-Divergence loss.

Citation

If you use this code for your research, please cite our papers.

@inproceedings{park2019SPADE,
  title={Semantic Image Synthesis with Spatially-Adaptive Normalization},
  author={Park, Taesung and Liu, Ming-Yu and Wang, Ting-Chun and Zhu, Jun-Yan},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2019}
}

Acknowledgments

This code borrows heavily from pix2pixHD. We thank Jiayuan Mao for his Synchronized Batch Normalization code.

Tool for installing and updating MiSTer cores and other files

MiSTer Downloader This tool installs and updates all the cores and other extra files for your MiSTer. It also updates the menu core, the MiSTer firmwa

72 Dec 24, 2022
Here we present the implementation in TensorFlow of our work about liver lesion segmentation accepted in the Machine Learning 4 Health Workshop

Detection-aided liver lesion segmentation Here we present the implementation in TensorFlow of our work about liver lesion segmentation accepted in the

Image Processing Group - BarcelonaTECH - UPC 96 Oct 26, 2022
EvDistill: Asynchronous Events to End-task Learning via Bidirectional Reconstruction-guided Cross-modal Knowledge Distillation (CVPR'21)

EvDistill: Asynchronous Events to End-task Learning via Bidirectional Reconstruction-guided Cross-modal Knowledge Distillation (CVPR'21) Citation If y

addisonwang 18 Nov 11, 2022
Image Restoration Using Swin Transformer for VapourSynth

SwinIR SwinIR function for VapourSynth, based on https://github.com/JingyunLiang/SwinIR. Dependencies NumPy PyTorch, preferably with CUDA. Note that t

Holy Wu 11 Jun 19, 2022
A font family with a great monospaced variant for programmers.

Fantasque Sans Mono A programming font, designed with functionality in mind, and with some wibbly-wobbly handwriting-like fuzziness that makes it unas

Jany Belluz 6.3k Jan 08, 2023
A Comparative Review of Recent Kinect-Based Action Recognition Algorithms (TIP2020, Matlab codes)

A Comparative Review of Recent Kinect-Based Action Recognition Algorithms This repo contains: the HDG implementation (Matlab codes) for 'Analysis and

Lei Wang 5 Oct 22, 2022
Single object tracking and segmentation.

Single/Multiple Object Tracking and Segmentation Codes and comparison of recent single/multiple object tracking and segmentation. News 💥 AutoMatch is

ZP ZHANG 385 Jan 02, 2023
ICLR 2021, Fair Mixup: Fairness via Interpolation

Fair Mixup: Fairness via Interpolation Training classifiers under fairness constraints such as group fairness, regularizes the disparities of predicti

Ching-Yao Chuang 49 Nov 22, 2022
PyTorch implementation of neural style randomization for data augmentation

README Augment training images for deep neural networks by randomizing their visual style, as described in our paper: https://arxiv.org/abs/1809.05375

84 Nov 23, 2022
Dahua Camera and Doorbell Home Assistant Integration

Home Assistant Dahua Integration The Dahua Home Assistant integration allows you to integrate your Dahua cameras and doorbells in Home Assistant. It's

Ronnie 216 Dec 26, 2022
RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth, in ICCV 2021 (oral)

RINDNet RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth Mengyang Pu, Yaping Huang, Qingji Guan and Haibin Lin

Mengyang Pu 75 Dec 15, 2022
PyTorch implementation of Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose

Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose Release Notes The official PyTorch implementation of Neural View S

Angtian Wang 20 Oct 09, 2022
This repository contains a toolkit for collecting, labeling and tracking object keypoints

This repository contains a toolkit for collecting, labeling and tracking object keypoints. Object keypoints are semantic points in an object's coordinate frame.

ETHZ ASL 13 Dec 12, 2022
Setup and customize deep learning environment in seconds.

Deepo is a series of Docker images that allows you to quickly set up your deep learning research environment supports almost all commonly used deep le

Ming 6.3k Jan 06, 2023
A setup script to generate ITK Python Wheels

ITK Python Package This project provides a setup.py script to build ITK Python binary packages and infrastructure to build ITK external module Python

Insight Software Consortium 59 Dec 14, 2022
[CVPR 2021] Generative Hierarchical Features from Synthesizing Images

[CVPR 2021] Generative Hierarchical Features from Synthesizing Images

GenForce: May Generative Force Be with You 148 Dec 09, 2022
sense-py-AnishaBaishya created by GitHub Classroom

Compute Statistics Here we compute statistics for a bunch of numbers. This project uses the unittest framework to test functionality. Pass the tests T

1 Oct 21, 2021
The official implementation of A Unified Game-Theoretic Interpretation of Adversarial Robustness.

This repository is the official implementation of A Unified Game-Theoretic Interpretation of Adversarial Robustness. Requirements pip install -r requi

Jie Ren 17 Dec 12, 2022
[CVPRW 21] "BNN - BN = ? Training Binary Neural Networks without Batch Normalization", Tianlong Chen, Zhenyu Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen, Zhangyang Wang

BNN - BN = ? Training Binary Neural Networks without Batch Normalization Codes for this paper BNN - BN = ? Training Binary Neural Networks without Bat

VITA 40 Dec 30, 2022
MoveNet Single Pose on DepthAI

MoveNet Single Pose tracking on DepthAI Running Google MoveNet Single Pose models on DepthAI hardware (OAK-1, OAK-D,...). A convolutional neural netwo

64 Dec 29, 2022