Synthesizing and manipulating 2048x1024 images with conditional GANs

Overview





pix2pixHD

Project | Youtube | Paper

Pytorch implementation of our method for high-resolution (e.g. 2048x1024) photorealistic image-to-image translation. It can be used for turning semantic label maps into photo-realistic images or synthesizing portraits from face label maps.

High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
Ting-Chun Wang1, Ming-Yu Liu1, Jun-Yan Zhu2, Andrew Tao1, Jan Kautz1, Bryan Catanzaro1
1NVIDIA Corporation, 2UC Berkeley
In CVPR 2018.

Image-to-image translation at 2k/1k resolution

  • Our label-to-streetview results

- Interactive editing results

- Additional streetview results

  • Label-to-face and interactive editing results

  • Our editing interface

Prerequisites

  • Linux or macOS
  • Python 2 or 3
  • NVIDIA GPU (11G memory or larger) + CUDA cuDNN

Getting Started

Installation

pip install dominate
  • Clone this repo:
git clone https://github.com/NVIDIA/pix2pixHD
cd pix2pixHD

Testing

  • A few example Cityscapes test images are included in the datasets folder.
  • Please download the pre-trained Cityscapes model from here (google drive link), and put it under ./checkpoints/label2city_1024p/
  • Test the model (bash ./scripts/test_1024p.sh):
#!./scripts/test_1024p.sh
python test.py --name label2city_1024p --netG local --ngf 32 --resize_or_crop none

The test results will be saved to a html file here: ./results/label2city_1024p/test_latest/index.html.

More example scripts can be found in the scripts directory.

Dataset

  • We use the Cityscapes dataset. To train a model on the full dataset, please download it from the official website (registration required). After downloading, please put it under the datasets folder in the same way the example images are provided.

Training

  • Train a model at 1024 x 512 resolution (bash ./scripts/train_512p.sh):
#!./scripts/train_512p.sh
python train.py --name label2city_512p
  • To view training results, please checkout intermediate results in ./checkpoints/label2city_512p/web/index.html. If you have tensorflow installed, you can see tensorboard logs in ./checkpoints/label2city_512p/logs by adding --tf_log to the training scripts.

Multi-GPU training

  • Train a model using multiple GPUs (bash ./scripts/train_512p_multigpu.sh):
#!./scripts/train_512p_multigpu.sh
python train.py --name label2city_512p --batchSize 8 --gpu_ids 0,1,2,3,4,5,6,7

Note: this is not tested and we trained our model using single GPU only. Please use at your own discretion.

Training with Automatic Mixed Precision (AMP) for faster speed

  • To train with mixed precision support, please first install apex from: https://github.com/NVIDIA/apex
  • You can then train the model by adding --fp16. For example,
#!./scripts/train_512p_fp16.sh
python -m torch.distributed.launch train.py --name label2city_512p --fp16

In our test case, it trains about 80% faster with AMP on a Volta machine.

Training at full resolution

  • To train the images at full resolution (2048 x 1024) requires a GPU with 24G memory (bash ./scripts/train_1024p_24G.sh), or 16G memory if using mixed precision (AMP).
  • If only GPUs with 12G memory are available, please use the 12G script (bash ./scripts/train_1024p_12G.sh), which will crop the images during training. Performance is not guaranteed using this script.

Training with your own dataset

  • If you want to train with your own dataset, please generate label maps which are one-channel whose pixel values correspond to the object labels (i.e. 0,1,...,N-1, where N is the number of labels). This is because we need to generate one-hot vectors from the label maps. Please also specity --label_nc N during both training and testing.
  • If your input is not a label map, please just specify --label_nc 0 which will directly use the RGB colors as input. The folders should then be named train_A, train_B instead of train_label, train_img, where the goal is to translate images from A to B.
  • If you don't have instance maps or don't want to use them, please specify --no_instance.
  • The default setting for preprocessing is scale_width, which will scale the width of all training images to opt.loadSize (1024) while keeping the aspect ratio. If you want a different setting, please change it by using the --resize_or_crop option. For example, scale_width_and_crop first resizes the image to have width opt.loadSize and then does random cropping of size (opt.fineSize, opt.fineSize). crop skips the resizing step and only performs random cropping. If you don't want any preprocessing, please specify none, which will do nothing other than making sure the image is divisible by 32.

More Training/Test Details

  • Flags: see options/train_options.py and options/base_options.py for all the training flags; see options/test_options.py and options/base_options.py for all the test flags.
  • Instance map: we take in both label maps and instance maps as input. If you don't want to use instance maps, please specify the flag --no_instance.

Citation

If you find this useful for your research, please use the following.

@inproceedings{wang2018pix2pixHD,
  title={High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs},
  author={Ting-Chun Wang and Ming-Yu Liu and Jun-Yan Zhu and Andrew Tao and Jan Kautz and Bryan Catanzaro},  
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2018}
}

Acknowledgments

This code borrows heavily from pytorch-CycleGAN-and-pix2pix.

[ICLR'19] Trellis Networks for Sequence Modeling

TrellisNet for Sequence Modeling This repository contains the experiments done in paper Trellis Networks for Sequence Modeling by Shaojie Bai, J. Zico

CMU Locus Lab 460 Oct 13, 2022
DECA: Detailed Expression Capture and Animation (SIGGRAPH 2021)

DECA: Detailed Expression Capture and Animation (SIGGRAPH2021) input image, aligned reconstruction, animation with various poses & expressions This is

Yao Feng 1.5k Jan 02, 2023
Public Models considered for emotion estimation from EEG

Emotion-EEG Set of models for emotion estimation from EEG. Composed by the combination of two deep-learing models learning together (RNN and CNN) with

Victor Delvigne 21 Dec 23, 2022
Post-Training Quantization for Vision transformers.

PTQ4ViT Post-Training Quantization Framework for Vision Transformers. We use the twin uniform quantization method to reduce the quantization error on

Zhihang Yuan 61 Dec 28, 2022
Latex code for making neural networks diagrams

PlotNeuralNet Latex code for drawing neural networks for reports and presentation. Have a look into examples to see how they are made. Additionally, l

Haris Iqbal 18.6k Jan 01, 2023
A collection of educational notebooks on multi-view geometry and computer vision.

Multiview notebooks This is a collection of educational notebooks on multi-view geometry and computer vision. Subjects covered in these notebooks incl

Max 65 Dec 09, 2022
Finding an Unsupervised Image Segmenter in each of your Deep Generative Models

Finding an Unsupervised Image Segmenter in each of your Deep Generative Models Description Recent research has shown that numerous human-interpretable

Luke Melas-Kyriazi 61 Oct 17, 2022
Code for "Solving Graph-based Public Good Games with Tree Search and Imitation Learning"

Code for "Solving Graph-based Public Good Games with Tree Search and Imitation Learning" This is the code for the paper Solving Graph-based Public Goo

Victor-Alexandru Darvariu 3 Dec 05, 2022
MT-GAN-PyTorch - PyTorch Implementation of Learning to Transfer: Unsupervised Domain Translation via Meta-Learning

MT-GAN-PyTorch PyTorch Implementation of AAAI-2020 Paper "Learning to Transfer: Unsupervised Domain Translation via Meta-Learning" Dependency: Python

29 Oct 19, 2022
A pytorch implementation of Pytorch-Sketch-RNN

Pytorch-Sketch-RNN A pytorch implementation of https://arxiv.org/abs/1704.03477 In order to draw other things than cats, you will find more drawing da

Alexis David Jacq 172 Dec 12, 2022
RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation

RIFE - Real Time Video Interpolation arXiv | YouTube | Colab | Tutorial | Demo Table of Contents Introduction Collection Usage Evaluation Training and

hzwer 3k Jan 04, 2023
paper list in the area of reinforcenment learning for recommendation systems

paper list in the area of reinforcenment learning for recommendation systems

HenryZhao 23 Jun 09, 2022
SCALoss: Side and Corner Aligned Loss for Bounding Box Regression (AAAI2022).

SCALoss PyTorch implementation of the paper "SCALoss: Side and Corner Aligned Loss for Bounding Box Regression" (AAAI 2022). Introduction IoU-based lo

TuZheng 20 Sep 07, 2022
TensorFlow2 Classification Model Zoo playing with TensorFlow2 on the CIFAR-10 dataset.

Training CIFAR-10 with TensorFlow2(TF2) TensorFlow2 Classification Model Zoo. I'm playing with TensorFlow2 on the CIFAR-10 dataset. Architectures LeNe

Chia-Hung Yuan 16 Sep 27, 2022
PyTorch Implementation of Spatially Consistent Representation Learning(SCRL)

Spatially Consistent Representation Learning (CVPR'21) Official PyTorch implementation of Spatially Consistent Representation Learning (SCRL). This re

Kakao Brain 102 Nov 03, 2022
Dense Prediction Transformers

Vision Transformers for Dense Prediction This repository contains code and models for our paper: Vision Transformers for Dense Prediction René Ranftl,

Intelligent Systems Lab Org 1.3k Jan 02, 2023
Abstractive opinion summarization system (SelSum) and the largest dataset of Amazon product summaries (AmaSum). EMNLP 2021 conference paper.

Learning Opinion Summarizers by Selecting Informative Reviews This repository contains the codebase and the dataset for the corresponding EMNLP 2021

Arthur Bražinskas 39 Jan 01, 2023
Make your master artistic punk avatar through machine learning world famous paintings.

Master-art-punk Make your master artistic punk avatar through machine learning world famous paintings. 通过机器学习世界名画制作属于你的大师级艺术朋克头像 Nowadays, NFT is beco

Philipjhc 53 Dec 27, 2022
Compute FID scores with PyTorch.

FID score for PyTorch This is a port of the official implementation of Fréchet Inception Distance to PyTorch. See https://github.com/bioinf-jku/TTUR f

2.1k Jan 06, 2023
SalFBNet: Learning Pseudo-Saliency Distribution via Feedback Convolutional Networks

SalFBNet This repository includes Pytorch implementation for the following paper: SalFBNet: Learning Pseudo-Saliency Distribution via Feedback Convolu

12 Aug 12, 2022