PyTorch implementation for reproducing StackGAN_v2 results in the paper StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

Last update: Dec 16, 2022

Overview

StackGAN-v2

PyTorch implementation for reproducing StackGAN_v2 results in the paper StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks by Han Zhang*, Tao Xu*, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas.

Dependencies

Python 2.7

PyTorch

In addition, please add the project folder to PYTHONPATH and pip install the following packages:

tensorboard
python-dateutil
easydict
pandas
torchfile
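
Before training, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of the original project: the helper name `missing_packages` is hypothetical, and the mapping from pip package names to import names (for example `python-dateutil` is imported as `dateutil`) is stated here as an assumption about each package.

```python
import importlib

# pip package name -> importable module name (assumed mapping, see lead-in)
REQUIRED = {
    "torch": "torch",
    "tensorboard": "tensorboard",
    "python-dateutil": "dateutil",
    "easydict": "easydict",
    "pandas": "pandas",
    "torchfile": "torchfile",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be imported."""
    missing = []
    for package, module in sorted(required.items()):
        try:
            importlib.import_module(module)
        except ImportError:
            missing.append(package)
    return missing
```

Calling `missing_packages()` returns an empty list when everything is installed; any names it returns can be passed to pip install.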

Data

Download our preprocessed char-CNN-RNN text embeddings for birds and save them to data/

[Optional] Follow the instructions in reedscot/icml2016 to download the pretrained char-CNN-RNN text encoders and extract text embeddings.

Download the birds image data and extract it to data/birds/
Download the ImageNet dataset and extract the images to data/imagenet/
Download the LSUN dataset and save the images to data/lsun
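
The directory layout described above can be checked with a short script before starting a run. This is a sketch under stated assumptions: the helper `missing_data_dirs` is hypothetical, it only checks that the three top-level directories named above exist, and it says nothing about the files expected inside them.

```python
import os

# Directories named in the download steps above.
EXPECTED_DIRS = ["data/birds", "data/imagenet", "data/lsun"]


def missing_data_dirs(project_root=".", expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under project_root."""
    return [
        d for d in expected
        if not os.path.isdir(os.path.join(project_root, d))
    ]
```

Only the datasets you plan to train on are needed, so pass a shorter list as `expected` when, for example, working with birds alone.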

Training

Train a StackGAN-v2 model on the bird (CUB) dataset using our preprocessed embeddings:
- python main.py --cfg cfg/birds_3stages.yml --gpu 0
Train a StackGAN-v2 model on the ImageNet dog subset:
- python main.py --cfg cfg/dog_3stages_color.yml --gpu 0
Train a StackGAN-v2 model on the ImageNet cat subset:
- python main.py --cfg cfg/cat_3stages_color.yml --gpu 0
Train a StackGAN-v2 model on the LSUN bedroom subset:
- python main.py --cfg cfg/bedroom_3stages_color.yml --gpu 0
Train a StackGAN-v2 model on the LSUN church subset:
- python main.py --cfg cfg/church_3stages_color.yml --gpu 0
The *.yml files are example configuration files for training and evaluating our models.
If you want to try your own datasets, here are some good tips about how to train GANs. We also encourage you to try different hyper-parameters and architectures, especially for more complex datasets.
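
The training commands above differ only in the configuration file, so they can be assembled from a small lookup table when scripting several runs. The sketch below is illustrative only: `CONFIGS`, `build_train_command`, and `train` are hypothetical names, the configuration paths are the ones listed above, and the script is assumed to run from the directory that contains main.py.

```python
import subprocess

# Dataset name -> configuration file, taken from the commands above.
CONFIGS = {
    "birds": "cfg/birds_3stages.yml",
    "dog": "cfg/dog_3stages_color.yml",
    "cat": "cfg/cat_3stages_color.yml",
    "bedroom": "cfg/bedroom_3stages_color.yml",
    "church": "cfg/church_3stages_color.yml",
}


def build_train_command(dataset, gpu=0):
    """Build the argument list for one of the training commands above."""
    if dataset not in CONFIGS:
        raise ValueError("unknown dataset: %r" % (dataset,))
    return ["python", "main.py", "--cfg", CONFIGS[dataset], "--gpu", str(gpu)]


def train(dataset, gpu=0):
    """Run the training command and return its exit code."""
    return subprocess.call(build_train_command(dataset, gpu))
```

For example, `train("birds", gpu=0)` runs the same command as the first training line above.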

Pretrained Model

StackGAN-v2 for bird. Download and save it to models/ (the Inception score for this model is 4.04 ± 0.05)
StackGAN-v2 for dog. Download and save it to models/ (the Inception score for this model is 9.55 ± 0.11)
StackGAN-v2 for cat. Download and save it to models/
StackGAN-v2 for bedroom. Download and save it to models/
StackGAN-v2 for church. Download and save it to models/
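
For readers unfamiliar with the metric quoted above, the Inception score is commonly defined as exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is a classifier's class distribution for a generated image x and p(y) is the average of those distributions over all generated images. The following is a minimal pure-Python sketch of that formula, not the evaluation code used for the reported numbers: the function name is hypothetical, the class probabilities are assumed to come from a separately run Inception classifier, and the ± spread reported above is not reproduced here.

```python
import math


def inception_score(probs):
    """Compute exp(mean KL(p(y|x) || p(y))) from per-image class probabilities.

    probs: list of probability vectors, one per generated image.
    """
    n = float(len(probs))
    num_classes = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[k] for p in probs) / n for k in range(num_classes)]
    kl_sum = 0.0
    for p in probs:
        for k in range(num_classes):
            # Terms with p[k] == 0 contribute nothing to the KL divergence.
            if p[k] > 0.0:
                kl_sum += p[k] * (math.log(p[k]) - math.log(marginal[k]))
    return math.exp(kl_sum / n)
```

The score is 1 when every image gets the same prediction and rises toward the number of classes when predictions are confident and evenly spread across classes.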

Evaluating

Run python main.py --cfg cfg/eval_birds.yml --gpu 1 to generate samples from captions in the birds validation set.
Change the eval_*.yml files to generate images from other pre-trained models.

Examples generated by StackGAN-v2

t-SNE visualization of randomly generated birds, dogs, cats, churches and bedrooms

Citing StackGAN++

If you find StackGAN useful in your research, please consider citing:

@article{Han17stackgan2,
  author    = {Han Zhang and Tao Xu and Hongsheng Li and Shaoting Zhang and Xiaogang Wang and Xiaolei Huang and Dimitris Metaxas},
  title     = {StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks},
  journal   = {arXiv: 1710.10916},
  year      = {2017},
}

@inproceedings{han2017stackgan,
  author    = {Han Zhang and Tao Xu and Hongsheng Li and Shaoting Zhang and Xiaogang Wang and Xiaolei Huang and Dimitris Metaxas},
  title     = {StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks},
  booktitle = {{ICCV}},
  year      = {2017},
}

Our follow-up work

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks [Supplementary][code]

References

Generative Adversarial Text-to-Image Synthesis Paper Code
Learning Deep Representations of Fine-grained Visual Descriptions Paper Code

Owner

Han Zhang
