RP-GAN: Stable GAN Training with Random Projections

Overview

RP-GAN: Stable GAN Training with Random Projections

Interpolated images from our GAN

This repository contains a reference implementation of the algorithm described in the paper:

Behnam Neyshabur, Srinadh Bhojanapalli, and Ayan Chakrabarti, "Stabilizing GAN Training with Multiple Random Projections," arXiv:1705.07831 [cs.LG], 2017.

Pre-trained generator models are not included in the repository due to their size, but are available as binary downloads as part of the release. This code and data is being released for research use. If you use the code in research that results in a publication, we request that you kindly cite the above paper. Please direct any questions to [email protected].

Requirements

The code uses the tensorflow library, and has been tested with versions 0.9 and 0.11 with both Python2 and Python3. You will need a modern GPU for training in a reasonable amount of time, but the sampling code should work on a CPU.

Sampling with Trained Models

We first describe usage of scripts for sampling from trained models. You can use these scripts for models you train yourself, or use the provided pre-trained models.

Pre-trained Models

We provide a number of pre-trained models in the release, corresponding to the experiments in the paper. The parameters of each model (both for training and sampling) are described in .py files the exp/ directory. face1.py describes a face image model trained in the traditional setting with a single discriminator, while faceNN.py are models trained with multiple discriminators each acting on one of NN random low-dimensional projections. face48.py describes the main face model used in our experiments, while dog12.py is the model trained with 12 discriminators on the Imagenet-Canines set. After downloading the trained model archive files, unzip them in the repository root directory. This should create files in sub-directories of models/.

Generating Samples

Use sample.py to generate samples using any of trained models as:

$ ./sample.py expName[,seed] out.png [iteration]

where expName is the name of the experiment file (without the .py extension), and out.png is the file to save the generated samples to. The script accepts optional parameters: seed (default 0) specifies the random seed used to generate the noise vectors provided to the generator, and iteration (default: max iteration available as saved file) specifies which model file to use in case multiple snapshots are available. E.g.,

$ ./sample.py face48 out.png      # Sample from the face48 experiment, using 
                                  # seed 0, and the latest model file.
$ ./sample.py face48,100 out.png  # Sample from the face48 experiment, using
                                  # seed 100, and the latest model file.
$ ./sample.py face1 out.png       # Sample from the single discriminator face
                                  # experiment, and the latest model file.
$ ./sample.py face1 out.png 40000 # Sample from the single discriminator face
                                  # experiment, and the 40k iterations model.
Interpolating in Latent Space

We also provide a script to produce interpolated images like the ones at the top of this page. However, before you can use this script, you need to create a version of the model file that contains the population mean-variance statistics of the activations to be used in batch-norm la(sample.py above uses batch norm statistics which is fine since it is working with a large batch of noise vectors. However, for interpolation, you will typically be working with smaller, more correlated, batches, and therefore should use batch statistics).

To create this version of the model file, use the provided script fixbn.py as:

$ CUDA_VISIBLE_DEVICES= ./fixbn.py expName [iteration]

This will create a second version of the model weights file (with extension .bgmodel.npz instead of .gmodel.npz) that also stores the batch statistics. Like for sample.py, you can provide a second optional argument to specify a specific model snapshot corresponding to an iteration number.

Note that we call the script with CUDA_VISIBLE_DEVICES= to force tensorflow to use the CPU instead of the GPU. This is because we compute these stats over a relatively large batch which typically doesn't fit in GPU memory (and since it's only one forward pass, running time isn't really an issue).

You only need to call fixbn.py once, and after that, you can use the script interp.py to create interpolated samples. The script will generate multiple rows of images, each producing samples from noise vectors interpolated between a pair from left-to-right. The script lets you specify these pairs of noise vectors as IDs:

$ ./interp.py expName[,seed[,iteration]] out.png lid,rid lid,rid ....

The first parameter now has two optional comma-separated arguments beyond the model name for seed and iteration. After this and the output file name, it agrees an arbitrary number of pairs of left-right image IDs, for each row of desired images in the output. These IDs correspond to the number of the image, in reading order, in the output generated by sample.py (with the same seed). For example, to create the images at the top of the page, use:

$ ./interp.py face48 out.png 137,65 146,150 15,138 54,72 38,123 36,93

Training

To train your own model, you will need to create a new model file (say myown.py) in the exp/ directory. See the existing model files for reference. Here is an explanation of some of the key parameters:

  • wts_dir: Directory in which to store model weights. This directory must already exist.
  • imsz: Resolution / Size of the images (will be square color images of size imsz x imsz).
  • lfile: Path to a list file for the images you want to train on, where each line of the file contains a path to an image.
  • crop: Boolean (True or False). Indicates whether the images are already the correct resolution, or need to be cropped. If True, these images will first be resized so that the smaller side matches imsz, and then a random crop along the other dimension will be used for training.

Before you begin training, you will need to create a file called filts.npz which defines the convolutional filters for the random projections. See the filts/ directory for the filters used for the pre-trained models, as well as instructions on a script for creating your own. On

Once you have created the model file and prepared the directory, you can begin training by using the train.py script as:

$ ./train.py myown

where the first parameter is the name of your model file.

We also provide a script for traditional training---baseline_train.py---with a single discriminator acting on the original image. It is used in the same way, except it doesn't require a filts.npz file in the weights directory.


Acknowledgments

This work was supported by the National Science Foundation under award no. IIS-1820693. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors, and do not necessarily reflect the views of the National Science Foundation.

You might also like...
Official repository for CVPR21 paper "Deep Stable Learning for Out-Of-Distribution Generalization".

StableNet StableNet is a deep stable learning method for out-of-distribution generalization. This is the official repo for CVPR21 paper "Deep Stable L

This is the official implementation of the paper
This is the official implementation of the paper "Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation".

[CVPRW 2021] - Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation

TeST: Temporal-Stable Thresholding for Semi-supervised Learning
TeST: Temporal-Stable Thresholding for Semi-supervised Learning

TeST: Temporal-Stable Thresholding for Semi-supervised Learning TeST Illustration Semi-supervised learning (SSL) offers an effective method for large-

Simple converter for deploying Stable-Baselines3 model to TFLite and/or Coral

Running SB3 developed agents on TFLite or Coral Introduction I've been using Stable-Baselines3 to train agents against some custom Gyms, some of which

RL agent to play μRTS with Stable-Baselines3
RL agent to play μRTS with Stable-Baselines3

Gym-μRTS with Stable-Baselines3/PyTorch This repo contains an attempt to reproduce Gridnet PPO with invalid action masking algorithm to play μRTS usin

Additional code for Stable-baselines3 to load and upload models from the Hub.

Hugging Face x Stable-baselines3 A library to load and upload Stable-baselines3 models from the Hub. Installation With pip Examples [Todo: add colab t

Self-driving car env with PPO algorithm from stable baseline3
Self-driving car env with PPO algorithm from stable baseline3

Self-driving car with RL stable baseline3 Most of the project develop from https://github.com/GerardMaggiolino/Gym-Medium-Post Please check it out! Th

DR-GAN: Automatic Radial Distortion Rectification Using Conditional GAN in Real-Time
DR-GAN: Automatic Radial Distortion Rectification Using Conditional GAN in Real-Time

DR-GAN: Automatic Radial Distortion Rectification Using Conditional GAN in Real-Time Introduction This is official implementation for DR-GAN (IEEE TCS

(SIGIR2020) “Asymmetric Tri-training for Debiasing Missing-Not-At-Random Explicit Feedback’’

Asymmetric Tri-training for Debiasing Missing-Not-At-Random Explicit Feedback About This repository accompanies the real-world experiments conducted i

Releases(v1.0)
🏅 Top 5% in 제2회 연구개발특구 인공지능 경진대회 AI SPARK 챌린지

AI_SPARK_CHALLENG_Object_Detection 제2회 연구개발특구 인공지능 경진대회 AI SPARK 챌린지 🏅 Top 5% in mAP(0.75) (443명 중 13등, mAP: 0.98116) 대회 설명 Edge 환경에서의 가축 Object Dete

3 Sep 19, 2022
[CVPR'21 Oral] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning [CVPR'21, Oral] By Zhicheng Huang*, Zhaoyang Zeng*, Yupan H

Multimedia Research 196 Dec 13, 2022
基于PaddleOCR搭建的OCR server... 离线部署用

开头说明 DangoOCR 是基于大家的 CPU处理器 来运行的,CPU处理器 的好坏会直接影响其速度, 但不会影响识别的精度 ,目前此版本识别速度可能在 0.5-3秒之间,具体取决于大家机器的配置,可以的话尽量不要在运行时开其他太多东西。需要配合团子翻译器 Ver3.6 及其以上的版本才可以使用!

胖次团子 131 Dec 25, 2022
Source code for "UniRE: A Unified Label Space for Entity Relation Extraction.", ACL2021.

UniRE Source code for "UniRE: A Unified Label Space for Entity Relation Extraction.", ACL2021. Requirements python: 3.7.6 pytorch: 1.8.1 transformers:

Wang Yijun 109 Nov 29, 2022
Code to reproduce the experiments in the paper "Transformer Based Multi-Source Domain Adaptation" (EMNLP 2020)

Transformer Based Multi-Source Domain Adaptation Dustin Wright and Isabelle Augenstein To appear in EMNLP 2020. Read the preprint: https://arxiv.org/a

CopeNLU 36 Dec 05, 2022
Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"

StrengthNet Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis" https://arxiv.org/abs/2110

RuiLiu 65 Dec 20, 2022
Spatial color quantization in Rust

rscolorq Rust port of Derrick Coetzee's scolorq, based on the 1998 paper "On spatial quantization of color images" by Jan Puzicha, Markus Held, Jens K

Collyn O'Kane 37 Dec 22, 2022
Automatically measure the facial Width-To-Height ratio and get facial analysis results provided by Microsoft Azure

fwhr-calc-website This project is to automatically measure the facial Width-To-Height ratio and get facial analysis results provided by Microsoft Azur

SoohyunPark 1 Feb 07, 2022
Source code for our paper "Improving Empathetic Response Generation by Recognizing Emotion Cause in Conversations"

Source code for our paper "Improving Empathetic Response Generation by Recognizing Emotion Cause in Conversations" this repository is maintained by bo

Yuhan Liu 24 Nov 29, 2022
BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting

BOVText: A Large-Scale, Bilingual Open World Dataset for Video Text Spotting Updated on December 10, 2021 (Release all dataset(2021 videos)) Updated o

weijiawu 47 Dec 26, 2022
PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in clustering (CVPR2021)

PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in Clustering Jang Hyun Cho1, Utkarsh Mall2, Kavita Bala2, Bharath Harihar

Jang Hyun Cho 164 Dec 30, 2022
AI4Good project for detecting waste in the environment

Detect waste AI4Good project for detecting waste in environment. www.detectwaste.ml. Our latest results were published in Waste Management journal in

108 Dec 25, 2022
SeMask: Semantically Masked Transformers for Semantic Segmentation.

SeMask: Semantically Masked Transformers Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi This repo co

Picsart AI Research (PAIR) 186 Dec 30, 2022
A template repository for submitting a job to the Slurm Cluster installed at the DISI - University of Bologna

Cluster di HPC con GPU per esperimenti di calcolo (draft version 1.0) Per poter utilizzare il cluster il primo passo è abilitare l'account istituziona

20 Dec 16, 2022
Deep learning based hand gesture recognition using LSTM and MediaPipie.

Hand Gesture Recognition Deep learning based hand gesture recognition using LSTM and MediaPipie. Demo video using PingPong Robot Files Pretrained mode

Brad 24 Nov 11, 2022
A Learning-based Camera Calibration Toolbox

Learning-based Camera Calibration A Learning-based Camera Calibration Toolbox Paper The pdf file can be found here. @misc{zhang2022learningbased,

Eason 14 Dec 21, 2022
Implementation of Fast Transformer in Pytorch

Fast Transformer - Pytorch Implementation of Fast Transformer in Pytorch. This only work as an encoder. Yannic video AI Epiphany Install $ pip install

Phil Wang 167 Dec 27, 2022
the official code for ICRA 2021 Paper: "Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation"

G2S This is the official code for ICRA 2021 Paper: Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation by Hemang

NeurAI 4 Jul 27, 2022
Efficient training of deep recommenders on cloud.

HybridBackend Introduction HybridBackend is a training framework for deep recommenders which bridges the gap between evolving cloud infrastructure and

Alibaba 111 Dec 23, 2022
Single Image Deraining Using Bilateral Recurrent Network (TIP 2020)

Single Image Deraining Using Bilateral Recurrent Network Introduction Single image deraining has received considerable progress based on deep convolut

23 Aug 10, 2022