GAN example for Keras. Because MNIST is too small and there should be something more realistic.

Overview

Keras-GAN-Animeface-Character

GAN example for Keras. Because MNIST is too small and there should be an example of something more realistic.

Some results

Training for 22 epochs (YouTube video, linked from the image in the original README)

Loss graph for 5000 mini-batches

1 mini-batch = 64 images. The dataset has 14,490 images, so 5000 mini-batches ≈ 5000 × 64 / 14,490 ≈ 22 epochs.

Some outputs of the 5000th mini-batch

Some training images (inputs)

Useful resources, before you go on

How to run this example

Setup

  • My environment: Python 3.6 + Keras 2.0.4 + TensorFlow 1.x
    • If you are on Keras 2.0.0, you need to update it; otherwise BatchNormalization() triggers an error from the TensorFlow backend saying something like "you need to pass float to input".
  • Use virtualenv to initialize a similar environment (Python and dependencies):

    pip install virtualenv
    virtualenv -p <PATH_TO_BIN_DIR>/python3.6 venv
    source venv/bin/activate
    pip install -r requirements.txt
  • I HATE making a program that has so many command line parameters to pass. Many of the parameters are defined in the scripts themselves; adjust the script as you need. The main() function is at the bottom of each script, as people do in C/C++.
  • Most global parameters are defined in args.py.
    • They are defined as class variables, not instance variables, so you may have trouble running/training multiple GAN instances with different parameters (which is very unlikely to happen). A rough sketch of this configuration style follows the Setup list below.
  • Download dataset from http://www.nurs.or.jp/~nagadomi/animeface-character-dataset/
    • Extract it to this directory so that the script can find ./animeface-character-dataset/thumb/
    • Any dataset should work in principle, but GAN training is sensitive to hyperparameters and may not work on yours. I tuned the parameters for animeface-character-dataset.
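As promised above, here is roughly what the class-variable configuration style looks like. This is a minimal sketch, not the actual contents of args.py: apart from dataset_sz (mentioned in the Preprocessing section) and the 64-image mini-batch, the names and values are assumptions.

    # Hypothetical sketch of args.py-style configuration, not the real file.
    class Args:
        dataset_sz = 14490   # how many images data.py keeps from the dataset
        batch_sz = 64        # mini-batch size (the README uses 64 images per mini-batch)
        sz = 64              # image side length after resizing (assumed name)

    # Class variables are shared by every importer, which is why running two GAN
    # instances with different parameters would be awkward:
    #   from args import Args; print(Args.batch_sz)   # -> 64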

Preprocessing

  • Run the preprocessing script. Resizing/scaling the input ahead of time saves training time compared to doing those tasks on the fly in the training loop.
    • ./data.py
    • When the images are loaded from PNG files, the RGB values are in [0, 255] (uint8). data.py collects the images, resizes them to 64x64, and scales the RGB values into the [-1.0, 1.0] range. (A rough sketch of this step follows the list below.)
    • data.py will only sample a subset of the dataset if configured to do so. The size of the subset is determined by dataset_sz defined in args.py.
    • The images will be written to data.hdf5.
      • I made the image size small to verify that the training works.
      • You can increase it, but you need to adjust the network sizes accordingly.
    • Again, which files to read is defined in the script at the bottom, not by sys.argv.
  • You need a large enough dataset. Otherwise the discriminator will sort of "memorize" the true data and reject all that's generated.
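A minimal sketch of the preprocessing described above, under assumed names (only the 64x64 size, the [-1.0, 1.0] scaling, the thumb/ directory, and the data.hdf5 output come from the README; the function name and the HDF5 dataset key are illustrative):

    # Hypothetical sketch of the preprocessing step; not the actual data.py.
    import glob
    import h5py
    import numpy as np
    from PIL import Image

    def preprocess(src_dir="./animeface-character-dataset/thumb/", out="data.hdf5", sz=64):
        paths = glob.glob(src_dir + "**/*.png", recursive=True)
        imgs = []
        for p in paths:
            img = Image.open(p).convert("RGB").resize((sz, sz))
            arr = np.asarray(img, dtype=np.float32)   # uint8 [0, 255] -> float32
            imgs.append(arr / 127.5 - 1.0)            # scale into [-1.0, 1.0]
        data = np.stack(imgs)
        with h5py.File(out, "w") as f:
            f.create_dataset("images", data=data)     # dataset key is an assumption

    if __name__ == "__main__":
        preprocess()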

Training

  • Open gan.py; at the bottom, uncomment train_autoenc() if you wish.
    • This is useful for seeing the generator network's capability to reproduce the input.
    • The auto-encoder will be trained on input images.
    • The output will be blurry because the auto-encoder uses a mean-squared-error loss. (This is why GANs were invented in the first place!)
  • To run training, modify main() so that train_gan() is uncommented.
  • The script will dump reals.png and fakes.png every 10 epochs so that you can see how the training is going.
  • The training takes a while. For this example on the Anime Face dataset, it took about 10000 mini-batches to get good results.
    • If you still see only uniform color or "modern art" by mini-batch 2000, the training is not working!
  • The script also dumps weights every 10 batches. Use them to save training time; weights saved before divergence are preferred :) To resume, uncomment load_weights() in train_gan(). A rough sketch of the alternating training loop follows this list.
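For orientation, this is roughly what one iteration of an alternating GAN training loop does (a minimal sketch under assumed names; discriminator, generator, and gan stand for hypothetical compiled Keras models, and only the dloss0/dloss1 naming and the periodic dumping come from the README, not the actual code of train_gan()):

    # Hypothetical sketch of an alternating GAN training loop; not the actual gan.py.
    import numpy as np

    def train_loop_sketch(discriminator, generator, gan, real_images,
                          batch_sz=64, noise_dim=100, steps=5000):
        for step in range(steps):
            # 1) Train the discriminator on real and fake mini-batches.
            idx = np.random.randint(0, len(real_images), batch_sz)
            reals = real_images[idx]
            noise = np.random.normal(0, 1, (batch_sz, noise_dim))
            fakes = generator.predict(noise)
            dloss0 = discriminator.train_on_batch(reals, np.ones((batch_sz, 1)))   # real -> 1
            dloss1 = discriminator.train_on_batch(fakes, np.zeros((batch_sz, 1)))  # fake -> 0
            # 2) Train the generator through the (frozen) discriminator so it
            #    learns to produce fakes that get classified as real.
            noise = np.random.normal(0, 1, (batch_sz, noise_dim))
            gloss = gan.train_on_batch(noise, np.ones((batch_sz, 1)))
            if step % 10 == 0:
                # The real script dumps reals.png, fakes.png and weights periodically.
                print(step, dloss0, dloss1, gloss)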

Training tips

What I experienced while training the GAN.

  • As described in GAN Hacks, the discriminator should stay ahead of the generator so that the generator can be "guided" by the discriminator.
  • If you look at the loss graph at https://github.com/osh/KerasGAN, their generator loss was in the range of 2 to 4 and their training worked well. The discriminator loss is low, around 0.1.
  • You'll need trial and error to get the hyperparameters right so that the training stays in the stable, balanced zone. That includes the learning rates of D and G, momentums, etc.
  • Convergence is quite sensitive to the learning rate, beware!
  • If things go well, the discriminator losses for detecting real/fake (dloss0/dloss1) should be less than or around 0.1, which means it is good at telling whether the input is real or fake.
  • If the learning rate is too high, the discriminator will diverge and one of the losses will get high and never fall. Training fails in this case.
  • If you make the LR too small, it will only slow the learning and will not prevent other issues such as oscillation. It only needs to be lower than a certain threshold, which is data dependent.
  • If adjusting the LR doesn't work, it could be a lack of capacity in the discriminator. Add more layers, or tweak other parameters. It could be anything :( Good luck!
  • On the other hand, the generator loss will be relatively higher than the discriminator loss. In this script, it oscillates in the range 0.1 to 4.
  • If you see either of the D losses staying > 15 (when batch size is 32), the training is screwed.
  • In case of G loss > 15, see if it escapes within 30 batches. If it stays there for too long, it isn't good, I think.
  • If you're seeing a high G loss, it could mean the generator can't keep up with the discriminator. You might need to increase its LR (it must still be lower than the discriminator's, though).
  • One final piece of the training I was missing was the momentum parameter in BatchNormalization. I found out about it at this link: https://github.com/shekkizh/neuralnetworks.thought-experiments/blob/master/Generative%20Models/GAN/Readme.md (see the sketch after this list).
    • Sort of interesting: in PyTorch, the momentum parameter for BatchNorm is 0.1 according to the API documentation, while in Keras it is 0.99. I'm not sure whether 0.1 in PyTorch actually means 1 - 0.1; I didn't look into the PyTorch backend implementation.
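For reference, this is how the momentum is passed to BatchNormalization in Keras (a minimal sketch; the layer sizes and the 0.9 value are only illustrations, not what this repo uses). Note that the two frameworks use complementary conventions: PyTorch's momentum weights the new batch statistics, so its default of 0.1 corresponds to roughly 1 - 0.1 = 0.9 in Keras terms.

    # Illustrative only; not a layer taken from gan.py.
    from keras.layers import BatchNormalization, Conv2D, LeakyReLU
    from keras.models import Sequential

    model = Sequential()
    model.add(Conv2D(64, (5, 5), strides=2, padding='same', input_shape=(64, 64, 3)))
    # Keras: moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean,
    # so a lower momentum makes the moving statistics track each batch more closely.
    model.add(BatchNormalization(momentum=0.9))  # 0.9 is an assumed value, not the repo's
    model.add(LeakyReLU(0.2))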