Learning Chinese Character style with conditional GAN

Last update: Jan 02, 2023

Overview

zi2zi: Master Chinese Calligraphy with Conditional Adversarial Networks

Introduction

Learning East Asian language typefaces with GAN. zi2zi (字到字, meaning "from character to character") is an application and extension of the recently popular pix2pix model to Chinese characters.

Details can be found in this blog post.

Network Structure

Original Model

The network structure is based on pix2pix, with the addition of category embedding and two other losses, category loss and constant loss, from AC-GAN and DTN respectively.
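As a rough illustration of how the extra losses might be combined on the generator side, here is a minimal sketch. It assumes a plain weighted sum; the function name, the `cheat_loss` term name, and the `lcategory_penalty` weight are hypothetical, and only the `L1_penalty` and `Lconst_penalty` defaults are taken from the train command in this README. The actual implementation may weight or combine the terms differently.

```python
def generator_loss(cheat_loss, l1_loss, const_loss, category_loss,
                   l1_penalty=100.0, lconst_penalty=15.0,
                   lcategory_penalty=1.0):
    """Weighted sum of generator-side loss terms (illustrative only).

    cheat_loss     adversarial term (how well the generator fools D)
    l1_loss        pixel-wise L1 distance to the target image (pix2pix)
    const_loss     constant loss on encoded representations (DTN)
    category_loss  font category classification loss (AC-GAN)
    lcategory_penalty is a hypothetical weight, not a documented flag.
    """
    return (cheat_loss
            + l1_penalty * l1_loss
            + lconst_penalty * const_loss
            + lcategory_penalty * category_loss)


total = generator_loss(cheat_loss=1.0, l1_loss=0.5,
                       const_loss=0.1, category_loss=0.2)
```

With the default weights, the L1 term dominates the sum, which matches the intuition that the output should stay close to the target glyph.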

Updated Model with Label Shuffling

After sufficient training, d_loss drops to near zero and the model's performance plateaus. Label shuffling mitigates this problem by presenting new challenges to the model.

Specifically, within a given minibatch, for the same set of source characters, we generate two sets of target characters: one with the correct embedding labels, the other with shuffled labels. The shuffled set will likely not have corresponding target images for computing L1_Loss, but it can still be used as a good source for all other losses, forcing the model to generalize beyond the limited set of provided examples. Empirically, label shuffling improves the model's generalization on unseen data, produces better details, and decreases the required number of characters.

You can enable label shuffling by setting the flip_labels=1 option in the train.py script. It is recommended that you enable this after d_loss flatlines around zero, for further tuning.
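The two passes over one minibatch can be sketched as follows. This is a minimal sketch in plain Python, not the project's code: the function names are hypothetical, and the sources are represented by strings instead of image tensors.

```python
import random


def shuffle_labels(labels, seed=None):
    """Return a shuffled copy of a minibatch's embedding labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def build_minibatch_passes(sources, labels, seed=None):
    """Pair the same sources with correct labels and with shuffled labels.

    The first pass has matching target images, so every loss applies.
    The second pass generally has no matching targets, so the L1 loss
    would be skipped for it while the other losses are still computed.
    """
    real_pass = list(zip(sources, labels))
    shuffled_pass = list(zip(sources, shuffle_labels(labels, seed)))
    return real_pass, shuffled_pass


sources = ["src_a", "src_b", "src_c", "src_d"]
labels = [0, 1, 2, 3]
real_pass, shuffled_pass = build_minibatch_passes(sources, labels, seed=42)
```

Both passes share the same source characters; only the label attached to each source differs in the second pass.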

Gallery

Compare with Ground Truth

Brush Writing Fonts

Cursive Script (Requested by SNS audience)

Mingchao Style (宋体/明朝体)

Korean

Interpolation

Animation

How to Use

Step Zero

Download as many fonts as you please.

Requirement

Python 2.7
CUDA
cuDNN
TensorFlow >= 1.0.1
Pillow(PIL)
numpy >= 1.12.1
scipy >= 0.18.1
imageio

Preprocess

To avoid IO bottlenecks, preprocessing is necessary: it pickles your data into binary form so that it can be kept in memory during training.

First, run the command below to get the font images:

python font2img.py --src_font=src.ttf \
                   --dst_font=tgt.otf \
                   --charset=CN \
                   --sample_count=1000 \
                   --sample_dir=dir \
                   --label=0 \
                   --filter=1 \
                   --shuffle=1

Four default charsets are offered: CN, CN_T (traditional), JP, and KR. You can also point it to a one-line file, and it will generate images of the characters in that file. Note that the filter option is highly recommended: it pre-samples some characters and filters out all images that have the same hash, which usually indicates that the character is missing from the font. label is the index in the category embeddings that this font is associated with; it defaults to 0.
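The idea behind the filter option can be sketched like this. It is a minimal sketch under assumptions, not the code in font2img.py: the function names are hypothetical, rendered glyphs are represented as raw bytes, and the choice of SHA-256 and of the `min_repeats` threshold is arbitrary.

```python
import hashlib
from collections import Counter


def find_missing_glyph_hashes(sample_images, min_repeats=2):
    """Hashes seen at least `min_repeats` times among pre-sampled glyphs.

    Distinct characters should render to distinct images, so a repeated
    hash most likely belongs to the font's placeholder for missing glyphs.
    """
    counts = Counter(hashlib.sha256(img).hexdigest() for img in sample_images)
    return set(h for h, n in counts.items() if n >= min_repeats)


def filter_images(images, bad_hashes):
    """Drop every image whose hash is in the set of suspicious hashes."""
    return [img for img in images
            if hashlib.sha256(img).hexdigest() not in bad_hashes]


# Toy data: `blank` stands in for the placeholder of a missing glyph.
blank = b"\x00" * 16
glyph_a = b"\x01" * 16
glyph_b = b"\x02" * 16

bad_hashes = find_missing_glyph_hashes([blank, glyph_a, blank, glyph_b])
kept = filter_images([glyph_a, blank, glyph_b], bad_hashes)
```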

After obtaining all images, run package.py to pickle the images and their corresponding labels into binary format:

python package.py --dir=image_directories \
                  --save_dir=binary_save_directory \
                  --split_ratio=[0,1]

After running this, you will find two objects train.obj and val.obj under the save_dir for training and validation, respectively.
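A packaging step of this kind could look roughly like the sketch below. It assumes that split_ratio is the probability that an example goes to the validation file and that examples are pickled one after another; both are assumptions for illustration, and the function names and the (label, image bytes) example format are hypothetical.

```python
import io
import pickle
import random


def split_and_pickle(examples, train_file, val_file, split_ratio=0.1, seed=None):
    """Pickle each example into either the train or the validation stream.

    Assumption: split_ratio is the chance that an example is sent to
    validation, so 0 keeps everything for training and 1 sends everything
    to validation.
    """
    rng = random.Random(seed)
    n_train = n_val = 0
    for example in examples:
        if rng.random() < split_ratio:
            pickle.dump(example, val_file)
            n_val += 1
        else:
            pickle.dump(example, train_file)
            n_train += 1
    return n_train, n_val


def load_all(binary_file):
    """Read back every pickled example from a stream."""
    items = []
    while True:
        try:
            items.append(pickle.load(binary_file))
        except EOFError:
            return items


# In-memory buffers stand in for train.obj and val.obj.
examples = [(0, b"img0"), (0, b"img1"), (1, b"img2")]
train_buf, val_buf = io.BytesIO(), io.BytesIO()
n_train, n_val = split_and_pickle(examples, train_buf, val_buf, split_ratio=0.0)
train_buf.seek(0)
restored = load_all(train_buf)
```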

Experiment Layout

experiment/
└── data
    ├── train.obj
    └── val.obj

Create an experiment directory under the root of the project, and a data directory within it to hold the two binaries. Assuming a fixed directory layout enforces better data isolation, especially if you have multiple experiments running.

Train

To start training, run the following command:

python train.py --experiment_dir=experiment \
                --experiment_id=0 \
                --batch_size=16 \
                --lr=0.001 \
                --epoch=40 \
                --sample_steps=50 \
                --schedule=20 \
                --L1_penalty=100 \
                --Lconst_penalty=15

schedule here is the number of epochs between learning rate decays; at each decay the learning rate is halved. The train command will create the sample, logs, and checkpoint directories under experiment_dir if they do not exist, where you can check and manage the progress of your training.
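The halving schedule can be written as a small function. This is a sketch that assumes zero-based epoch numbering and no lower bound on the learning rate; the function name is hypothetical and the actual training script may differ, for example by clamping the rate at a minimum value.

```python
def learning_rate_at(epoch, initial_lr=0.001, schedule=20):
    """Learning rate for a zero-based epoch, halved every `schedule` epochs."""
    return initial_lr * (0.5 ** (epoch // schedule))


# With --lr=0.001, --schedule=20 and --epoch=40 as in the command above:
rates = [learning_rate_at(e) for e in (0, 19, 20, 39)]
```

Under these assumptions, the first 20 epochs run at 0.001 and the remaining 20 at 0.0005.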

Infer and Interpolate

After training is done, run the command below to run inference on test data:

python infer.py --model_dir=checkpoint_dir/ \
                --batch_size=16 \
                --source_obj=binary_obj_path \
                --embedding_ids=label[s] of the font, separated by comma \
                --save_dir=save_dir/

You can also do interpolation with this command:

python infer.py --model_dir=checkpoint_dir/ \
                --batch_size=10 \
                --source_obj=obj_path \
                --embedding_ids=label[s] of the font, separated by comma \
                --save_dir=frames/ \
                --output_gif=gif_path \
                --interpolate=1 \
                --steps=10 \
                --uroboros=1

It will run through all the pairs of fonts specified in embedding_ids and interpolate between each pair in the specified number of steps.
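The interpolation can be pictured as linear blending between the embedding vectors of two fonts. The sketch below makes several assumptions that are not confirmed by this README: that pairs are formed from consecutive ids, that uroboros adds a final pair from the last font back to the first, and that blending is linear. Both function names are hypothetical, and embeddings are shown as short lists of floats.

```python
def interpolation_pairs(embedding_ids, uroboros=False):
    """Consecutive (start, end) id pairs; optionally close the loop."""
    pairs = list(zip(embedding_ids[:-1], embedding_ids[1:]))
    if uroboros and len(embedding_ids) > 1:
        # Assumed meaning of uroboros: go from the last font back to the first.
        pairs.append((embedding_ids[-1], embedding_ids[0]))
    return pairs


def interpolate(start_vec, end_vec, steps):
    """Return steps + 1 vectors blending linearly from start_vec to end_vec."""
    frames = []
    for i in range(steps + 1):
        alpha = float(i) / steps
        frames.append([(1.0 - alpha) * s + alpha * e
                       for s, e in zip(start_vec, end_vec)])
    return frames


pairs = interpolation_pairs([0, 1, 2], uroboros=True)
frames = interpolate([0.0, 2.0], [1.0, 0.0], steps=2)
```

Each interpolated vector would then be fed to the generator in place of a single font's embedding to render one frame of the animation.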

Pretrained Model

A pretrained model, trained with 27 fonts, can be downloaded here. Only the generator is saved, to reduce the model size. You can use the encoder in this pretrained model to accelerate the training process.

Acknowledgements

Code derived and rehashed from:

License

Apache 2.0

Owner

Yuchen Tian
