[CVPR 2021] Anycost GANs for Interactive Image Synthesis and Editing

Overview

Anycost GAN

video | paper | website

Anycost GANs for Interactive Image Synthesis and Editing

Ji Lin, Richard Zhang, Frieder Ganz, Song Han, Jun-Yan Zhu

MIT, Adobe Research, CMU

In CVPR 2021

flexible

Anycost GAN generates consistent outputs under various computational budgets.

Demo

Here, we can use the Anycost generator for interactive image editing. A full generator takes ~3s to render an image, which is too slow for editing. While with Anycost generator, we can provide a visually similar preview at 5x faster speed. After adjustment, we hit the "Finalize" button to synthesize the high-quality final output. Check here for the full demo.

Overview

Anycost generators can be run at diverse computation costs by using different channel and resolution configurations. Sub-generators achieve high output consistency compared to the full generator, providing a fast preview.

overview

With (1) Sampling-based multi-resolution training, (2) adaptive-channel training, and (3) generator-conditioned discriminator, we achieve high image quality and consistency at different resolutions and channels.

method

Results

Anycost GAN (uniform channel version) supports 4 resolutions and 4 channel ratios, producing visually consistent images with different image fidelity.

uniform

The consistency retains during image projection and editing:

Usage

Getting Started

  • Clone this repo:
git clone https://github.com/mit-han-lab/anycost-gan.git
cd anycost-gan
  • Install PyTorch 1.7 and other dependeinces.

We recommend setting up the environment using Anaconda: conda env create -f environment.yml

Introduction Notebook

We provide a jupyter notebook example to show how to use the anycost generator for image synthesis at diverse costs: notebooks/intro.ipynb.

We also provide a colab version of the notebook: . Be sure to select the GPU as the accelerator in runtime options.

Interactive Demo

We provide an interactive demo showing how we can use anycost GAN to enable interactive image editing. To run the demo:

python demo.py

You can find a video recording of the demo here.

Using Pre-trained Models

To get the pre-trained generator, encoder, and editing directions, run:

import model

pretrained_type = 'generator'  # choosing from ['generator', 'encoder', 'boundary']
config_name = 'anycost-ffhq-config-f'  # replace the config name for other models
model.get_pretrained(pretrained_type, config=config_name)

We also provide the face attribute classifier (which is general for different generators) for computing the editing directions. You can get it by running:

model.get_pretrained('attribute-predictor')

The attribute classifier takes in the face images in FFHQ format.

After loading the Anycost generator, we can run it at a wide range of computational costs. For example:

from model.dynamic_channel import set_uniform_channel_ratio, reset_generator

g = model.get_pretrained('generator', config='anycost-ffhq-config-f')  # anycost uniform
set_uniform_channel_ratio(g, 0.5)  # set channel
g.target_res = 512  # set resolution
out, _ = g(...)  # generate image
reset_generator(g)  # restore the generator

For detailed usage and flexible-channel anycost generator, please refer to notebooks/intro.ipynb.

Model Zoo

Currently, we provide the following pre-trained generators, encoders, and editing directions. We will add more in the future.

For Anycost generators, by default, we refer to the uniform setting.

config name generator encoder edit direction
anycost-ffhq-config-f ✔️ ✔️ ✔️
anycost-ffhq-config-f-flexible ✔️ ✔️ ✔️
anycost-car-config-f ✔️
stylegan2-ffhq-config-f ✔️ ✔️ ✔️

stylegan2-ffhq-config-f refers to the official StyleGAN2 generator converted from the repo.

Datasets

We prepare the FFHQ, CelebA-HQ, and LSUN Car datasets into a directory of images, so that it can be easily used with ImageFolder from torchvision. The dataset layout looks like:

├── PATH_TO_DATASET
│   ├── images
│   │   ├── 00000.png
│   │   ├── 00001.png
│   │   ├── ...

Due to the copyright issue, you need to download the dataset from official site and process them accordingly.

Evaluation

We provide the code to evaluate some metrics presented in the paper. Some of the code is written with horovod to support distributed evaluation and reduce the cost of inter-GPU communication, which greatly improves the speed. Check its website for a proper installation.

Fre ́chet Inception Distance (FID)

Before evaluating the FIDs, you need to compute the inception features of the real images using scripts like:

python tools/calc_inception.py \
    --resolution 1024 --batch_size 64 -j 16 --n_sample 50000 \
    --save_name assets/inceptions/inception_ffhq_res1024_50k.pkl \
    PATH_TO_FFHQ

or you can download the pre-computed inceptions from here and put it under assets/inceptions.

Then, you can evaluate the FIDs by running:

horovodrun -np N_GPU \
    python metrics/fid.py \
    --config anycost-ffhq-config-f \
    --batch_size 16 --n_sample 50000 \
    --inception assets/inceptions/inception_ffhq_res1024_50k.pkl
    # --channel_ratio 0.5 --target_res 512  # optionally using a smaller resolution/channel

Perceptual Path Lenght (PPL)

Similary, evaluting the PPL with:

horovodrun -np N_GPU \
    python metrics/ppl.py \
    --config anycost-ffhq-config-f

Attribute Consistency

Evaluating the attribute consistency by running:

horovodrun -np N_GPU \
    python metrics/attribute_consistency.py \
    --config anycost-ffhq-config-f \
    --channel_ratio 0.5 --target_res 512  # config for the sub-generator; necessary

Encoder Evaluation

To evaluate the performance of the encoder, run:

python metrics/eval_encoder.py \
    --config anycost-ffhq-config-f \
    --data_path PATH_TO_CELEBA_HQ

Training

The training code will be updated shortly.

Citation

If you use this code for your research, please cite our paper.

@inproceedings{lin2021anycost,
  author    = {Lin, Ji and Zhang, Richard and Ganz, Frieder and Han, Song and Zhu, Jun-Yan},
  title     = {Anycost GANs for Interactive Image Synthesis and Editing},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2021},
}

Related Projects

GAN Compression | Once for All | iGAN | StyleGAN2

Acknowledgement

We thank Taesung Park, Zhixin Shu, Muyang Li, and Han Cai for the helpful discussion. Part of the work is supported by NSF CAREER Award #1943349, Adobe, Naver Corporation, and MIT-IBM Watson AI Lab.

The codebase is build upon a PyTorch implementation of StyleGAN2: rosinality/stylegan2-pytorch. For editing direction extraction, we refer to InterFaceGAN.

Owner
MIT HAN Lab
Accelerating Deep Learning Computing
MIT HAN Lab
Autonomous Driving on Curvy Roads without Reliance on Frenet Frame: A Cartesian-based Trajectory Planning Method

C++/ROS Source Codes for "Autonomous Driving on Curvy Roads without Reliance on Frenet Frame: A Cartesian-based Trajectory Planning Method" published in IEEE Trans. Intelligent Transportation Systems

Bai Li 88 Dec 23, 2022
Official implementation for ICDAR 2021 paper "Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer"

Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer Description Convert offline handwritten mathematical expressi

Wenqi Zhao 87 Dec 27, 2022
PyTorch implementation HoroPCA: Hyperbolic Dimensionality Reduction via Horospherical Projections

HoroPCA This code is the official PyTorch implementation of the ICML 2021 paper: HoroPCA: Hyperbolic Dimensionality Reduction via Horospherical Projec

HazyResearch 52 Nov 14, 2022
Implementation for the "Surface Reconstruction from 3D Line Segments" paper.

Surface Reconstruction from 3D Line Segments Surface reconstruction from 3d line segments. Langlois, P. A., Boulch, A., & Marlet, R. In 2019 Internati

85 Jan 04, 2023
《Geo Word Clouds》paper implementation

《Geo Word Clouds》paper implementation

Russellwzr 2 Jan 28, 2022
Graph Neural Networks with Keras and Tensorflow 2.

Welcome to Spektral Spektral is a Python library for graph deep learning, based on the Keras API and TensorFlow 2. The main goal of this project is to

Daniele Grattarola 2.2k Jan 08, 2023
a project for 3D multi-object tracking

a project for 3D multi-object tracking

155 Jan 04, 2023
scikit-learn inspired API for CRFsuite

sklearn-crfsuite sklearn-crfsuite is a thin CRFsuite (python-crfsuite) wrapper which provides interface simlar to scikit-learn. sklearn_crfsuite.CRF i

417 Dec 20, 2022
Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning

H-Transformer-1D Implementation of H-Transformer-1D, Transformer using hierarchical Attention for sequence learning with subquadratic costs. For now,

Phil Wang 123 Nov 17, 2022
PyTorch implementation of CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition

PyTorch implementation of CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition The unofficial code of CDistNet. Now, we ha

25 Jul 20, 2022
Automatic learning-rate scheduler

AutoLRS This is the PyTorch code implementation for the paper AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly published

Yuchen Jin 33 Nov 18, 2022
Code to reproduce the results in "Visually Grounded Reasoning across Languages and Cultures", EMNLP 2021.

marvl-code [WIP] This is the implementation of the approaches described in the paper: Fangyu Liu*, Emanuele Bugliarello*, Edoardo M. Ponti, Siva Reddy

25 Nov 15, 2022
A modern pure-Python library for reading PDF files

pdf A modern pure-Python library for reading PDF files. The goal is to have a modern interface to handle PDF files which is consistent with itself and

6 Apr 06, 2022
Intelligent Video Analytics toolkit based on different inference backends.

English | 中文 OpenIVA OpenIVA is an end-to-end intelligent video analytics development toolkit based on different inference backends, designed to help

Quantum Liu 15 Oct 27, 2022
NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling

NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling For Official repo of NU-Wave: A Diffusion Probabilistic Model for Neural Audio Up

Rishikesh (ऋषिकेश) 38 Oct 11, 2022
A bare-bones Python library for quality diversity optimization.

pyribs Website Source PyPI Conda CI/CD Docs Docs Status Twitter pyribs.org GitHub docs.pyribs.org A bare-bones Python library for quality diversity op

ICAROS 127 Jan 06, 2023
A high-performance anchor-free YOLO. Exceeding yolov3~v5 with ONNX, TensorRT, NCNN, and Openvino supported.

YOLOX is an anchor-free version of YOLO, with a simpler design but better performance! It aims to bridge the gap between research and industrial communities. For more details, please refer to our rep

7.7k Jan 06, 2023
Implementation of OmniNet, Omnidirectional Representations from Transformers, in Pytorch

Omninet - Pytorch Implementation of OmniNet, Omnidirectional Representations from Transformers, in Pytorch. The authors propose that we should be atte

Phil Wang 48 Nov 21, 2022
Reinforcement Learning via Supervised Learning

Reinforcement Learning via Supervised Learning Installation Run pip install -e . in an environment with Python = 3.7.0, 3.9. The code depends on MuJ

Scott Emmons 49 Nov 28, 2022
Official implementation of "Open-set Label Noise Can Improve Robustness Against Inherent Label Noise" (NeurIPS 2021)

Open-set Label Noise Can Improve Robustness Against Inherent Label Noise NeurIPS 2021: This repository is the official implementation of ODNL. Require

Hongxin Wei 12 Dec 07, 2022