SEAN: Image Synthesis with Semantic Region-Adaptive Normalization (CVPR 2020, Oral)

Overview

SEAN: Image Synthesis with Semantic Region-Adaptive Normalization (CVPR 2020 Oral)

Python 3.7 pytorch 1.2.0 pyqt5 5.13.0

image Figure: Face image editing controlled via style images and segmentation masks with SEAN

We propose semantic region-adaptive normalization (SEAN), a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image. Using SEAN normalization, we can build a network architecture that can control the style of each semantic region individually, e.g., we can specify one style reference image per region. SEAN is better suited to encode, transfer, and synthesize style than the best previous method in terms of reconstruction quality, variability, and visual quality. We evaluate SEAN on multiple datasets and report better quantitative metrics (e.g. FID, PSNR) than the current state of the art. SEAN also pushes the frontier of interactive image editing. We can interactively edit images by changing segmentation masks or the style for any given region. We can also interpolate styles from two reference images per region.

SEAN: Image Synthesis with Semantic Region-Adaptive Normalization
Peihao Zhu, Rameen Abdal, Yipeng Qin, Peter Wonka
Computer Vision and Pattern Recognition CVPR 2020, Oral

[Paper] [Project Page] [Demo]

Installation

Clone this repo.

git clone https://github.com/ZPdesu/SEAN.git
cd SEAN/

This code requires PyTorch, python 3+ and Pyqt5. Please install dependencies by

pip install -r requirements.txt

This model requires a lot of memory and time to train. To speed up the training, we recommend using 4 V100 GPUs

Dataset Preparation

This code uses CelebA-HQ and CelebAMask-HQ dataset. The prepared dataset can be directly downloaded here. After unzipping, put the entire CelebA-HQ folder in the datasets folder. The complete directory should look like ./datasets/CelebA-HQ/train/ and ./datasets/CelebA-HQ/test/.

Generating Images Using Pretrained Models

Once the dataset is prepared, the reconstruction results be got using pretrained models.

  1. Create ./checkpoints/ in the main folder and download the tar of the pretrained models from the Google Drive Folder. Save the tar in ./checkpoints/, then run

    cd checkpoints
    tar CelebA-HQ_pretrained.tar.gz
    cd ../
    
  2. Generate the reconstruction results using the pretrained model.

    python test.py --name CelebA-HQ_pretrained --load_size 256 --crop_size 256 --dataset_mode custom --label_dir datasets/CelebA-HQ/test/labels --image_dir datasets/CelebA-HQ/test/images --label_nc 19 --no_instance --gpu_ids 0
  3. The reconstruction images are saved at ./results/CelebA-HQ_pretrained/ and the corresponding style codes are stored at ./styles_test/style_codes/.

  4. Pre-calculate the mean style codes for the UI mode. The mean style codes can be found at ./styles_test/mean_style_code/.

    python calculate_mean_style_code.py

Training New Models

To train the new model, you need to specify the option --dataset_mode custom, along with --label_dir [path_to_labels] --image_dir [path_to_images]. You also need to specify options such as --label_nc for the number of label classes in the dataset, and --no_instance to denote the dataset doesn't have instance maps.

python train.py --name [experiment_name] --load_size 256 --crop_size 256 --dataset_mode custom --label_dir datasets/CelebA-HQ/train/labels --image_dir datasets/CelebA-HQ/train/images --label_nc 19 --no_instance --batchSize 32 --gpu_ids 0,1,2,3

If you only have single GPU with small memory, please use --batchSize 2 --gpu_ids 0.

UI Introduction

We provide a convenient UI for the users to do some extension works. To run the UI mode, you need to:

  1. run the step Generating Images Using Pretrained Models to save the style codes of the test images and the mean style codes. Or you can directly download the style codes from here. (Note: if you directly use the downloaded style codes, you have to use the pretrained model.

  2. Put the visualization images of the labels used for generating in ./imgs/colormaps/ and the style images in ./imgs/style_imgs_test/. Some example images are provided in these 2 folders. Note: the visualization image and the style image should be picked from ./datasets/CelebAMask-HQ/test/vis/ and ./datasets/CelebAMask-HQ/test/labels/, because only the style codes of the test images are saved in ./styles_test/style_codes/. If you want to use your own images, please prepare the images, labels and visualization of the labels in ./datasets/CelebAMask-HQ/test/ with the same format, and calculate the corresponding style codes.

  3. Run the UI mode

    python run_UI.py --name CelebA-HQ_pretrained --load_size 256 --crop_size 256 --dataset_mode custom --label_dir datasets/CelebA-HQ/test/labels --image_dir datasets/CelebA-HQ/test/images --label_nc 19 --no_instance --gpu_ids 0
  4. How to use the UI. Please check the detail usage of the UI from our Video.

    image

Other Datasets

Will be released soon.

License

All rights reserved. Licensed under the CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike 4.0 International) The code is released for academic research use only.

Citation

If you use this code for your research, please cite our papers.

@InProceedings{Zhu_2020_CVPR,
author = {Zhu, Peihao and Abdal, Rameen and Qin, Yipeng and Wonka, Peter},
title = {SEAN: Image Synthesis With Semantic Region-Adaptive Normalization},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

Acknowledgments

We thank Wamiq Reyaz Para for helpful comments. This code borrows heavily from SPADE. We thank Taesung Park for sharing his codes. This work was supported by the KAUST Office of Sponsored Research (OSR) under AwardNo. OSR-CRG2018-3730.

Owner
Peihao Zhu
CS PhD at KAUST
Peihao Zhu
Code for "My(o) Armband Leaks Passwords: An EMG and IMU Based Keylogging Side-Channel Attack" paper

Myo Keylogging This is the source code for our paper My(o) Armband Leaks Passwords: An EMG and IMU Based Keylogging Side-Channel Attack by Matthias Ga

Secure Mobile Networking Lab 7 Jan 03, 2023
Adversarial Graph Augmentation to Improve Graph Contrastive Learning

ADGCL : Adversarial Graph Augmentation to Improve Graph Contrastive Learning Introduction This repo contains the Pytorch [1] implementation of Adversa

susheel suresh 62 Nov 19, 2022
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language mod

20.5k Jan 08, 2023
HAR-stacked-residual-bidir-LSTMs - Deep stacked residual bidirectional LSTMs for HAR

HAR-stacked-residual-bidir-LSTM The project is based on this repository which is presented as a tutorial. It consists of Human Activity Recognition (H

Guillaume Chevalier 287 Dec 27, 2022
a general-purpose Transformer based vision backbone

Swin Transformer By Ze Liu*, Yutong Lin*, Yue Cao*, Han Hu*, Yixuan Wei, Zheng Zhang, Stephen Lin and Baining Guo. This repo is the official implement

Microsoft 9.9k Jan 08, 2023
This is the first released system towards complex meters` detection and recognition, which is implemented by computer vision techniques.

A three-stage detection and recognition pipeline of complex meters in wild This is the first released system towards detection and recognition of comp

Yan Shu 19 Nov 28, 2022
GRF: Learning a General Radiance Field for 3D Representation and Rendering

GRF: Learning a General Radiance Field for 3D Representation and Rendering [Paper] [Video] GRF: Learning a General Radiance Field for 3D Representatio

Alex Trevithick 243 Dec 29, 2022
pq is a jq-like Pickle file viewer

pq PQ is a jq-like viewer/processing tool for pickle files. howto # pq '' file.pkl {'other': 456, 'test': 123} # pq 'table' file.pkl |other|test| | 45

3 Mar 15, 2022
Implementation of our paper "DMT: Dynamic Mutual Training for Semi-Supervised Learning"

DMT: Dynamic Mutual Training for Semi-Supervised Learning This repository contains the code for our paper DMT: Dynamic Mutual Training for Semi-Superv

Zhengyang Feng 120 Dec 30, 2022
Aircraft design optimization made fast through modern automatic differentiation

Aircraft design optimization made fast through modern automatic differentiation. Plug-and-play analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.

Peter Sharpe 394 Dec 23, 2022
The source code of the paper "SHGNN: Structure-Aware Heterogeneous Graph Neural Network"

SHGNN: Structure-Aware Heterogeneous Graph Neural Network The source code and dataset of the paper: SHGNN: Structure-Aware Heterogeneous Graph Neural

Wentao Xu 7 Nov 13, 2022
Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.

Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.

Nerdy Rodent 2.3k Jan 04, 2023
Composing methods for ML training efficiency

MosaicML Composer contains a library of methods, and ways to compose them together for more efficient ML training.

MosaicML 2.8k Jan 08, 2023
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech Jaehyeon Kim, Jungil Kong, and Juhee Son In our rece

Jaehyeon Kim 1.7k Jan 08, 2023
Implementation of the paper "Language-agnostic representation learning of source code from structure and context".

Code Transformer This is an official PyTorch implementation of the CodeTransformer model proposed in: D. Zügner, T. Kirschstein, M. Catasta, J. Leskov

Daniel Zügner 131 Dec 13, 2022
Python binding for Khiva library.

Khiva-Python Build Documentation Build Linux and Mac OS Build Windows Code Coverage README This is the Khiva Python binding, it allows the usage of Kh

Shapelets 46 Oct 16, 2022
Reinforcement Learning for finance

Reinforcement Learning for Finance We apply reinforcement learning for stock trading. Fetch Data Example import utils # fetch symbols from yahoo fina

Tomoaki Fujii 159 Jan 03, 2023
Plenoxels: Radiance Fields without Neural Networks

Plenoxels: Radiance Fields without Neural Networks Alex Yu*, Sara Fridovich-Keil*, Matthew Tancik, Qinhong Chen, Benjamin Recht, Angjoo Kanazawa UC Be

Sara Fridovich-Keil 81 Dec 25, 2022
ICCV2021 - Mining Contextual Information Beyond Image for Semantic Segmentation

Introduction The official repository for "Mining Contextual Information Beyond Image for Semantic Segmentation". Our full code has been merged into ss

55 Nov 09, 2022
ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for which no expressive speech corpus is available.

ERISHA: Multilingual Multispeaker Expressive Text-to-Speech Library ERISHA is a multilingual multispeaker expressive speech synthesis framework. It ca

Ajinkya Kulkarni 43 Nov 27, 2022