Official Implementation of Swapping Autoencoder for Deep Image Manipulation (NeurIPS 2020)

Swapping Autoencoder for Deep Image Manipulation

Taesung Park, Jun-Yan Zhu, Oliver Wang, Jingwan Lu, Eli Shechtman, Alexei A. Efros, Richard Zhang

UC Berkeley and Adobe Research

Project page | Paper | 3 Min Video

Overview

Swapping Autoencoder consists of autoencoding (top) and swapping (bottom) operations. Top: An encoder E embeds an input (Notre-Dame) into two codes. The structure code is a tensor with spatial dimensions; the texture code is a 2048-dimensional vector. Decoding with generator G should produce a realistic image (enforced by discriminator D) matching the input (reconstruction loss). Bottom: Decoding with the texture code from a second image (Saint Basil's Cathedral) should look realistic (via D) and match the texture of that second image, by training with a patch co-occurrence discriminator Dpatch that enforces that the output and reference patches look indistinguishable.
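The data flow described above can be illustrated with a toy sketch. This is not the networks from the paper: here the "structure code" is simply the normalized pixel grid and the "texture code" is the global mean and standard deviation, both chosen purely as hypothetical stand-ins for the learned codes. The function names encode, decode, and swap are likewise assumptions made for this illustration.

```python
from statistics import mean, pstdev


def encode(image):
    """Toy stand-in for E: split a 2D grid into (structure, texture)."""
    pixels = [p for row in image for p in row]
    mu = mean(pixels)
    sigma = pstdev(pixels) or 1.0  # avoid dividing by zero on flat images
    # Structure keeps the spatial layout; texture is a global summary.
    structure = [[(p - mu) / sigma for p in row] for row in image]
    texture = (mu, sigma)
    return structure, texture


def decode(structure, texture):
    """Toy stand-in for G: re-apply global statistics to a layout."""
    mu, sigma = texture
    return [[s * sigma + mu for s in row] for row in structure]


def swap(structure_image, texture_image):
    """Combine the structure of one image with the texture of another."""
    structure, _ = encode(structure_image)
    _, texture = encode(texture_image)
    return decode(structure, texture)
```

Calling decode(*encode(image)) corresponds to the autoencoding path (top), and swap(image_a, image_b) corresponds to the swapping path (bottom). In the actual model both codes are learned, and realism is enforced by the discriminators rather than by construction.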

Installation / Requirements

  • CUDA 10.1 or newer is required because the code uses a custom CUDA kernel of StyleGAN2, ported by @rosinality
  • The author used PyTorch 1.7.1 on Python 3.6
  • Install dependencies with pip install dominate torchgeometry func-timeout tqdm matplotlib opencv_python lmdb numpy GPUtil Pillow scikit-learn visdom

Testing and Evaluation.

We provide the pretrained models and also several images that reproduce the figures of the paper. Please download and unzip them here (2.1GB). The scripts assume that the checkpoints are at ./checkpoints/, and the test images at ./testphotos/, but they can be changed by modifying --checkpoints_dir and --dataroot options.

Swapping and Interpolation of the mountain model using sample images

To run simple swapping and interpolation, specify the two input reference images by changing the input_structure_image and input_texture_image fields of experiments/mountain_pretrained_launcher.py, then run

python -m experiments mountain_pretrained test simple_swapping
python -m experiments mountain_pretrained test simple_interpolation

The provided script experiments/mountain_pretrained_launcher.py, in particular its opt.tag("simple_swapping") and opt.tag("simple_interpolation") entries, invokes a terminal command similar to the following.

python test.py --evaluation_metrics simple_swapping \
--preprocess scale_shortside --load_size 512 \
--name mountain_pretrained  \
--input_structure_image [path_to_sample_image] \
--input_texture_image [path_to_sample_image] \
--texture_mix_alpha 0.0 0.25 0.5 0.75 1.0

In other words, feel free to use this command if that feels more straightforward.

The output images are saved at ./results/mountain_pretrained/simpleswapping/.
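As a rough illustration of what the --texture_mix_alpha values could mean, the sketch below blends two texture vectors linearly. It assumes that alpha 0.0 keeps the structure image's own texture and alpha 1.0 uses the reference texture entirely; the helper names lerp and mix_textures are hypothetical, and the repository's actual blending may differ (for example, it may normalize the codes).

```python
def lerp(a, b, alpha):
    """Linearly interpolate between two equal-length vectors."""
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]


def mix_textures(own_texture, reference_texture, alphas):
    """Return one blended texture vector per alpha value."""
    return [lerp(own_texture, reference_texture, a) for a in alphas]


# Toy 2-dimensional vectors standing in for 2048-dimensional texture codes.
mixes = mix_textures([0.0, 2.0], [4.0, 6.0], [0.0, 0.25, 0.5, 0.75, 1.0])
```

Each blended vector would then be decoded together with the unchanged structure code, giving one output image per alpha.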

Texture Swapping

Our Swapping Autoencoder learns to disentangle texture from structure for image editing tasks such as texture swapping. Each row shows the result of combining the structure code of the leftmost image with the texture code of the top image.

To reproduce this image (Figure 4) as well as Figures 9 and 12 of the paper, run the following commands:

# Reads options from ./experiments/church_pretrained_launcher.py
python -m experiments church_pretrained test swapping_grid

# Reads options from ./experiments/bedroom_pretrained_launcher.py
python -m experiments bedroom_pretrained test swapping_grid

# Reads options from ./experiments/mountain_pretrained_launcher.py
python -m experiments mountain_pretrained test swapping_grid

# Reads options from ./experiments/ffhq512_pretrained_launcher.py
python -m experiments ffhq512_pretrained test swapping_grid

Make sure the dataroot and checkpoints_dir paths are correctly set in the respective ./experiments/xx_pretrained_launcher.py script.

Quantitative Evaluations

To perform quantitative evaluations such as FID in Table 1, Fig 5, and Table 2, we first need to prepare image pairs of input structure and texture reference images.

The reference images are randomly selected from the val set of LSUN, FFHQ, and the Waterfalls dataset. The pairs of input structure and texture images should be located in the input_structure/ and input_style/ directories, with the same file name. For example, input_structure/001.png and input_style/001.png will be loaded together for swapping.
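Before running the evaluation, it can help to check that every structure image has a matching texture image. The following is a minimal sketch of that check using only the standard library; the function name paired_images is hypothetical and is not part of the repository.

```python
from pathlib import Path


def paired_images(dataroot):
    """List (structure, style) path pairs that share a file name."""
    root = Path(dataroot)
    structure_dir = root / "input_structure"
    style_dir = root / "input_style"
    pairs = []
    for structure_path in sorted(structure_dir.iterdir()):
        style_path = style_dir / structure_path.name
        # Keep only files that exist under both directories.
        if structure_path.is_file() and style_path.is_file():
            pairs.append((structure_path, style_path))
    return pairs
```

For example, paired_images("./testphotos/church/fig5_tab2/") would return one tuple per file name present in both directories; a structure image without a counterpart is silently skipped, so comparing the result length against the directory size reveals missing files.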

Replace the path to the test images at dataroot="./testphotos/church/fig5_tab2/" field of the script experiments/church_pretrained_launcher.py, and run

python -m experiments church_pretrained run_test swapping_for_eval
python -m experiments ffhq1024_pretrained run_test swapping_for_eval

The results can be viewed at ./results (which can be changed using the --result_dir option).

The FID is then computed between the swapped images and the original structure images, using https://github.com/mseitzer/pytorch-fid.
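The pytorch-fid package can be run as a module with two image directories as arguments. The sketch below only builds that command; the helper name fid_command and both directory paths are hypothetical placeholders, so substitute the actual output and input folders.

```python
import sys


def fid_command(swapped_dir, structure_dir):
    """Build the argument list for `python -m pytorch_fid dir1 dir2`."""
    return [sys.executable, "-m", "pytorch_fid",
            str(swapped_dir), str(structure_dir)]


# Hypothetical paths; replace them with your own directories.
cmd = fid_command("./results/swapped", "./testphotos/input_structure")
# To actually run it (requires `pip install pytorch-fid`):
#   import subprocess; subprocess.run(cmd, check=True)
```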

Model Training.

Datasets

  • LSUN Church and Bedroom datasets can be downloaded here. Once downloaded and unzipped, the directories should contain [category]_[train/val]_lmdb/.
  • FFHQ datasets can be downloaded using this link. This is the zip file of 70,000 images at 1024x1024 resolution. Unzip the files, and we will load the image files directly.
  • The Flickr Mountains dataset and the Flickr Waterfall dataset are not sharable due to license issues. However, the images were scraped from Mountains Anywhere and Waterfalls Around the World, using the Python wrapper for the Flickr API. Please contact Taesung Park with the title "Flickr Dataset for Swapping Autoencoder" for more details.

Training Scripts

The training configurations are specified using the scripts in experiments/*_launcher.py. Use the following commands to launch various trainings.

# Modify |dataroot| and |checkpoints_dir| at
# experiments/[church,bedroom,ffhq,mountain]_launcher.py
python -m experiments church train church_default
python -m experiments bedroom train bedroom_default
python -m experiments ffhq train ffhq512_default
python -m experiments ffhq train ffhq1024_default

# By default, the script uses GPUtil to look at available GPUs
# on the machine and sets appropriate GPU IDs. To specify a particular set of GPUs,
# use the |--gpu| option. Be sure to also change the |num_gpus| option in the corresponding script.
python -m experiments church train church_default --gpu 01234567
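The value 01234567 reads as one digit per GPU index. The sketch below shows one way such a string could be turned into a device list, under the assumption that each character is a separate single-digit GPU ID; the helper names are hypothetical, and the launcher's real parsing may differ.

```python
import os


def parse_gpu_ids(gpu_string):
    """Split a digit string such as "0123" into integer GPU indices."""
    if not gpu_string.isdigit():
        raise ValueError(f"expected only digits, got {gpu_string!r}")
    return [int(ch) for ch in gpu_string]


def visible_devices(gpu_ids):
    """Format GPU indices for the CUDA_VISIBLE_DEVICES variable."""
    return ",".join(str(i) for i in gpu_ids)


ids = parse_gpu_ids("01234567")
os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices(ids)
```

With this reading, len(ids) is the number that num_gpus should match. Note that a one-digit-per-GPU format cannot express indices of 10 or above.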

The training progress can be monitored using visdom at the port number specified by --display_port. The default is https://localhost:2004.

Additionally, a few swapping grids are generated using random samples of the training set. They are saved as webpages at [checkpoints_dir]/[expr_name]/snapshots/. The frequency of the grid generation is controlled using --evaluation_freq.

All configurable parameters are printed at the beginning of training. These configurations are spread throughout the code in def modify_commandline_options of relevant classes, such as models/swapping_autoencoder_model.py, util/iter_counter.py, or models/networks/encoder.py. To change these configurations, simply modify the corresponding option in opt.specify of the training script.

The code for parsing and configuration is at experiments/__init__.py, experiments/__main__.py, and experiments/tmux_launcher.py.
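Conceptually, a launcher turns a set of named options into a command line for train.py or test.py. The sketch below shows that idea with a hypothetical helper build_command; it is not the repository's implementation, and it assumes that list-valued options are written as space-separated values after the flag.

```python
import shlex


def build_command(script, **options):
    """Turn keyword options into a `python script --key value` string."""
    parts = ["python", script]
    for key, value in options.items():
        parts.append(f"--{key}")
        # A list or tuple becomes several space-separated values.
        values = value if isinstance(value, (list, tuple)) else [value]
        parts.extend(shlex.quote(str(v)) for v in values)
    return " ".join(parts)


command = build_command(
    "test.py",
    name="mountain_pretrained",
    load_size=512,
    texture_mix_alpha=[0.0, 0.5, 1.0],
)
```

Quoting each value with shlex.quote keeps paths that contain spaces intact when the string is passed to a shell.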

Continuing training.

The training continues by default from the last checkpoint, because the --continue_train option is set to True by default. To start from scratch, remove the checkpoint, or specify continue_train=False in the training script (e.g. experiments/church_launcher.py).

Code Structure (Main Functions)

  • models/swapping_autoencoder_model.py: The core file that defines losses, produces visuals.
  • optimizers/swapping_autoencoder_optimizer.py: Defines the optimizers and alternating training of GAN.
  • models/networks/: contains the model architectures generator.py, discriminator.py, encoder.py, patch_discriminator.py, stylegan2_layers.py.
  • options/__init__.py: contains basic option flags. However, many important flags are spread out over other files, such as swapping_autoencoder_model.py or generator.py. When the program starts, these options are all parsed together. The best way to check the list of options in use is to run the training script and look at the console output of the configured options.
  • util/iter_counter.py: contains iteration counting.

Change Log

  • 4/14/2021: The configuration to train the pretrained model on the Mountains dataset had not been set correctly, and was updated accordingly.

Bibtex

If you use this code for your research, please cite our paper:

@inproceedings{park2020swapping,
  title={Swapping Autoencoder for Deep Image Manipulation},
  author={Park, Taesung and Zhu, Jun-Yan and Wang, Oliver and Lu, Jingwan and Shechtman, Eli and Efros, Alexei A. and Zhang, Richard},
  booktitle={Advances in Neural Information Processing Systems},
  year={2020}
}

Acknowledgment

The StyleGAN2 layers heavily borrow (or rather, directly copy!) from the PyTorch implementation of @rosinality. We thank Nicholas Kolkin for the helpful discussion on the automated content and style evaluation, Jeongo Seo and Yoseob Kim for advice on the user interface, and William T. Peebles, Tongzhou Wang, and Yu Sun for the discussion on disentanglement.
