Official pytorch implementation of "Scaling-up Disentanglement for Image Translation", ICCV 2021.

Overview

OverLORD - Official PyTorch Implementation

Scaling-up Disentanglement for Image Translation
Aviv Gabbay and Yedid Hoshen
International Conference on Computer Vision (ICCV), 2021.

Abstract: Image translation methods typically aim to manipulate a set of labeled attributes (given as supervision at training time e.g. domain label) while leaving the unlabeled attributes intact. Current methods achieve either: (i) disentanglement, which exhibits low visual fidelity and can only be satisfied where the attributes are perfectly uncorrelated. (ii) visually-plausible translations, which are clearly not disentangled. In this work, we propose OverLORD, a single framework for disentangling labeled and unlabeled attributes as well as synthesizing high-fidelity images, which is composed of two stages; (i) Disentanglement: Learning disentangled representations with latent optimization. Differently from previous approaches, we do not rely on adversarial training or any architectural biases. (ii) Synthesis: Training feed-forward encoders for inferring the learned attributes and tuning the generator in an adversarial manner to increase the perceptual quality. When the labeled and unlabeled attributes are correlated, we model an additional representation that accounts for the correlated attributes and improves disentanglement. We highlight that our flexible framework covers multiple settings as disentangling labeled attributes, pose and appearance, localized concepts, and shape and texture. We present significantly better disentanglement with higher translation quality and greater output diversity than state-of-the-art methods.

Description

A framework for high-fidelity disentanglement of labeled and unlabeled attributes. We support two general cases: (i) The labeled and unlabeled attributes are approximately uncorrelated. (ii) The labeled and unlabeled attributes are correlated. For this case, we suggest simple forms of transformations for learning pose-independent or localized correlated attributes, by which we achieve better disentanglement both quantitatively and qualitatively than state-of-the-art methods.

Case 1: Uncorrelated Labeled and Unlabeled Attributes

  • Facial age editing: Disentanglement of labeled age and uncorrelated unlabeled attributes (FFHQ).
Input [0-9] [10-19] [50-59] [70-79]
  • Disentanglement of labeled identity and uncorrelated unlabeled attributes (CelebA).
Identity Attributes #1 Translation #1 Attributes #2 Translation #2
  • Disentanglement of labeled shape (edge map) and unlabeled texture (Edges2Shoes).
Texture Shape #1 Translation #1 Shape #2 Translation #2

Case 2: Correlated Labeled and Unlabeled Attributes

  • Disentanglement of domain label (cat, dog or wild), correlated appearance and uncorrelated pose. FUNIT and StarGAN-v2 rely on architectural biases that tightly preserve the spatial structure, leading to unreliable facial shapes which are unique to the source domain. We disentangle the pose and capture the appearance of the target breed faithfully.
Pose Appearance FUNIT StarGAN-v2 Ours
  • Male-to-Female translation in two settings: (i) When the gender is assumed to be approximately uncorrelated with all the unlabeled attributes. (ii) When we model the hairstyle as localized correlation and utilize a reference image specifying its target.
Input Ours [uncorrelated] Reference StarGAN-v2 Ours [correlated]

Requirements

python 3.7 pytorch 1.3 cuda 10.1

This repository imports modules from the StyleGAN2 architecture (not pretrained). Clone the following repository:

git clone https://github.com/rosinality/stylegan2-pytorch

Add the local StyleGAN2 project to PYTHONPATH. For bash users:

export PYTHONPATH=
   

   

Training

In order to train a model from scratch, do the following preprocessing and training steps. First, create a directory (can be specified by --base-dir or set to current working directory by default) for the training artifacts (preprocessed data, models, training logs, etc).

Facial Age Editing (FFHQ):

Download the FFHQ dataset and annotations. Create a directory named ffhq-dataset with all the png images placed in a single imgs subdir and all the json annotations placed in a features subdir.

python main.py preprocess --dataset-id ffhq --dataset-path ffhq-dataset --out-data-name ffhq-x256-age
python main.py train --config ffhq --data-name ffhq-x256-age --model-name overlord-ffhq-x256-age

Facial Identity Disentanglement (CelebA)

Download the aligned and cropped images from the CelebA dataset to a new directory named celeba-dataset.

python main.py preprocess --dataset-id celeba --dataset-path celeba-dataset --out-data-name celeba-x128-identity
python main.py train --config celeba --data-name celeba-x128-identity --model-name overlord-celeba-x128-identity

Pose and Appearance Disentanglement (AFHQ)

Download the AFHQ dataset to a new directory named afhq-dataset.

python main.py preprocess --dataset-id afhq --dataset-path afhq-dataset --split train --out-data-name afhq-x256
python main.py train --config afhq --data-name afhq-x256 --model-name overlord-afhq-x256

Male-to-Female Translation (CelebA-HQ)

Download the CelebA-HQ dataset and create a directory named celebahq-dataset with all the images placed in a single imgs subdir. Download CelebAMask-HQ from MaskGAN and extract as another subdir under the dataset root directory.

python main.py preprocess --dataset-id celebahq --dataset-path celebahq-dataset --out-data-name celebahq-x256-gender

Training a model for the uncorrelated case:

python main.py train --config celebahq --data-name celebahq-x256-gender --model-name overlord-celebahq-x256-gender

Training a model with modeling hairstyle as localized correlation:

python main.py train --config celebahq_hair --data-name celebahq-x256-gender --model-name overlord-celebahq-x256-gender-hair

Resources

The training automatically detects all the available gpus and applies multi-gpu mode if available.

Logs

During training, loss metrics and translation visualizations are logged with tensorboard and can be viewed by:

tensorboard --logdir 
   
    /cache/tensorboard --load_fast true

   

Pretrained Models

We provide several pretrained models for the main experiments presented in the paper. Please download the entire directory of each model and place it under /cache/models .

Model Description
overlord-ffhq-x256-age OverLORD trained on FFHQ for facial age editing.
overlord-celeba-x128-identity OverLORD trained on CelebA for facial identity disentanglement.
overlord-afhq-x256 OverLORD trained on AFHQ for pose and appearance disentanglement.
overlord-celebahq-x256-gender OverLORD trained on CelebA-HQ for male-to-female translation.
overlord-celebahq-x256-gender-hair OverLORD trained on CelebA-HQ for male-to-female translation with hairstyle as localized correlation.

Inference

Given a trained model (either pretrained or trained from scratch), a test image can be manipulated as follows:

python main.py manipulate --model-name overlord-ffhq-x256-age --img face.png --output face_in_all_ages.png
python main.py manipulate --model-name overlord-celeba-x128-identity --img attributes.png --reference identity.png --output translation.png
python main.py manipulate --model-name overlord-afhq-x256 --img pose.png --reference appearance.png --output translation.png 
python main.py manipulate --model-name overlord-celebahq-x256-gender --img face.png --output face_in_all_genders.png
python main.py manipulate --model-name overlord-celebahq-x256-gender-hair --img face.png --reference hairstyle.png --output translation.png

Note: Face manipulation models are very sensitive to the face alignment. The target face should be aligned exactly as done in the pipeline which CelebA-HQ and FFHQ were created by. Use the alignment method implemented here before applying any of the human face manipulation models on external images.

Citation

@inproceedings{gabbay2021overlord,
  author    = {Aviv Gabbay and Yedid Hoshen},
  title     = {Scaling-up Disentanglement for Image Translation},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year      = {2021}
}
Owner
Aviv Gabbay
PhD student at Hebrew University of Jerusalem. Computer Vision, Speech Processing and Deep Learning Researcher
Aviv Gabbay
根据midi文件演奏“风物之诗琴”的脚本 "Windsong Lyre" auto play

Genshin-lyre-auto-play 简体中文 | English 简介 根据midi文件演奏“风物之诗琴”的脚本。由Python驱动,在此承诺, ⚠️ 项目内绝不含任何能够引起安全问题的代码。 前排提示:所有键盘在动但是原神没反应的都是因为没有管理员权限,双击run.bat或者以管理员模式

御坂17032号 386 Jan 01, 2023
Inferring Lexicographically-Ordered Rewards from Preferences

Inferring Lexicographically-Ordered Rewards from Preferences Code author: Alihan Hüyük ([e

Alihan Hüyük 1 Feb 13, 2022
I created My own Virtual Artificial Intelligence named genesis, He can assist with my Tasks and also perform some analysis,,

Virtual-Artificial-Intelligence-genesis- I created My own Virtual Artificial Intelligence named genesis, He can assist with my Tasks and also perform

AKASH M 1 Nov 05, 2021
Object Tracking and Detection Using OpenCV

Object tracking is one such application of computer vision where an object is detected in a video, otherwise interpreted as a set of frames, and the object’s trajectory is estimated. For instance, yo

Happy N. Monday 4 Aug 21, 2022
Spatial color quantization in Rust

rscolorq Rust port of Derrick Coetzee's scolorq, based on the 1998 paper "On spatial quantization of color images" by Jan Puzicha, Markus Held, Jens K

Collyn O'Kane 37 Dec 22, 2022
Object detection (YOLO) with pytorch, OpenCV and python

Real Time Object/Face Detection Using YOLO-v3 This project implements a real time object and face detection using YOLO algorithm. You only look once,

1 Aug 04, 2022
Python Assignments for the Deep Learning lectures by Andrew NG on coursera with complete submission for grading capability.

Python Assignments for the Deep Learning lectures by Andrew NG on coursera with complete submission for grading capability.

Utkarsh Agiwal 1 Feb 03, 2022
Organseg dags - The repository contains the codebase for multi-organ segmentation with directed acyclic graphs (DAGs) in CT.

Organseg dags - The repository contains the codebase for multi-organ segmentation with directed acyclic graphs (DAGs) in CT.

yzf 1 Jun 12, 2022
Code for the paper "Improving Vision-and-Language Navigation with Image-Text Pairs from the Web" (ECCV 2020)

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh

Arjun Majumdar 44 Dec 14, 2022
Noether Networks: meta-learning useful conserved quantities

Noether Networks: meta-learning useful conserved quantities This repository contains the code necessary to reproduce experiments from "Noether Network

Dylan Doblar 33 Nov 23, 2022
Human Pose Detection on EdgeTPU

Coral PoseNet Pose estimation refers to computer vision techniques that detect human figures in images and video, so that one could determine, for exa

google-coral 476 Dec 31, 2022
PyTorch implementation of adversarial patch

adversarial-patch PyTorch implementation of adversarial patch This is an implementation of the Adversarial Patch paper. Not official and likely to hav

Jamie Hayes 172 Nov 29, 2022
Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online Adaptation

DynaBOA Code repositoty for the paper: Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online Adaptation Shanyan Guan, Jingwei Xu, Michell

197 Jan 07, 2023
[AAAI 2021] EMLight: Lighting Estimation via Spherical Distribution Approximation and [ICCV 2021] Sparse Needlets for Lighting Estimation with Spherical Transport Loss

EMLight: Lighting Estimation via Spherical Distribution Approximation (AAAI 2021) Update 12/2021: We release our Virtual Object Relighting (VOR) Datas

Fangneng Zhan 144 Jan 06, 2023
Official code for the ICCV 2021 paper "DECA: Deep viewpoint-Equivariant human pose estimation using Capsule Autoencoders"

DECA Official code for the ICCV 2021 paper "DECA: Deep viewpoint-Equivariant human pose estimation using Capsule Autoencoders". All the code is writte

23 Dec 01, 2022
CaLiGraph Ontology as a Challenge for Semantic Reasoners ([email protected]'21)

CaLiGraph for Semantic Reasoning Evaluation Challenge This repository contains code and data to use CaLiGraph as a benchmark dataset in the Semantic R

Nico Heist 0 Jun 08, 2022
AQP is a modular pipeline built to enable the comparison and testing of different quality metric configurations.

Audio Quality Platform - AQP An Open Modular Python Platform for Objective Speech and Audio Quality Metrics AQP is a highly modular pipeline designed

Jack Geraghty 24 Oct 01, 2022
利用python脚本实现微信、支付宝账单的合并,并保存到excel文件实现自动记账,可查看可视化图表。

KeepAccounts_v2.0 KeepAccounts.exe和其配套表格能够实现微信、支付宝官方导出账单的读取合并,为每笔帐标记类型,并按月份和类型生成可视化图表。再也不用消费一笔记一笔,每月仅需10分钟,记好所有的帐。 作者: MickLife Bilibili: https://spac

159 Jan 01, 2023
A modern pure-Python library for reading PDF files

pdf A modern pure-Python library for reading PDF files. The goal is to have a modern interface to handle PDF files which is consistent with itself and

6 Apr 06, 2022
A PaddlePaddle version image model zoo.

Paddle-Image-Models English | 简体中文 A PaddlePaddle version image model zoo. Install Package Install by pip: $ pip install ppim Install by wheel package

AgentMaker 131 Dec 07, 2022