Linear image-to-image translation

Overview

Linear (Un)supervised Image-to-Image Translation

Teaser image Examples for linear orthogonal transformations in PCA domain, learned without pairing supervision. Training time is about 1 minute.

This repository contains the official pytorch implementation of the following paper:

The Surprising Effectiveness of Linear Unsupervised Image-to-Image Translation
Eitan Richardson and Yair Weiss
https://arxiv.org/abs/2007.12568

Abstract: Unsupervised image-to-image translation is an inherently ill-posed problem. Recent methods based on deep encoder-decoder architectures have shown impressive results, but we show that they only succeed due to a strong locality bias, and they fail to learn very simple nonlocal transformations (e.g. mapping upside down faces to upright faces). When the locality bias is removed, the methods are too powerful and may fail to learn simple local transformations. In this paper we introduce linear encoder-decoder architectures for unsupervised image to image translation. We show that learning is much easier and faster with these architectures and yet the results are surprisingly effective. In particular, we show a number of local problems for which the results of the linear methods are comparable to those of state-of-the-art architectures but with a fraction of the training time, and a number of nonlocal problems for which the state-of-the-art fails while linear methods succeed.

TODO:

  • Code for reproducing the linear image-to-image translation results
  • Code for applying the linear transformation as regularization for deep unsupervisd image-to-image (based on ALAE)
  • Support for user-provided dataset (e.g. image folders)
  • Automatic detection of available GPU resources

Requirements

  • Pytorch (tested with pytorch 1.5.0)
  • faiss (tested with faiss 1.6.3 with GPU support)
  • OpenCV (used only for generating some of the synthetic transformations)

System Requirements

Both the PCA and the nearest-neighbors search in ICP are performed on GPU (using pytorch and faiss). A cuda-enabled GPU with at least 11 GB of RAM is recommended. Since the entire data is loaded to RAM (not in mini-batches), a lot of (CPU) RAM is required as well ...

Code structure

  • run_im2im.py: The main python script for training and testing the linear transformation
  • pca-linear-map.py: The main algorithm. Performs PCA for the two domains, resolves polarity ambiguity and learnes an orthogonal or unconstrained linear transformation. In the unpaired case, ICP iterations are used to find the best correspondence.
  • pca.py: Fast PCA using pytorch and the skewness-based polarity synchronization.
  • utils.py: Misc utils
  • data.py: Loading the dataset and applying the synthetic transformations

Preparing the datasets

The repository does not contain code for loading the datasets, however, the tested datasets were loaded in their standard format. Please download (or link) the datasets under datasets/CelebA, datasets/FFHQ and datasets/edges2shoes.

Learning a linear transformation

usage: run_im2im.py [--dataset {celeba,ffhq,shoes}]
                    [--resolution RESOLUTION]
                    [--a_transform {identity,rot90,vflip,edges,Canny-edges,colorize,super-res,inpaint}]
                    [--pairing {paired,matching,nonmatching,few-matches}]
                    [--matching {nn,cyc-nn}]
                    [--transform_type {orthogonal,linear}] [--n_iters N_ITERS]
                    [--n_components N_COMPONENTS] [--n_train N_TRAIN]
                    [--n_test N_TEST]

Results are saved into the results folder.

Command example for generating the colorization result in the above image (figure 9 in tha paper):

python3 run_im2im.py --dataset ffhq --resolution 128 --a_transform colorize --n_components 2000 --n_train 20000 --n_test 25
Loading matching data for ffhq - colorize ...
100%|██████████████████████████████████████████████████████████████████████████| 20000/20000 [00:04<00:00, 4549.19it/s]
100%|█████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 299.33it/s]
Learning orthogonal transformation in 2000 PCA dimensions...
Got 20000 samples in A and 20000 in B.
PCA A...
PCA B...
Synchronizing...
Using skew-based logic for 1399/2000 dimensions.
PCA representations:  (20000, 2000) (20000, 2000) took: 68.09504985809326
Learning orthogonal transformation using matching sets:
Iter 0: 4191 B-NNs / 1210 consistent, mean NN l2 = 1308.520. took 2.88 sec.
Iter 1: 19634 B-NNs / 19634 consistent, mean NN l2 = 607.715. took 3.46 sec.
Iter 2: 19801 B-NNs / 19801 consistent, mean NN l2 = 204.487. took 3.49 sec.
Iter 3: 19801 B-NNs / 19801 consistent, mean NN l2 = 204.079. Converged - terminating ICP iterations.
Applying the learned transformation on test data...

Limitations

As described in the paper:

  • If the true translation is very non-linear, the learned linear transformation will not model it well.
  • If the image domain has a very complex structure, a large number of PCA coefficients will be required to achieve high quality reconstruction.
  • The nonmatching case (i.e. no matching paires exist) requires larger training sets.

Additional results

Paired

In the two examples above (edge images to real images and inpainting with a relative large part of the image missing), the true transformation is quite nonlinear, making the learned linear transformation less suitable. Here we used the unconstrained linear transformation rather than the orthogonal one. In addition, pairing supervision was used.

NonFaces

Here is an example showing the linear transformation method applied to a different domain (not just aligned faces).

Owner
Eitan Richardson
PhD student and TA at the Hebrew University of Jerusalem / Research Intern at Google
Eitan Richardson
Awesome-AI-books - Some awesome AI related books and pdfs for learning and downloading

Awesome AI books Some awesome AI related books and pdfs for downloading and learning. Preface This repo only used for learning, do not use in business

luckyzhou 1k Jan 01, 2023
End-to-end Temporal Action Detection with Transformer. [Under review]

TadTR: End-to-end Temporal Action Detection with Transformer By Xiaolong Liu, Qimeng Wang, Yao Hu, Xu Tang, Song Bai, Xiang Bai. This repo holds the c

Xiaolong Liu 105 Dec 25, 2022
통일된 DataScience 폴더 구조 제공 및 가상환경 작업의 부담감 해소

Lucas coded by linux shell 목차 Mac버전 CookieCutter (autoenv) 1.How to Install autoenv 2.폴더 진입 시, activate 구현하기 3.폴더 탈출 시, deactivate 구현하기 4.Alias 설정하기 5

ello 3 Feb 21, 2022
Simple tool to combine(merge) onnx models. Simple Network Combine Tool for ONNX.

snc4onnx Simple tool to combine(merge) onnx models. Simple Network Combine Tool for ONNX. https://github.com/PINTO0309/simple-onnx-processing-tools 1.

Katsuya Hyodo 8 Oct 13, 2022
This is the official pytorch implementation for the paper: Instance Similarity Learning for Unsupervised Feature Representation.

ISL This is the official pytorch implementation for the paper: Instance Similarity Learning for Unsupervised Feature Representation, which is accepted

19 May 04, 2022
RuleBERT: Teaching Soft Rules to Pre-Trained Language Models

RuleBERT: Teaching Soft Rules to Pre-Trained Language Models (Paper) (Slides) (Video) RuleBERT is a pre-trained language model that has been fine-tune

16 Aug 24, 2022
OMAMO: orthology-based model organism selection

OMAMO: orthology-based model organism selection OMAMO is a tool that suggests the best model organism to study a biological process based on orthologo

Dessimoz Lab 5 Apr 22, 2022
Lingvo is a framework for building neural networks in Tensorflow, particularly sequence models.

Lingvo is a framework for building neural networks in Tensorflow, particularly sequence models.

2.7k Jan 05, 2023
[ICCV'21] UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction

UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction Project Page | Paper | Supplementary | Video This reposit

331 Dec 28, 2022
Request execution of Galaxy SARS-CoV-2 variation analysis workflows on input data you provide.

SARS-CoV-2 processing requests Request execution of Galaxy SARS-CoV-2 variation analysis workflows on input data you provide. Prerequisites This autom

useGalaxy.eu 17 Aug 13, 2022
This is the official code for the paper "Ad2Attack: Adaptive Adversarial Attack for Real-Time UAV Tracking".

Ad^2Attack:Adaptive Adversarial Attack on Real-Time UAV Tracking Demo video 📹 Our video on bilibili demonstrates the test results of Ad^2Attack on se

Intelligent Vision for Robotics in Complex Environment 10 Nov 07, 2022
This is the code repository for the paper A hierarchical semantic segmentation framework for computer-vision-based bridge column damage detection

Bridge-damage-segmentation This is the code repository for the paper A hierarchical semantic segmentation framework for computer-vision-based bridge c

Jingxiao Liu 5 Dec 07, 2022
PyTorch Implementation for Deep Metric Learning Pipelines

Easily Extendable Basic Deep Metric Learning Pipeline Karsten Roth ([email 

Karsten Roth 543 Jan 04, 2023
Newt - a Gaussian process library in JAX.

Newt __ \/_ (' \`\ _\, \ \\/ /`\/\ \\ \ \\

AaltoML 0 Nov 02, 2021
Snscrape-jsonl-urls-extractor - Extracts urls from jsonl produced by snscrape

snscrape-jsonl-urls-extractor extracts urls from jsonl produced by snscrape Usag

1 Feb 26, 2022
An addon uses SMPL's poses and global translation to drive cartoon character in Blender.

Blender addon for driving character The addon drives the cartoon character by passing SMPL's poses and global translation into model's armature in Ble

犹在镜中 153 Dec 14, 2022
Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image

CenterPose Overview This repository is the official implementation of the paper "Single-stage Keypoint-based Category-level Object Pose Estimation fro

NVIDIA Research Projects 188 Dec 27, 2022
This code provides various models combining dilated convolutions with residual networks

Overview This code provides various models combining dilated convolutions with residual networks. Our models can achieve better performance with less

Fisher Yu 1.1k Dec 30, 2022
"Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback"

This is code repo for our EMNLP 2017 paper "Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback", which implements the A2C algorithm on top of a neural encoder-

Khanh Nguyen 131 Oct 21, 2022
🌎 The Modern Declarative Data Flow Framework for the AI Empowered Generation.

🌎 JSONClasses JSONClasses is a declarative data flow pipeline and data graph framework. Official Website: https://www.jsonclasses.com Official Docume

Fillmula Inc. 53 Dec 09, 2022