CoMoGAN: continuous model-guided image-to-image translation. CVPR 2021 oral.

Overview

CoMoGAN: Continuous Model-guided Image-to-Image Translation

Official repository.

Paper

CoMoGAN: continuous model-guided image-to-image translation [arXiv] | [supp] | [teaser]
Fabio Pizzati, Pietro Cerri, Raoul de Charette
Inria, Vislab Ambarella. CVPR'21 (oral)

If you find our work useful, please cite:

@inproceedings{pizzati2021comogan,
  title={{CoMoGAN}: continuous model-guided image-to-image translation},
  author={Pizzati, Fabio and Cerri, Pietro and de Charette, Raoul},
  booktitle={CVPR},
  year={2021}
}

Prerequisites

Tested with:

  • Python 3.7
  • Pytorch 1.7.1
  • CUDA 11.0
  • Pytorch Lightning 1.1.8
  • waymo_open_dataset 1.3.0

Preparation

The repository contains training and inference code for CoMo-MUNIT on the Waymo Open Dataset. In the paper, we refer to this experiment as Day2Timelapse. All models were trained on a 32GB Tesla V100 GPU. We also provide mixed precision training, which should fit smaller GPUs as well (a usual training takes ~9GB).

Environment setup

We advise the creation of a new conda environment including all necessary packages. The repository includes a requirements file. Please create and activate the new environment with

conda env create -f requirements.yml
conda activate comogan

Dataset preparation

First, download the Waymo Open Dataset from the official website. The dataset is organized in .tfrecord files, which we preprocess and split according to the time-of-day metadata annotations. Once you have downloaded the dataset, run the dump_waymo.py script. It reads and unpacks the .tfrecord files and resizes the images for training. Please run

python scripts/dump_waymo.py --load_path path/of/waymo/open/training --save_path /path/of/extracted/training/images
python scripts/dump_waymo.py --load_path path/of/waymo/open/validation --save_path /path/of/extracted/validation/images

Running those commands should result in a similar directory structure:

root
  training
    Day
      seq_code_0_im_code_0.png
      seq_code_0_im_code_1.png
      ...
      seq_code_1_im_code_0.png
      ...
    Dawn/Dusk
      ...
    Night
      ...
  validation
    Day
      ...
    Dawn/Dusk
      ...
    Night
      ...
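
As a quick sanity check after extraction, the layout above can be inspected with a short script. This is a minimal sketch, not part of the repository: the count_images helper is hypothetical, and it assumes that "Dawn/Dusk" is stored as a nested Dawn/Dusk folder, since a single folder name cannot contain a slash.

```python
from pathlib import Path

# Folder names as they appear in the directory tree above.
# Assumption: "Dawn/Dusk" is a nested path (Dawn/ containing Dusk/).
CONDITIONS = ("Day", "Dawn/Dusk", "Night")


def count_images(root, splits=("training", "validation")):
    """Return {split: {condition: number of .png files}} for the extracted dataset."""
    root = Path(root)
    counts = {}
    for split in splits:
        counts[split] = {}
        for condition in CONDITIONS:
            folder = root / split / condition
            # Missing folders are reported as 0 instead of raising an error.
            n_images = len(list(folder.glob("*.png"))) if folder.is_dir() else 0
            counts[split][condition] = n_images
    return counts
```

Calling count_images("/path/of/extracted") and finding a zero for one of the conditions would suggest that the corresponding dump_waymo.py run did not complete or wrote to a different location.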

Pretrained weights

We release a pretrained set of weights to allow reproducibility of our results. The weights are downloadable from here. Once downloaded, unpack the file in the root of the project and test them with the inference notebook.

Training

The training routine of CoMoGAN is mainly based on the CycleGAN codebase, available with details in the official repository.

To launch a default training, run

python train.py --path_data path/to/waymo/training/dir --gpus 0

You can choose which GPUs to train on with the --gpus flag. Multi-GPU training is not thoroughly tested, but it should be managed internally by PyTorch Lightning. Typically, a full training requires 13GB+ of GPU memory unless mixed precision is enabled. If you have a smaller GPU, please run

python train.py --path_data path/to/waymo/training/dir --gpus 0 --mixed_precision

Please note that the performance of mixed precision training has been evaluated only qualitatively.

Experiment organization

During training, a unique ID is assigned to every run. All experiments are saved in the logs folder, which is structured as follows:

logs/
  train_ID_0
    tensorboard/default/version_0
      checkpoints
        model_35000.pth
        ...
      hparams.yaml
      tb_log_file
  train_ID_1
    ...

In the checkpoints folder, all the intermediate checkpoints will be stored. hparams.yaml contains all the hyperparameters for a given run. You can launch a tensorboard --logdir train_ID instance on training directories to visualize intermediate outputs and loss functions.

To resume a previously stopped training, running

python train.py --id train_ID --path_data path/to/waymo/training/dir --gpus 0

will load the latest checkpoint from a given train ID checkpoints directory.
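
To illustrate what "latest checkpoint" means for files named like model_35000.pth, here is a minimal sketch that picks the file with the highest iteration number. The latest_checkpoint helper is hypothetical and is not the repository's own resume logic; it only assumes the model_<iteration>.pth naming shown in the logs tree above.

```python
import re
from pathlib import Path

# Matches names such as model_35000.pth and captures the iteration number.
CHECKPOINT_PATTERN = re.compile(r"^model_(\d+)\.pth$")


def latest_checkpoint(checkpoint_dir):
    """Return the model_<iteration>.pth path with the highest iteration, or None."""
    best_path, best_iter = None, -1
    for path in Path(checkpoint_dir).glob("model_*.pth"):
        match = CHECKPOINT_PATTERN.match(path.name)
        if match is None:
            continue  # Skip files that do not follow the naming scheme.
        iteration = int(match.group(1))
        # Compare numerically: model_100000 is newer than model_35000,
        # although it sorts first alphabetically.
        if iteration > best_iter:
            best_path, best_iter = path, iteration
    return best_path
```

The numeric comparison matters because a plain alphabetical sort would rank model_35000.pth after model_100000.pth.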

Extending the code

Command line arguments

We expose command line arguments to encourage code reusability and adaptability to other datasets or models. Right now, the options intended for extensions are:

  • --debug: Disables logging and experiment saving. Useful for testing code modifications.
  • --model: Loads a CoMoGAN model. By default, it loads CoMo-MUNIT (code is in networks folder)
  • --data_importer: Loads data from a dataset. By default, it loads waymo for the day2timelapse experiment (code is in data folder).
  • --learning_rate: Modifies learning rate, default value for CoMo-MUNIT is 1e-4.
  • --scheduler_policy: You can choose between the linear and step policies, taken respectively from the CycleGAN and MUNIT training routines. Default is step.
  • --decay_iters_step: For step policy, how many iterations before reducing learning rate
  • --decay_step_gamma: Regulates how much to reduce the learning rate
  • --seed: Random seed initialization
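
For the step policy, the interaction between --learning_rate, --decay_iters_step and --decay_step_gamma can be sketched as below. This assumes a conventional step decay (the same rule as PyTorch's StepLR); the repository's scheduler may differ in detail. The step_lr helper is hypothetical, and the values 1000 and 0.5 are illustrative, not the repository defaults.

```python
def step_lr(base_lr, iteration, decay_iters_step, decay_step_gamma):
    """Learning rate under a conventional step decay.

    Every decay_iters_step iterations the rate is multiplied by decay_step_gamma.
    """
    n_decays = iteration // decay_iters_step
    return base_lr * decay_step_gamma ** n_decays


# Illustrative values only: 1e-4 is the CoMo-MUNIT default learning rate,
# while 1000 and 0.5 are made-up numbers for the two decay options.
for it in (0, 999, 1000, 2500):
    print(it, step_lr(1e-4, it, decay_iters_step=1000, decay_step_gamma=0.5))
```

Under these assumed values the rate stays at 1e-4 for the first 1000 iterations, then halves at each further multiple of 1000.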

The codebase has been rewritten almost from scratch after CVPR acceptance and optimized for reproducibility, hence the provided seed could give slightly different results from the ones reported in the paper.

Changing the model or the dataset requires extending the classes in networks/base_model.py and data/base_dataset.py, respectively. Please look into the CycleGAN repository for further instructions.

Model, dataset and other options

Hyperparameters that are specific to a model or dataset, as well as options that rarely change, are stored in munch dictionaries in the respective classes. For instance, in networks/comomunit_model.py you can find all customizable options for CoMo-MUNIT. The same is valid for data/day2timelapse_dataset.py. The options folder includes additional options for checkpoint saving intervals and logging.

Inference

Once you have trained a model, you can use the infer.ipynb notebook to visualize translation results. After launching a notebook instance, you will be asked to select the train_id of the experiment. The notebook is documented and provides widgets for sequence, checkpoint and translation selection.

You can also use the translate.py script to translate all the images inside a directory or a sequence of images to another target directory.

python scripts/translate.py --load_path path/to/waymo/validation/day/dir --save_path path/to/saving/dir --phi 3.14

This command loads the images from the indicated path and translates them to night-style images, because phi is set to 3.14.

  • --phi: (ϕ) is the angle of the sun, with a value in [0, 2π], which maps to a sun elevation ∈ [+30°, −40°]
  • --sequence: if you want to use only certain images, you can specify a name or a keyword contained in the image's name like --sequence segment-10203656353524179475
  • --checkpoint: if your logs folder contains more than one train_ID, or if you want to select an older checkpoint, indicate the path to the checkpoint inside the folder of the train_ID that you want, like --checkpoint logs/train_ID_0/tensorboard/default/version_0/checkpoints/model_35000.pth
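
Since ϕ is continuous over [0, 2π], a full timelapse can be produced by calling translate.py once per ϕ value. The sketch below only builds the command lines, using the --load_path, --save_path and --phi flags documented above. The timelapse_commands helper and the phi_XX output folder naming are hypothetical, and whether translate.py creates missing output folders is not verified here.

```python
import math
import shlex


def timelapse_commands(load_path, save_root, steps):
    """Build one translate.py command per phi value, evenly spaced over [0, 2*pi)."""
    commands = []
    for i in range(steps):
        phi = 2 * math.pi * i / steps
        # Hypothetical naming scheme: one output folder per phi value.
        save_path = f"{save_root}/phi_{i:02d}"
        args = [
            "python", "scripts/translate.py",
            "--load_path", load_path,
            "--save_path", save_path,
            "--phi", f"{phi:.4f}",
        ]
        # Quote each argument so that paths with spaces survive the shell.
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


for command in timelapse_commands("path/to/waymo/validation/day/dir", "path/to/saving/dir", 8):
    print(command)
```

Each printed line can be run as is, or passed to subprocess.run with shell=True. With 8 steps, the fifth command uses ϕ ≈ 3.1416, which corresponds to the night-style example above.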

Docker

The repository includes a Dockerfile based on the nvidia/cuda:11.0.3-base-ubuntu18.04 image, with all the dependencies needed to run and test the code. To build and run it:

docker build -t notebook/comogan:1.0 .
docker run -it -v /path/to/your/local/datasets/:/datasets -p 8888:8888 --gpus '"device=0"' notebook/comogan:1.0
  • --gpus: selects which GPUs are exposed to the container. By default, all available GPUs are exposed.
  • -v: mounts the local directory that contains your dataset.
  • -p: only needed for the infer.ipynb notebook. If you run the notebook on a remote server, you should also tunnel the port to your computer with ssh [email protected] -NL 8888:127.0.0.1:8888
Owner
Codes from Computer Vision group of RITS Team, Inria