Official implementation of "A Shared Representation for Photorealistic Driving Simulators" in PyTorch.

Overview

A Shared Representation for Photorealistic Driving Simulators

The official code for the paper: "A Shared Representation for Photorealistic Driving Simulators" , paper, arXiv

A Shared Representation for Photorealistic Driving Simulators
Saeed Saadatnejad, Siyuan Li, Taylor Mordan, Alexandre Alahi, 2021. A powerful simulator highly decreases the need for real-world tests when training and evaluating autonomous vehicles. Data-driven simulators flourished with the recent advancement of conditional Generative Adversarial Networks (cGANs), providing high-fidelity images. The main challenge is synthesizing photo-realistic images while following given constraints. In this work, we propose to improve the quality of generated images by rethinking the discriminator architecture. The focus is on the class of problems where images are generated given semantic inputs, such as scene segmentation maps or human body poses. We build on successful cGAN models to propose a new semantically-aware discriminator that better guides the generator. We aim to learn a shared latent representation that encodes enough information to jointly do semantic segmentation, content reconstruction, along with a coarse-to-fine grained adversarial reasoning. The achieved improvements are generic and simple enough to be applied to any architecture of conditional image synthesis. We demonstrate the strength of our method on the scene, building, and human synthesis tasks across three different datasets.

Example

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

  1. Clone this repo.
git clone https://github.com/vita-epfl/SemDisc.git
cd ./SemDisc

Prerequisites

  1. Please install dependencies by
pip install -r requirements.txt

Dataset Preparation

  1. The cityscapes dataset can be downloaded from here: cityscapes

For the experiment, you will need to download [gtFine_trainvaltest.zip] and [leftImg8bit_trainvaltest.zip] and unzip them.

Training

After preparing all necessary environments and the dataset, activate your environment and start to train the network.

Training with the semantic-aware discriminator

The training is doen in two steps. First, the network is trained without only the adversarial head of D:

python train.py --name spade_semdisc --dataset_mode cityscapes --netG spade --c2f_sem_rec --normalize_smaps \
--checkpoints_dir <checkpoints path> --dataroot <data path> \
--lambda_seg 1 --lambda_rec 1 --lambda_GAN 35 --lambda_feat 10 --lambda_vgg 10 --fine_grained_scale 0.05 \
--niter_decay 0 --niter 100 \
--aspect_ratio 1 --load_size 256 --crop_size 256 --batchSize 16 --gpu_ids 0

After the network is trained for some epochs, we finetune it with the complete D:

python train.py --name spade_semdisc --dataset_mode cityscapes --netG spade --c2f_sem_rec --normalize_smaps \
--checkpoints_dir <checkpoints path> --dataroot <data path> \
--lambda_seg 1 --lambda_rec 1 --lambda_GAN 35 --lambda_feat 10 --lambda_vgg 10 --fine_grained_scale 0.05 \
--niter_decay 100 --niter 100 --continue_train --active_GSeg \
--aspect_ratio 1 --load_size 256 --crop_size 256 --batchSize 16 --gpu_ids 0

You can change netG to different options [spade, asapnets, pix2pixhd].

Training with original discriminator

The original model can be trained with the following command for comparison.

python train.py --name spade_orig --dataset_mode cityscapes --netG spade \
--checkpoints_dir <checkpoints path> --dataroot <data path> \
--niter_decay 100 --niter 100 --aspect_ratio 1 --load_size 256 --crop_size 256 --batchSize 16 --gpu_ids 0

Similarly, you can change netG to different options [spade, asapnets, pix2pixhd].

For now, only training on GPU is supported. In case of lack of space, try decreasing the batch size.

Test

Tests - image synthesis

After you have the trained networks, run the test as follows to get the synthesized images for both original and semdisc models

python test.py --name $name --dataset_mode cityscapes \
--checkpoints_dir <checkpoints path> --dataroot <data path> --results_dir ./results/ \
--which_epoch latest --aspect_ratio 1 --load_size 256 --crop_size 256 \
--netG spade --how_many 496

Tests - FID

For reporting FID scores, we leveraged fid-pytorch. To compute the score between two sets:

python fid/pytorch-fid/fid_score.py <GT_image path> <synthesized_image path> >> results/fid_$name.txt

Tests - segmentation

For reporting the segmentation scores, we used DRN. The pre-trained model (and some other details) can be found on this page. Follow the instructions on the DRN github page to setup Cityscapes.

You should have a main folder containing the drn/ folder (from github), the model .pth, the info.json, the val_images.txt and val_labels.txt, a 'labels' folder with the *_trainIds.png images, and a 'synthesized_image' folder with your *_leftImg8bit.png images.

The info.json is from the github, the val_images.txt and val_labels.txt can be obtained with the commands:

find labels/ -maxdepth 3 -name "*_trainIds.png" | sort > val_labels.txt
find synthesized_image/ -maxdepth 3 -name "*_leftImg8bit.png" | sort > val_images.txt

You also need to resize the label images to that size. You can do it with the convert command:

convert -sample 512X256\! "<Cityscapes val>/frankfurt/*_trainIds.png" -set filename:base "%[base]" "<path>/labels/%[filename:base].png"
convert -sample 512X256\! "<Cityscapes val>/lindau/*_trainIds.png" -set filename:base "%[base]" "<path>/labels/%[filename:base].png"
convert -sample 512X256\! "<Cityscapes val>/munster/*_trainIds.png" -set filename:base "%[base]" "<path>/labels/%[filename:base].png"

and the output of the models:

convert -sample 512X256\! "<Cityscapes test results path>/test_latest/images/synthesized_image/*.png" -set filename:base "%[base]" "synthesized_image/%[filename:base].png"

Then I run the model with:

cd drn/
python3 segment.py test -d ../ -c 19 --arch drn_d_105 --pretrained ../drn-d-105_ms_cityscapes.pth --phase val --batch-size 1 --ms >> ./results/seg_$name.txt

Acknowledgments

The base of the code is borrowed from SPADE. Please refer to SPADE to see the details.

Citation

@article{saadatnejad2021semdisc,
  author={Saadatnejad, Saeed and Li, Siyuan and Mordan, Taylor and Alahi, Alexandre},
  journal={IEEE Transactions on Intelligent Transportation Systems}, 
  title={A Shared Representation for Photorealistic Driving Simulators}, 
  year={2021},
  doi={10.1109/TITS.2021.3131303}
}
Owner
VITA lab at EPFL
Visual Intelligence for Transportation
VITA lab at EPFL
This Artificial Intelligence program can take a black and white/grayscale image and generate a realistic or plausible colorized version of the same picture.

Colorizer The point of this project is to write a program capable of taking a black and white / grayscale image, and generating a realistic or plausib

Maitri Shah 1 Jan 06, 2022
The world's largest toxicity dataset.

The Toxicity Dataset by Surge AI Saving the internet is fun. Combing through thousands of online comments to build a toxicity dataset isn't. That's wh

Surge AI 134 Dec 19, 2022
For visualizing the dair-v2x-i dataset

3D Detection & Tracking Viewer The project is based on hailanyi/3D-Detection-Tracking-Viewer and is modified, you can find the original version of the

34 Dec 29, 2022
Neural Reprojection Error: Merging Feature Learning and Camera Pose Estimation

Neural Reprojection Error: Merging Feature Learning and Camera Pose Estimation This is the official repository for our paper Neural Reprojection Error

Hugo Germain 78 Dec 01, 2022
Unofficial Pytorch Implementation of WaveGrad2

WaveGrad 2 — Unofficial PyTorch Implementation WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis Unofficial PyTorch+Lightning Implementati

MINDs Lab 104 Nov 29, 2022
Code for Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions

EMS-COLS-recourse Initial Code for Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions Folder structure: data folder contains raw an

Prateek Yadav 1 Nov 25, 2022
Julia package for multiway (inverse) covariance estimation.

TensorGraphicalModels TensorGraphicalModels.jl is a suite of Julia tools for estimating high-dimensional multiway (tensor-variate) covariance and inve

Wayne Wang 3 Sep 23, 2022
Implementation of SegNet: A Deep Convolutional Encoder-Decoder Architecture for Semantic Pixel-Wise Labelling

Caffe SegNet This is a modified version of Caffe which supports the SegNet architecture As described in SegNet: A Deep Convolutional Encoder-Decoder A

Alex Kendall 1.1k Jan 02, 2023
Model-free Vehicle Tracking and State Estimation in Point Cloud Sequences

Model-free Vehicle Tracking and State Estimation in Point Cloud Sequences 1. Introduction This project is for paper Model-free Vehicle Tracking and St

TuSimple 92 Jan 03, 2023
Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning, CVPR 2021

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning By Zhenda Xie*, Yutong Lin*, Zheng Zhang, Yue Ca

Zhenda Xie 293 Dec 20, 2022
Optimized primitives for collective multi-GPU communication

NCCL Optimized primitives for inter-GPU communication. Introduction NCCL (pronounced "Nickel") is a stand-alone library of standard communication rout

NVIDIA Corporation 2k Jan 09, 2023
Pre-trained model, code, and materials from the paper "Impact of Adversarial Examples on Deep Learning Models for Biomedical Image Segmentation" (MICCAI 2019).

Adaptive Segmentation Mask Attack This repository contains the implementation of the Adaptive Segmentation Mask Attack (ASMA), a targeted adversarial

Utku Ozbulak 53 Jul 04, 2022
A collection of resources and papers on Diffusion Models, a darkhorse in the field of Generative Models

This repository contains a collection of resources and papers on Diffusion Models and Score-based Models. If there are any missing valuable resources

5.1k Jan 08, 2023
A machine learning malware analysis framework for Android apps.

🕵️ A machine learning malware analysis framework for Android apps. ☢️ DroidDetective is a Python tool for analysing Android applications (APKs) for p

James Stevenson 77 Dec 27, 2022
Official code repository for the publication "Latent Equilibrium: A unified learning theory for arbitrarily fast computation with arbitrarily slow neurons"

Latent Equilibrium: A unified learning theory for arbitrarily fast computation with arbitrarily slow neurons This repository contains the code to repr

Computational Neuroscience, University of Bern 3 Aug 04, 2022
Robust fine-tuning of zero-shot models

Robust fine-tuning of zero-shot models This repository contains code for the paper Robust fine-tuning of zero-shot models by Mitchell Wortsman*, Gabri

224 Dec 29, 2022
Implementation of ConvMixer-Patches Are All You Need? in TensorFlow and Keras

Patches Are All You Need? - ConvMixer ConvMixer, an extremely simple model that is similar in spirit to the ViT and the even-more-basic MLP-Mixer in t

Sayan Nath 8 Oct 03, 2022
Data Preparation, Processing, and Visualization for MoVi Data

MoVi-Toolbox Data Preparation, Processing, and Visualization for MoVi Data, https://www.biomotionlab.ca/movi/ MoVi is a large multipurpose dataset of

Saeed Ghorbani 51 Nov 27, 2022
Sign Language Translation with Transformers (COLING'2020, ECCV'20 SLRTP Workshop)

transformer-slt This repository gathers data and code supporting the experiments in the paper Better Sign Language Translation with STMC-Transformer.

Kayo Yin 107 Dec 27, 2022
MarcoPolo is a clustering-free approach to the exploration of bimodally expressed genes along with group information in single-cell RNA-seq data

MarcoPolo is a method to discover differentially expressed genes in single-cell RNA-seq data without depending on prior clustering Overview MarcoPolo

Chanwoo Kim 13 Dec 18, 2022