Semantic Bottleneck Scene Generation

Related tags

Deep LearningSB-GAN
Overview

SB-GAN

Semantic Bottleneck Scene Generation

Coupling the high-fidelity generation capabilities of label-conditional image synthesis methods with the flexibility of unconditional generative models, we propose a semantic bottleneck GAN model for unconditional synthesis of complex scenes. We assume pixel-wise segmentation labels are available during training and use them to learn the scene structure. During inference, our model first synthesizes a realistic segmentation layout from scratch, then synthesizes a realistic scene conditioned on that layout. For the former, we use an unconditional progressive segmentation generation network that captures the distribution of realistic semantic scene layouts. For the latter, we use a conditional segmentation-to-image synthesis network that captures the distribution of photo-realistic images conditioned on the semantic layout. When trained end-to-end, the resulting model outperforms state-of-the-art generative models in unsupervised image synthesis on two challenging domains in terms of the Frechet Inception Distance and user-study evaluations. Moreover, we demonstrate the generated segmentation maps can be used as additional training data to strongly improve recent segmentation-to-image synthesis networks.

Paper

[Paper 3.5MB]  [arXiv]

Code

Prerequisites:

  • NVIDIA GPU + CUDA CuDNN
  • Python 3.6
  • PyTorch 1.0
  • Please install dependencies by
pip install -r requirements.txt

Preparation

  • Clone this repo with its submodules
git clone --recurse-submodules -j8 https://github.com/azadis/SB-GAN.git
cd SB-GAN/SPADE/models/networks/
git clone https://github.com/vacancy/Synchronized-BatchNorm-PyTorch
cp -rf Synchronized-BatchNorm-PyTorch/sync_batchnorm .
cd ../../../../

Datasets

ADE-Indoor

  • To have access to the indoor images from the ADE20K dataset and their corresponding segmentation maps used in our paper:
cd SB-GAN
bash SBGAN/datasets/download_ade.sh
cd ..

Cityscapes

cd SB-GAN/SBGAN/datasets
mkdir cityscapes
cd cityscapes
  • Download and unzip leftImg8bit_trainvaltest.zip and gtFine_trainvaltest.zip from the Cityscapes webpage .
mv leftImg8bit_trainvaltest/leftImg8bit ./
mv gtFine_trainvaltest/gtFine ./

Cityscapes-25k

  • In addition to the 5K portion already downloaded, download and unzip leftImg8bit_trainextra.zip. You can have access to the fine annotations of these 20K images we used in our paper by:
wget https://people.eecs.berkeley.edu/~sazadi/SBGAN/datasets/drn_d_105_000_test.tar.gz
tar -xzvf drn_d_105_000_test.tar.gz

These annotations are predicted by a DRN trained on the 5K fine-annotated portion of Cityscapes with 19 semantic categories. The new fine annotations of the 5K portion with 19 semantic classes can be also downloaded by:

wget https://people.eecs.berkeley.edu/~sazadi/SBGAN/datasets/gtFine_new.tar.gz
tar -xzvf gtFine_new.tar.gz
cd ../../../..

Training

cd SB-GAN/SBGAN

  • On each $dataset in ade_indoor, cityscapes, cityscapes_25k:
  1. Semantic bottleneck synthesis:
bash SBGAN/scipts/$dataset/train_progressive_seg.sh
  1. Semantic image synthesis:
cd ../SPADE
bash scripts/$dataset/train_spade.sh
  1. Train the end2end SBGAN model:
cd ../SBGAN
bash SBGAN/scripts/$dataset/train_finetune_end2end.sh
  • In the above script, set $pro_iter to the iteration number of the checkpoint saved from step 1 that you want to use before fine-tuning. Also, set $spade_epoch to the last epoch saved for SPADE from step 2.
  • To visualize the training you have started in steps 1 and 3 on a ${date-time}, run the following commands. Then, open http://localhost:6006/ on your web browser.
cd SBGAN/logs/${date-time}
tensorboard --logdir=. --port=6006

Testing

To compute FID after training the end2end model, for each $dataset, do:

bash SBGAN/scripts/$dataset/test_finetune_end2end.sh
  • In the above script, set $pro_iter and $spade_epoch to the appropriate checkpoints saved from your end2end training.

Citation

If you use this code, please cite our paper:

@article{azadi2019semantic,
  title={Semantic Bottleneck Scene Generation},
  author={Azadi, Samaneh and Tschannen, Michael and Tzeng, Eric and Gelly, Sylvain and Darrell, Trevor and Lucic, Mario},
  journal={arXiv preprint arXiv:1911.11357},
  year={2019}
}
Owner
Samaneh Azadi
CS PhD student at UC Berkeley
Samaneh Azadi
Self-Supervised Learning with Kernel Dependence Maximization

Self-Supervised Learning with Kernel Dependence Maximization This is the code for SSL-HSIC, a self-supervised learning loss proposed in the paper Self

DeepMind 29 Dec 29, 2022
Official pytorch implementation of the paper: "SinGAN: Learning a Generative Model from a Single Natural Image"

SinGAN Project | Arxiv | CVF | Supplementary materials | Talk (ICCV`19) Official pytorch implementation of the paper: "SinGAN: Learning a Generative M

Tamar Rott Shaham 3.2k Dec 25, 2022
CLIP (Contrastive Language–Image Pre-training) for Italian

Italian CLIP CLIP (Radford et al., 2021) is a multimodal model that can learn to represent images and text jointly in the same space. In this project,

Italian CLIP 114 Dec 29, 2022
The sixth place winning solution (6/220) in 2021 Gaofen Challenge.

SwinTransformer + OBBDet The sixth place winning solution (6/220) in the track of Fine-grained Object Recognition in High-Resolution Optical Images, 2

ming71 46 Dec 02, 2022
Tutorial repo for an end-to-end Data Science project

End-to-end Data Science project This is the repo with the notebooks, code, and additional material used in the ITI's workshop. The goal of the session

Deena Gergis 127 Dec 30, 2022
Data & Code for ACCENTOR Adding Chit-Chat to Enhance Task-Oriented Dialogues

ACCENTOR: Adding Chit-Chat to Enhance Task-Oriented Dialogues Overview ACCENTOR consists of the human-annotated chit-chat additions to the 23.8K dialo

Facebook Research 69 Dec 29, 2022
Supplementary materials for ISMIR 2021 LBD paper "Evaluation of Latent Space Disentanglement in the Presence of Interdependent Attributes"

Evaluation of Latent Space Disentanglement in the Presence of Interdependent Attributes Supplementary materials for ISMIR 2021 LBD submission: K. N. W

Karn Watcharasupat 2 Oct 25, 2021
This repository contains code from the paper "TTS-GAN: A Transformer-based Time-Series Generative Adversarial Network"

TTS-GAN: A Transformer-based Time-Series Generative Adversarial Network This repository contains code from the paper "TTS-GAN: A Transformer-based Tim

Intelligent Multimodal Computing and Sensing Laboratory (IMICS Lab) - Texas State University 108 Dec 29, 2022
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano https:

9.6k Jan 06, 2023
Multiband spectro-radiometric satellite image analysis with K-means cluster algorithm

Multi-band Spectro Radiomertric Image Analysis with K-means Cluster Algorithm Overview Multi-band Spectro Radiomertric images are images comprising of

Chibueze Henry 6 Mar 16, 2022
How to Learn a Domain Adaptive Event Simulator? ACM MM, 2021

LETGAN How to Learn a Domain Adaptive Event Simulator? ACM MM 2021 Running Environment: pytorch=1.4, 1 NVIDIA-1080TI. More details can be found in pap

CVTEAM 4 Sep 20, 2022
A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.

About This repository provides data and code for the paper: Scalable Data Annotation Pipeline for High-Quality Large Speech Datasets Development (subm

Appen Repos 86 Dec 07, 2022
PyTorch implementation of PSPNet segmentation network

pspnet-pytorch PyTorch implementation of PSPNet segmentation network Original paper Pyramid Scene Parsing Network Details This is a slightly different

Roman Trusov 532 Dec 29, 2022
10th place solution for Google Smartphone Decimeter Challenge at kaggle.

Under refactoring 10th place solution for Google Smartphone Decimeter Challenge at kaggle. Google Smartphone Decimeter Challenge Global Navigation Sat

12 Oct 25, 2022
Wordle-solver - Wordle answer generation program in python

🟨 Wordle Solver 🟩 Wordle answer generation program in python ✔️ Requirements U

Dahyun Kang 4 May 28, 2022
Spatial Contrastive Learning for Few-Shot Classification (SCL)

This repo contains the official implementation of Spatial Contrastive Learning for Few-Shot Classification (SCL), which presents of a novel contrastive learning method applied to few-shot image class

Yassine 34 Dec 25, 2022
This is the official implementation of 3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object Detection, built on SECOND.

3D-CVF This is the official implementation of 3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object

YecheolKim 97 Dec 20, 2022
To SMOTE, or not to SMOTE?

To SMOTE, or not to SMOTE? This package includes the code required to repeat the experiments in the paper and to analyze the results. To SMOTE, or not

Amazon Web Services 1 Jan 03, 2022
Intro-to-dl - Resources for "Introduction to Deep Learning" course.

Introduction to Deep Learning course resources https://www.coursera.org/learn/intro-to-deep-learning Running on Google Colab (tested for all weeks) Go

Advanced Machine Learning specialisation by HSE 761 Dec 24, 2022