Geometry-Free View Synthesis: Transformers and no 3D Priors

Overview

Geometry-Free View Synthesis: Transformers and no 3D Priors

teaser

Geometry-Free View Synthesis: Transformers and no 3D Priors
Robin Rombach*, Patrick Esser*, Björn Ommer
* equal contribution

arXiv | BibTeX | Colab

Interactive Scene Exploration Results

RealEstate10K:
realestate
Videos: short (2min) / long (12min)

ACID:
acid
Videos: short (2min) / long (9min)

Demo

For a quickstart, you can try the Colab demo, but for a smoother experience we recommend installing the local demo as described below.

Installation

The demo requires building a PyTorch extension. If you have a sane development environment with PyTorch, g++ and nvcc, you can simply

pip install git+https://github.com/CompVis/geometry-free-view-synthesis#egg=geometry-free-view-synthesis

If you run into problems and have a GPU with compute capability below 8, you can also use the provided conda environment:

git clone https://github.com/CompVis/geometry-free-view-synthesis
conda env create -f geometry-free-view-synthesis/environment.yaml
conda activate geofree
pip install geometry-free-view-synthesis/

Running

After installation, running

braindance.py

will start the demo on a sample scene. Explore the scene interactively using the WASD keys to move and arrow keys to look around. Once positioned, hit the space bar to render the novel view with GeoGPT.

You can move again with WASD keys. Mouse control can be activated with the m key. Run braindance.py to run the demo on your own images. By default, it uses the re-impl-nodepth (trained on RealEstate without explicit transformation and no depth input) which can be changed with the --model flag. The corresponding checkpoints will be downloaded the first time they are required. Specify an output path using --video path/to/vid.mp4 to record a video.

> braindance.py -h
usage: braindance.py [-h] [--model {re_impl_nodepth,re_impl_depth,ac_impl_nodepth,ac_impl_depth}] [--video [VIDEO]] [path]

What's up, BD-maniacs?

key(s)       action                  
=====================================
wasd         move around             
arrows       look around             
m            enable looking with mouse
space        render with transformer 
q            quit                    

positional arguments:
  path                  path to image or directory from which to select image. Default example is used if not specified.

optional arguments:
  -h, --help            show this help message and exit
  --model {re_impl_nodepth,re_impl_depth,ac_impl_nodepth,ac_impl_depth}
                        pretrained model to use.
  --video [VIDEO]       path to write video recording to. (no recording if unspecified).

Training

Data Preparation

We support training on RealEstate10K and ACID. Both come in the same format as described here and the preparation is the same for both of them. You will need to have colmap installed and available on your $PATH.

We assume that you have extracted the .txt files of the dataset you want to prepare into $TXT_ROOT, e.g. for RealEstate:

> tree $TXT_ROOT
├── test
│   ├── 000c3ab189999a83.txt
│   ├── ...
│   └── fff9864727c42c80.txt
└── train
    ├── 0000cc6d8b108390.txt
    ├── ...
    └── ffffe622a4de5489.txt

and that you have downloaded the frames (we downloaded them in resolution 640 x 360) into $IMG_ROOT, e.g. for RealEstate:

> tree $IMG_ROOT
├── test
│   ├── 000c3ab189999a83
│   │   ├── 45979267.png
│   │   ├── ...
│   │   └── 55255200.png
│   ├── ...
│   ├── 0017ce4c6a39d122
│   │   ├── 40874000.png
│   │   ├── ...
│   │   └── 48482000.png
├── train
│   ├── ...

To prepare the $SPLIT split of the dataset ($SPLIT being one of train, test for RealEstate and train, test, validation for ACID) in $SPA_ROOT, run the following within the scripts directory:

python sparse_from_realestate_format.py --txt_src ${TXT_ROOT}/${SPLIT} --img_src ${IMG_ROOT}/${SPLIT} --spa_dst ${SPA_ROOT}/${SPLIT}

You can also simply set TXT_ROOT, IMG_ROOT and SPA_ROOT as environment variables and run ./sparsify_realestate.sh or ./sparsify_acid.sh. Take a look into the sources to run with multiple workers in parallel.

Finally, symlink $SPA_ROOT to data/realestate_sparse/data/acid_sparse.

First Stage Models

As described in our paper, we train the transformer models in a compressed, discrete latent space of pretrained VQGANs. These pretrained models can be conveniently downloaded by running

python scripts/download_vqmodels.py 

which will also create symlinks ensuring that the paths specified in the training configs (see configs/*) exist. In case some of the models have already been downloaded, the script will only create the symlinks.

For training custom first stage models, we refer to the taming transformers repository.

Running the Training

After both the preparation of the data and the first stage models are done, the experiments on ACID and RealEstate10K as described in our paper can be reproduced by running

python geofree/main.py --base configs//_13x23_.yaml -t --gpus 0,

where is one of realestate/acid and is one of expl_img/expl_feat/expl_emb/impl_catdepth/impl_depth/impl_nodepth/hybrid. These abbreviations correspond to the experiments listed in the following Table (see also Fig.2 in the main paper)

variants

Note that each experiment was conducted on a GPU with 40 GB VRAM.

BibTeX

@misc{rombach2021geometryfree,
      title={Geometry-Free View Synthesis: Transformers and no 3D Priors}, 
      author={Robin Rombach and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2104.07652},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Owner
CompVis Heidelberg
Computer Vision research group at the Ruprecht-Karls-University Heidelberg
CompVis Heidelberg
Starter kit for getting started in the Music Demixing Challenge.

Music Demixing Challenge - Starter Kit 👉 Challenge page This repository is the Music Demixing Challenge Submission template and Starter kit! Clone th

AIcrowd 106 Dec 20, 2022
PyTorch implementation of Spiking Neural Networks trained on surrogate gradient & BPTT using snntorch.

snn-localization repo PyTorch implementation of Spiking Neural Networks trained on surrogate gradient & BPTT using snntorch. Install Dependencies Orig

Sami BARCHID 1 Jan 06, 2022
DeepDiffusion: Unsupervised Learning of Retrieval-adapted Representations via Diffusion-based Ranking on Latent Feature Manifold

DeepDiffusion Introduction This repository provides the code of the DeepDiffusion algorithm for unsupervised learning of retrieval-adapted representat

4 Nov 15, 2022
MetaBalance: High-Performance Neural Networks for Class-Imbalanced Data

This repository is the official PyTorch implementation of Meta-Balance. Find the paper on arxiv MetaBalance: High-Performance Neural Networks for Clas

Arpit Bansal 20 Oct 18, 2021
Streamlit tool to explore coco datasets

What is this This tool given a COCO annotations file and COCO predictions file will let you explore your dataset, visualize results and calculate impo

Jakub Cieslik 75 Dec 16, 2022
Light-weight network, depth estimation, knowledge distillation, real-time depth estimation, auxiliary data.

light-weight-depth-estimation Boosting Light-Weight Depth Estimation Via Knowledge Distillation, https://arxiv.org/abs/2105.06143 Junjie Hu, Chenyou F

Junjie Hu 13 Dec 10, 2022
Dilated Convolution with Learnable Spacings PyTorch

Dilated-Convolution-with-Learnable-Spacings-PyTorch Ismail Khalfaoui Hassani Dilated Convolution with Learnable Spacings (abbreviated to DCLS) is a no

15 Dec 09, 2022
Implementation and replication of ProGen, Language Modeling for Protein Generation, in Jax

ProGen - (wip) Implementation and replication of ProGen, Language Modeling for Protein Generation, in Pytorch and Jax (the weights will be made easily

Phil Wang 71 Dec 01, 2022
Hcaptcha-challenger - Gracefully face hCaptcha challenge with Yolov5(ONNX) embedded solution

hCaptcha Challenger 🚀 Gracefully face hCaptcha challenge with Yolov5(ONNX) embe

593 Jan 03, 2023
[ACM MM 2021] Joint Implicit Image Function for Guided Depth Super-Resolution

Joint Implicit Image Function for Guided Depth Super-Resolution This repository contains the code for: Joint Implicit Image Function for Guided Depth

hawkey 78 Dec 27, 2022
This repo contains the official implementations of EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis

EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis This repo contains the official implementations of EigenDamage: Structured Prunin

Chaoqi Wang 107 Apr 20, 2022
Immortal tracker

Immortal_tracker Prerequisite Our code is tested for Python 3.6. To install required liabraries: pip install -r requirements.txt Waymo Open Dataset P

74 Dec 03, 2022
Moer Grounded Image Captioning by Distilling Image-Text Matching Model

Moer Grounded Image Captioning by Distilling Image-Text Matching Model Requirements Python 3.7 Pytorch 1.2 Prepare data Please use git clone --recurse

YE Zhou 60 Dec 16, 2022
Motion and Shape Capture from Sparse Markers

MoSh++ This repository contains the official chumpy implementation of mocap body solver used for AMASS: AMASS: Archive of Motion Capture as Surface Sh

Nima Ghorbani 135 Dec 23, 2022
PyTorch code for ICPR 2020 paper Future Urban Scene Generation Through Vehicle Synthesis

Future urban scene generation through vehicle synthesis This repository contains Pytorch code for the ICPR2020 paper "Future Urban Scene Generation Th

Alessandro Simoni 4 Oct 11, 2021
Goal of the project : Detecting Temporal Boundaries in Sign Language videos

MVA RecVis course final project : Goal of the project : Detecting Temporal Boundaries in Sign Language videos. Sign language automatic indexing is an

Loubna Ben Allal 6 Dec 21, 2022
Evolving neural network parameters in JAX.

Evolving Neural Networks in JAX This repository holds code displaying techniques for applying evolutionary network training strategies in JAX. Each sc

Trevor Thackston 6 Feb 12, 2022
[BMVC 2021] Official PyTorch Implementation of Self-supervised learning of Image Scale and Orientation Estimation

Self-Supervised Learning of Image Scale and Orientation Estimation (BMVC 2021) This is the official implementation of the paper "Self-Supervised Learn

Jongmin Lee 17 Nov 10, 2022
Official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model.

BALLAD This is the official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model. Requirements Python3 Pytorch(1.7.

peng gao 42 Nov 26, 2022
Code for 2021 NeurIPS --- Towards Multi-Grained Explainability for Graph Neural Networks

ReFine: Multi-Grained Explainability for GNNs This is the official code for Towards Multi-Grained Explainability for Graph Neural Networks (NeurIPS 20

Shirley (Ying-Xin) Wu 47 Dec 16, 2022