ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers

Overview

ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers

Official implementation of ViewFormer. ViewFormer is a NeRF-free neural rendering model based on the transformer architecture. The model is capable of both novel view synthesis and camera pose estimation. It is evaluated on previously unseen 3D scenes.

Paper    Web    Demo


Open In Colab Python Versions

Getting started

Start by creating a python 3.8 venv. From the activated environment, you can run the following command in the directory containing setup.py:

pip install -e .

Getting datasets

In this section, we describe how you can prepare the data for training. We assume that you have your environment ready and you want to store the dataset into {output path} directory.

Shepard-Metzler-Parts-7

Please, first visit https://github.com/deepmind/gqn-datasets.

viewformer-cli dataset generate \
    --loader sm7 \
    --image-size 128 \
    --output {output path}/sm7 \
    --max-sequences-per-shard 2000 \
    --split train

viewformer-cli dataset generate \
    --loader sm7 \
    --image-size 128 \
    --output {output path}/sm7 \
    --max-sequences-per-shard 2000 \
    --split test

InteriorNet

Download the dataset into the directory {source} by following the instruction here: https://interiornet.org/. Then, proceed as follows:

viewformer-cli dataset generate \
    --loader interiornet \
    --path {source} \
    --image-size 128  \
    --output {output path}/interiornet \
    --max-sequences-per-shard 50 \
    --shuffle \
    --split train

viewformer-cli dataset generate \
    --loader interiornet \
    --path {source} \
    --image-size 128  \
    --output {output path}/interiornet \
    --max-sequences-per-shard 50 \
    --shuffle \
    --split test

Common Objects in 3D

Download the dataset into the directory {source} by following the instruction here: https://ai.facebook.com/datasets/CO3D-dataset.

Install the following dependencies: plyfile>=0.7.4 pytorch3d. Then, generate the dataset for 10 categories as follows:

viewformer-cli dataset generate \
    --loader co3d \
    --path {source} \
    --image-size 128  \
    --output {output path}/co3d \
    --max-images-per-shard 6000 \
    --shuffle \
    --categories "plant,teddybear,suitcase,bench,ball,cake,vase,hydrant,apple,donut" \
    --split train

viewformer-cli dataset generate \
    --loader co3d \
    --path {source} \
    --image-size 128  \
    --output {output path}/co3d \
    --max-images-per-shard 6000 \
    --shuffle \
    --categories "plant,teddybear,suitcase,bench,ball,cake,vase,hydrant,apple,donut" \
    --split val

Alternatively, generate the full dataset as follows:

viewformer-cli dataset generate \
    --loader co3d \
    --path {source} \
    --image-size 128  \
    --output {output path}/co3d \
    --max-images-per-shard 6000 \
    --shuffle \
    --split train

viewformer-cli dataset generate \
    --loader co3d \
    --path {source} \
    --image-size 128  \
    --output {output path}/co3d \
    --max-images-per-shard 6000 \
    --shuffle \
    --split val

ShapeNet cars and chairs dataset

Download and extract the SRN datasets into the directory {source}. The files can be found here: https://drive.google.com/drive/folders/1OkYgeRcIcLOFu1ft5mRODWNQaPJ0ps90.

Then, generate the dataset as follows:

viewformer-cli dataset generate \
    --loader shapenet \
    --path {source} \
    --image-size 128  \
    --output {output path}/shapenet-{category}/shapenet \
    --categories {category} \
    --max-sequences-per-shard 50 \
    --shuffle \
    --split train

viewformer-cli dataset generate \
    --loader shapenet \
    --path {source} \
    --image-size 128  \
    --output {output path}/shapenet-{category}/shapenet \
    --categories {category} \
    --max-sequences-per-shard 50 \
    --shuffle \
    --split test

where {category} is either cars or chairs.

Faster preprocessing

In order to make the preprocessing faster, you can add --shards {process id}/{num processes} to the command and run multiple instances of the command in multiple processes.

Training the codebook model

The codebook model training uses the PyTorch framework, but the resulting model can be loaded by both TensorFlow and PyTorch. The training code was also prepared for TensorFlow framework, but in order to get the same results as published in the paper, PyTorch code should be used. To train the codebook model on 8 GPUs, run the following code:

viewformer-cli train codebook \
    --job-dir . \
    --dataset "{dataset path}" \
    --num-gpus 8 \
    --batch-size 352 \
    --n-embed 1024 \
    --learning-rate 1.584e-3 \
    --total-steps 200000

Replace {dataset path} by the real dataset path. Note that you can use more than one dataset. In that case, the dataset paths should be separated by a comma. Also, if the size of dataset is not large enough to support sharding, you can reduce the number of data loading workers by using --num-val-workers and --num-workers arguments. The argument --job-dir specifies the path where the resulting model and logs will be stored. You can also use the --wandb flag, that enables logging to wandb.

Finetuning the codebook model

If you want to finetune an existing codebook model, add --resume-from-checkpoint "{checkpoint path}" to the command and increase the number of total steps.

Transforming the dataset into the code representation

Before the transformer model can be trained, the dataset has to be transformed into the code representation. This can be achieved by running the following command (on a single GPU):

viewformer-cli generate-codes \
    --model "{codebook model checkpoint}" \
    --dataset "{dataset path}" \
    --output "{code dataset path}" \
    --batch-size 64 

We assume that the codebook model checkpoint path (ending with .ckpt) is {codebook model checkpoint} and the original dataset is stored in {dataset path}. The resulting dataset will be stored in {code dataset path}.

Training the transformer model

To train the models with the same hyper-parameters as in the paper, run the commands from the following sections based on the target dataset. We assume that the codebook model checkpoint path (ending with .ckpt) is {codebook model checkpoint} and the associated code dataset is located in {code dataset path}. All commands use 8 GPUs (in our case 8 NVIDIA A100 GPUs).

InteriorNet training

viewformer-cli train transformer \
    --dataset "{code dataset path}" \
    --codebook-model "{codebook model checkpoint}" \
    --sequence-size 20 \
    --n-loss-skip 4 \
    --batch-size 40 \
    --fp16 \
    --total-steps 200000 \
    --localization-weight 5. \
    --learning-rate 8e-5 \
    --weight-decay 0.01 \
    --job-dir . \
    --pose-multiplier 1.

For the variant without localization, use --localization-weight 0. Similarly, for the variant without novel view synthesis, use --image-generation-weight 0.

CO3D finetuning

In order to finetune the model for 10 categories, use the following command:

viewformer-cli train finetune-transformer \
    --dataset "{code dataset path}" \
    --codebook-model "{codebook model checkpoint}" \
    --sequence-size 10 \
    --n-loss-skip 1 \
    --batch-size 80 \
    --fp16 \
    --localization-weight 5 \
    --learning-rate 1e-4 \
    --total-steps 40000 \
    --epochs 40 \
    --weight-decay 0.05 \
    --job-dir . \
    --pose-multiplier 0.05 \
    --checkpoint "{interiornet transformer model checkpoint}"

Here {interiornet transformer model checkpoint} is the path to the InteriorNet checkpoint (usually ending with weights.model.099-last). For the variant without localization, use --localization-weight 0.

For all categories and including localization:

viewformer-cli train finetune-transformer \
    --dataset "{code dataset path}" \
    --codebook-model "{codebook model checkpoint}" \
    --sequence-size 10 \
    --n-loss-skip 1 \
    --batch-size 40 \
    --localization-weight 5 \
    --gradient-clip-val 1. \
    --learning-rate 1e-4 \
    --total-steps 100000 \
    --epochs 100 \
    --weight-decay 0.05 \
    --job-dir . \
    --pose-multiplier 0.05 \
    --checkpoint "{interiornet transformer model checkpoint}"

Here {interiornet transformer model checkpoint} is the path to the InteriorNet checkpoint (usually ending with weights.model.099-last).

For all categories without localization:

viewformer-cli train finetune-transformer \
    --dataset "{code dataset path}" \
    --codebook-model "{codebook model checkpoint}" \
    --sequence-size 10 \
    --n-loss-skip 1 \
    --batch-size 40 \
    --localization-weight 5 \
    --learning-rate 1e-4 \
    --total-steps 100000 \
    --epochs 100 \
    --weight-decay 0.05 \
    --job-dir . \
    --pose-multiplier 0.05 \
    --checkpoint "{interiornet transformer model checkpoint}"

Here {interiornet transformer model checkpoint} is the path to the InteriorNet checkpoint (usually ending with weights.model.099-last).

7-Scenes finetuning

viewformer-cli train finetune-transformer \
    --dataset "{code dataset path}" \
    --codebook-model "{codebook model checkpoint}" \
    --localization-weight 5 \
    --pose-multiplier 5. \
    --batch-size 40 \
    --fp16 \
    --learning-rate 1e-5 \
    --job-dir .  \
    --total-steps 10000 \
    --epochs 10 \
    --checkpoint "{interiornet transformer model checkpoint}"

Here {interiornet transformer model checkpoint} is the path to the InteriorNet checkpoint (usually ending with weights.model.099-last).

ShapeNet finetuning

viewformer-cli train finetune-transformer \
    --dataset "{cars code dataset path},{chairs code dataset path}" \
    --codebook-model "{codebook model checkpoint}" \
    --localization-weight 1 \
    --pose-multiplier 1 \
    --n-loss-skip 1 \
    --sequence-size 4 \
    --batch-size 64 \
    --learning-rate 1e-4 \
    --gradient-clip-val 1 \
    --job-dir .  \
    --total-steps 100000 \
    --epochs 100 \
    --weight-decay 0.05 \
    --checkpoint "{interiornet transformer model checkpoint}"

Here {interiornet transformer model checkpoint} is the path to the InteriorNet checkpoint (usually ending with weights.model.099-last).

SM7 training

viewformer-cli train transformer \
    --dataset "{code dataset path}" \
    --codebook-model "{codebook model checkpoint}" \
    --sequence-size 6 \
    --n-loss-skip 1 \
    --batch-size 128 \
    --fp16 \
    --total-steps 120000 \
    --localization-weight "cosine(0,1,120000)" \
    --learning-rate 1e-4 \
    --weight-decay 0.01 \
    --job-dir . \
    --pose-multiplier 0.2

You can safely replace the cosine schedule for localization weight with a constant term.

Evaluation

Codebook evaluation

In order to evaluate the codebook model, run the following:

viewformer-cli evaluate codebook \
    --codebook-model "{codebook model checkpoint}" \
    --loader-path "{dataset path}" \
    --loader dataset \
    --loader-split test \
    --batch-size 64 \
    --image-size 128 \
    --num-store-images 0 \
    --num-eval-images 1000 \
    --job-dir . 

Note that --image-size argument controls the image size used for computing the metrics. You can change it to a different value.

General transformer evaluation

In order to evaluate the transformer model, run the following:

viewformer-cli evaluate transformer \
    --codebook-model "{codebook model checkpoint}" \
    --transformer-model "{transformer model checkpoint}" \
    --loader-path "{dataset path}" \
    --loader dataset \
    --loader-split test \
    --batch-size 1 \
    --image-size 128 \
    --job-dir . \
    --num-eval-sequences 1000

Optionally, you can use --sequence-size to control the context size used for evaluation. Note that --image-size argument controls the image size used for computing the metrics. You can change it to a different value.

Transformer evaluation with different context sizes

In order to evaluate the transformer model with multiple context sizes, run the following:

viewformer-cli evaluate transformer-multictx \
    --codebook-model "{codebook model checkpoint}" \
    --transformer-model "{transformer model checkpoint}" \
    --loader-path "{dataset path}" \
    --loader dataset \
    --loader-split test \
    --batch-size 1 \
    --image-size 128 \
    --job-dir . \
    --num-eval-sequences 1000

Note that --image-size argument controls the image size used for computing the metrics. You can change it to a different value.

CO3D evaluation

In order to evaluate the transformer model on the CO3D dataset, run the following:

viewformer-cli evaluate \
    --codebook-model "{codebook model checkpoint}" \
    --transformer-model "{transformer model checkpoint}" \
    --path {original CO3D root}
    --job-dir . 

7-Scenes evaluation

In order to evaluate the transformer model on the 7-Scenes dataset, run the following:

viewformer-cli evaluate 7scenes \
    --codebook-model "{codebook model checkpoint}" \
    --transformer-model "{transformer model checkpoint}" \
    --path {original 7-Scenes root}
    --batch-size 1
    --job-dir .
    --num-store-images 0
    --top-n-matched-images 10
    --image-match-map {path to top10 matched images}

You can change --top-n-matched-images to 0 if you don't want to use top 10 closest images in the context. {path to top10 matched images} as a path to the file containing the map between most similar images from the test and the train sets. Each line is in the format {relative test image path} {relative train image path}.

Thanks

We would like to express our sincere gratitude to the authors of the following repositories, that we used in our code:

Owner
Jonáš Kulhánek
Jonáš Kulhánek
Face Recognize System on camera AI OAK1

FRS on OAK1 Face Recognize System on camera OAK1 This project contains our work that deploy on camera OAK1 Features Anti-Spoofing Face detection Face

Tran Anh Tuan 6 Aug 08, 2022
Neural Oblivious Decision Ensembles

Neural Oblivious Decision Ensembles A supplementary code for anonymous ICLR 2020 submission. What does it do? It learns deep ensembles of oblivious di

25 Sep 21, 2022
3DIAS: 3D Shape Reconstruction with Implicit Algebraic Surfaces (ICCV 2021)

3DIAS_Pytorch This repository contains the official code to reproduce the results from the paper: 3DIAS: 3D Shape Reconstruction with Implicit Algebra

Mohsen Yavartanoo 21 Dec 12, 2022
Line-level Handwritten Text Recognition (HTR) system implemented with TensorFlow.

Line-level Handwritten Text Recognition with TensorFlow This model is an extended version of the Simple HTR system implemented by @Harald Scheidl and

Hoàng Tùng Lâm (Linus) 72 May 07, 2022
This is an example of a reproducible modelling project

An example of a reproducible modelling project What are we doing? This example was created for the 2021 fall lecture series of Stanford's Center for O

Armin Thomas 2 Oct 26, 2021
Codebase for Time-series Generative Adversarial Networks (TimeGAN)

Codebase for Time-series Generative Adversarial Networks (TimeGAN)

Jinsung Yoon 532 Dec 31, 2022
BERTMap: A BERT-Based Ontology Alignment System

BERTMap: A BERT-based Ontology Alignment System Important Notices The relevant paper was accepted in AAAI-2022. Arxiv version is available at: https:/

KRR 36 Dec 24, 2022
The implementation of "Bootstrapping Semantic Segmentation with Regional Contrast".

ReCo - Regional Contrast This repository contains the source code of ReCo and baselines from the paper, Bootstrapping Semantic Segmentation with Regio

Shikun Liu 128 Dec 30, 2022
Python Multi-Agent Reinforcement Learning framework

- Please pay attention to the version of SC2 you are using for your experiments. - Performance is *not* always comparable between versions. - The re

whirl 1.3k Jan 05, 2023
ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge (ManiSkill Challenge), a large-scale learning-from-demonstrations benchmark for object manipulation.

ManiSkill-Learn ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge, a large-scale learning-from-dem

Hao Su's Lab, UCSD 48 Dec 30, 2022
OpenMMLab Text Detection, Recognition and Understanding Toolbox

Introduction English | 简体中文 MMOCR is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the correspondi

OpenMMLab 3k Jan 07, 2023
PyTorch implementation of the ideas presented in the paper Interaction Grounded Learning (IGL)

Interaction Grounded Learning This repository contains a simple PyTorch implementation of the ideas presented in the paper Interaction Grounded Learni

Arthur Juliani 4 Aug 31, 2022
Super Resolution for images using deep learning.

Neural Enhance Example #1 — Old Station: view comparison in 24-bit HD, original photo CC-BY-SA @siv-athens. As seen on TV! What if you could increase

Alex J. Champandard 11.7k Dec 29, 2022
Official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo'

IterMVS official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo' Introduction IterMVS is a novel lear

Fangjinhua Wang 127 Jan 04, 2023
Level Based Customer Segmentation

level_based_customer_segmentation Level Based Customer Segmentation Persona Veri Seti kullanılarak müşteri segmentasyonu yapılmıştır. KOLONLAR : PRICE

Buse Yıldırım 6 Dec 21, 2021
Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism

Period-alternatives-of-Softmax Experimental Demo for our paper 'Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechani

slwang9353 0 Sep 06, 2021
Pytorch implementation for "Implicit Semantic Response Alignment for Partial Domain Adaptation"

Implicit-Semantic-Response-Alignment Pytorch implementation for "Implicit Semantic Response Alignment for Partial Domain Adaptation" Prerequisites pyt

4 Dec 19, 2022
DAN: Unfolding the Alternating Optimization for Blind Super Resolution

DAN-Basd-on-Openmmlab DAN: Unfolding the Alternating Optimization for Blind Super Resolution We reproduce DAN via mmediting based on open-sourced code

AlexZou 72 Dec 13, 2022
a reimplementation of Optical Flow Estimation using a Spatial Pyramid Network in PyTorch

pytorch-spynet This is a personal reimplementation of SPyNet [1] using PyTorch. Should you be making use of this work, please cite the paper according

Simon Niklaus 269 Jan 02, 2023
Optimus: the first large-scale pre-trained VAE language model

Optimus: the first pre-trained Big VAE language model This repository contains source code necessary to reproduce the results presented in the EMNLP 2

314 Dec 19, 2022