PyTorch implementation of paper "IBRNet: Learning Multi-View Image-Based Rendering", CVPR 2021.

Related tags

Deep LearningIBRNet
Overview

IBRNet: Learning Multi-View Image-Based Rendering

PyTorch implementation of paper "IBRNet: Learning Multi-View Image-Based Rendering", CVPR 2021.

IBRNet: Learning Multi-View Image-Based Rendering
Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul Srinivasan, Howard Zhou, Jonathan T. Barron, Ricardo Martin-Brualla, Noah Snavely, Thomas Funkhouser
CVPR 2021

project page | paper | data & model

Demo

Installation

Clone this repo with submodules:

git clone --recurse-submodules https://github.com/googleinterns/IBRNet
cd IBRNet/

The code is tested with Python3.7, PyTorch == 1.5 and CUDA == 10.2. We recommend you to use anaconda to make sure that all dependencies are in place. To create an anaconda environment:

conda env create -f environment.yml
conda activate ibrnet

Datasets

1. Training datasets

├──data/
    ├──ibrnet_collected_1/
    ├──ibrnet_collected_2/
    ├──real_iconic_noface/
    ├──spaces_dataset/
    ├──RealEstate10K-subset/
    ├──google_scanned_objects/

Please first cd data/, and then download datasets into data/ following the instructions below. The organization of the datasets should be the same as above.

(a) Our captures

We captured 67 forward-facing scenes (each scene contains 20-60 images). To download our data ibrnet_collected.zip (4.1G) for training, run:

gdown https://drive.google.com/uc?id=1rkzl3ecL3H0Xxf5WTyc2Swv30RIyr1R_
unzip ibrnet_collected.zip

P.S. We've captured some more scenes in ibrnet_collected_more.zip, but we didn't include them for training. Feel free to download them if you would like more scenes for your task, but you wouldn't need them to reproduce our results.

(b) LLFF released scenes

Download and process real_iconic_noface.zip (6.6G) using the following commands:

# download 
gdown https://drive.google.com/uc?id=1ThgjloNt58ZdnEuiCeRf9tATJ-HI0b01
unzip real_iconic_noface.zip

# [IMPORTANT] remove scenes that appear in the test set
cd real_iconic_noface/
rm -rf data2_fernvlsb data2_hugetrike data2_trexsanta data3_orchid data5_leafscene data5_lotr data5_redflower
cd ../

(c) Spaces Dataset

Download spaces dataset by:

git clone https://github.com/augmentedperception/spaces_dataset

(d) RealEstate10K

The full RealEstate10K dataset is very large and can be difficult to download. Hence, we provide a subset of RealEstate10K training scenes containing only 200 scenes. In our experiment, we found using more scenes from RealEstate10K only provides marginal improvement. To download our camera files (2MB):

gdown https://drive.google.com/uc?id=1IgJIeCPPZ8UZ529rN8dw9ihNi1E9K0hL
unzip RealEstate10K_train_cameras_200.zip -d RealEstate10K-subset

Besides the camera files, you also need to download the corresponding video frames from YouTube. You can download the frames (29G) by running the following commands. The script uses ffmpeg to extract frames, so please make sure you have ffmpeg installed.

git clone https://github.com/qianqianwang68/RealEstate10K_Downloader
cd RealEstate10K_Downloader
python generate_dataset.py train
cd ../

(e) Google Scanned Objects

Google Scanned Objects contain 1032 diffuse objects with various shapes and appearances. We use gaps to render these objects for training. Each object is rendered at 512 × 512 pixels from viewpoints on a quarter of the sphere. We render 250 views for each object. To download our renderings (7.5GB), run:

gdown https://drive.google.com/uc?id=1w1Cs0yztH6kE3JIz7mdggvPGCwIKkVi2
unzip google_scanned_objects_renderings.zip

2. Evaluation datasets

├──data/
    ├──deepvoxels/
    ├──nerf_synthetic/
    ├──nerf_llff_data/

The evaluation datasets include DeepVoxel synthetic dataset, NeRF realistic 360 dataset and the real forward-facing dataset. To download all three datasets (6.7G), run the following command under data/ directory:

bash download_eval_data.sh

Evaluation

First download our pretrained model under the project root directory:

gdown https://drive.google.com/uc?id=165Et85R8YnL-5NcehG0fzqsnAUN8uxUJ
unzip pretrained_model.zip

You can use eval/eval.py to evaluate the pretrained model. For example, to obtain the PSNR, SSIM and LPIPS on the fern scene in the real forward-facing dataset, you can first specify your paths in configs/eval_llff.txt and then run:

cd eval/
python eval.py --config ../configs/eval_llff.txt

Rendering videos of smooth camera paths

You can use render_llff_video.py to render videos of smooth camera paths for the real forward-facing scenes. For example, you can first specify your paths in configs/eval_llff.txt and then run:

cd eval/
python render_llff_video.py --config ../configs/eval_llff.txt

You can also capture your own data of forward-facing scenes and synthesize novel views using our method. Please follow the instructions from LLFF on how to capture and process the images.

Training

We strongly recommend you to train the model with multiple GPUs:

# this example uses 8 GPUs (nproc_per_node=8) 
python -m torch.distributed.launch --nproc_per_node=8 train.py --config configs/pretrain.txt

Alternatively, you can train with a single GPU by setting distributed=False in configs/pretrain.txt and running:

python train.py --config configs/pretrain.txt

Finetuning

To finetune on a specific scene, for example, fern, using the pretrained model, run:

# this example uses 2 GPUs (nproc_per_node=2) 
python -m torch.distributed.launch --nproc_per_node=2 train.py --config configs/finetune_llff.txt

Additional information

  • Our current implementation is not well-optimized in terms of the time efficiency at inference. Rendering a 1000x800 image can take from 30s to over a minute depending on specific GPU models. Please make sure to maximize the GPU memory utilization by increasing the size of the chunk to reduce inference time. You can also try to decrease the number of input source views (but subject to performance loss).
  • If you want to create and train on your own datasets, you can implement your own Dataset class following our examples in ibrnet/data_loaders/. You can verify the camera poses using data_verifier.py in ibrnet/data_loaders/.
  • Since the evaluation datasets are either object-centric or forward-facing scenes, our provided view selection methods are very simple (based on either viewpoints or camera locations). If you want to evaluate our method on new scenes with other kinds of camera distributions, you might need to implement your own view selection methods to identify the most effective source views.
  • If you have any questions, you can contact [email protected].

Citation

@inproceedings{wang2021ibrnet,
  author    = {Wang, Qianqian and Wang, Zhicheng and Genova, Kyle and Srinivasan, Pratul and Zhou, Howard  and Barron, Jonathan T. and Martin-Brualla, Ricardo and Snavely, Noah and Funkhouser, Thomas},
  title     = {IBRNet: Learning Multi-View Image-Based Rendering},
  booktitle = {CVPR},
  year      = {2021}
}

Owner
Google Interns
Google Interns
Lua-parser-lark - An out-of-box Lua parser written in Lark

An out-of-box Lua parser written in Lark Such parser handles a relaxed version o

Taine Zhao 2 Jul 19, 2022
Code for "Reconstructing 3D Human Pose by Watching Humans in the Mirror", CVPR 2021 oral

Reconstructing 3D Human Pose by Watching Humans in the Mirror Qi Fang*, Qing Shuai*, Junting Dong, Hujun Bao, Xiaowei Zhou CVPR 2021 Oral The videos a

ZJU3DV 178 Dec 13, 2022
Multi-tool reverse engineering collaboration solution.

CollaRE v0.3 Intorduction CollareRE is a tool for collaborative reverse engineering that aims to allow teams that do need to use more then one tool du

105 Nov 27, 2022
DFM: A Performance Baseline for Deep Feature Matching

DFM: A Performance Baseline for Deep Feature Matching Python (Pytorch) and Matlab (MatConvNet) implementations of our paper DFM: A Performance Baselin

143 Jan 02, 2023
Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training.

Updates (2020/06/21) Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training. Pyr

1.3k Jan 04, 2023
ADOP: Approximate Differentiable One-Pixel Point Rendering

ADOP: Approximate Differentiable One-Pixel Point Rendering Abstract: We present a novel point-based, differentiable neural rendering pipeline for scen

Darius Rückert 1.9k Jan 06, 2023
Code for How To Create A Fully Automated AI Based Trading System With Python

AI Based Trading System This code works as a boilerplate for an AI based trading system with yfinance as data source and RobinHood or Alpaca as broker

Rubén 196 Jan 05, 2023
Source code for "Pack Together: Entity and Relation Extraction with Levitated Marker"

PL-Marker Source code for Pack Together: Entity and Relation Extraction with Levitated Marker. Quick links Overview Setup Install Dependencies Data Pr

THUNLP 173 Dec 30, 2022
[NeurIPS-2020] Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID.

Self-paced Contrastive Learning (SpCL) The official repository for Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID

Yixiao Ge 286 Dec 21, 2022
Data-Driven Operational Space Control for Adaptive and Robust Robot Manipulation

OSCAR Project Page | Paper This repository contains the codebase used in OSCAR: Data-Driven Operational Space Control for Adaptive and Robust Robot Ma

NVIDIA Research Projects 74 Dec 22, 2022
Delving into Localization Errors for Monocular 3D Object Detection, CVPR'2021

Delving into Localization Errors for Monocular 3D Detection By Xinzhu Ma, Yinmin Zhang, Dan Xu, Dongzhan Zhou, Shuai Yi, Haojie Li, Wanli Ouyang. Intr

XINZHU.MA 124 Jan 04, 2023
The toolkit to generate auto labeled datasets

Ozeu Ozeu is the toolkit to autolabal dataset for instance segmentation. You can generate datasets labaled with segmentation mask and bounding box fro

Xiong Jie 28 Mar 28, 2022
Traffic4D: Single View Reconstruction of Repetitious Activity Using Longitudinal Self-Supervision

Traffic4D: Single View Reconstruction of Repetitious Activity Using Longitudinal Self-Supervision Project | PDF | Poster Fangyu Li, N. Dinesh Reddy, X

25 Dec 21, 2022
U-Net Brain Tumor Segmentation

U-Net Brain Tumor Segmentation 🚀 :Feb 2019 the data processing implementation in this repo is not the fastest way (code need update, contribution is

Hao 448 Jan 02, 2023
Gender Classification Machine Learning Model using Sk-learn in Python with 97%+ accuracy and deployment

Gender-classification This is a ML model to classify Male and Females using some physical characterstics Data. Python Libraries like Pandas,Numpy and

Aryan raj 11 Oct 16, 2022
Code for DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents

DeepXML Code for DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents Architectures and algorithms DeepXML supports

Extreme Classification 49 Nov 06, 2022
Official implementation for paper Knowledge Bridging for Empathetic Dialogue Generation (AAAI 2021).

Knowledge Bridging for Empathetic Dialogue Generation This is the official implementation for paper Knowledge Bridging for Empathetic Dialogue Generat

Qintong Li 50 Dec 20, 2022
Tensorflow implementation of ID-Unet: Iterative Soft and Hard Deformation for View Synthesis.

ID-Unet: Iterative-view-synthesis(CVPR2021 Oral) Tensorflow implementation of ID-Unet: Iterative Soft and Hard Deformation for View Synthesis. Overvie

17 Aug 23, 2022
A `Neural = Symbolic` framework for sound and complete weighted real-value logic

Logical Neural Networks LNNs are a novel Neuro = symbolic framework designed to seamlessly provide key properties of both neural nets (learning) and s

International Business Machines 138 Dec 19, 2022
FANet - Real-time Semantic Segmentation with Fast Attention

FANet Real-time Semantic Segmentation with Fast Attention Ping Hu, Federico Perazzi, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Kate Saenko , Stan Sc

Ping Hu 42 Nov 30, 2022