Code release for the paper “Worldsheet Wrapping the World in a 3D Sheet for View Synthesis from a Single Image”, ICCV 2021.

Overview

Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image

This repository contains the code for the following paper:

  • R. Hu, N. Ravi, A. Berg, D. Pathak, Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image. in ICCV, 2021. (PDF)
@inproceedings{hu2021worldsheet,
  title={Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image},
  author={Hu, Ronghang and Ravi, Nikhila and Berg, Alex and Pathak, Deepak},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
  year={2021}
}

Project Page: https://worldsheet.github.io/

Installation

Our Worldsheet implementation is based on MMF and PyTorch3D. This repository is adapted from the MMF repository (https://github.com/facebookresearch/mmf).

This code is designed to be run on GPU, CPU training/inference is not supported.

  1. Create a new conda environment:
conda create -n worldsheet python=3.8
conda activate worldsheet
  1. Download this repository or clone with Git, and then enter the root directory of the repository git clone https://github.com/facebookresearch/worldsheet.git && cd worldsheet

  2. Install the MMF dependencies: pip install -r requirements.txt

  3. Install PyTorch3D as follows (we used v0.2.5):

# Install using conda
conda install -c pytorch3d pytorch3d=0.2.5

# Or install from GitHub directly
git clone https://github.com/facebookresearch/pytorch3d.git && cd pytorch3d
git checkout v0.2.5
rm -rf build/ **/*.so
FORCE_CUDA=1 pip install -e .
cd ..

# or pip install from github 
pip install "git+https://github.com/facebookresearch/[email protected]"
  1. Extra dependencies
pip install scikit-image

Train and evaluate on the Matterport3D and Replica datasets

In this work, we use the same Matterport3D and Replica datasets as in SynSin, based on the Habitat environment. In our codebase and config files, these two datasets are referred to as synsin_habitat (synsin_mp3d and synsin_replica) (note that here the synsin_ prefix only refers to the datasets used in SynSin; the underlying model being trained and evaluated is our Worldsheet model, not SynSin).

Extract the image frames

In our project, we extract those training, validation, and test image frames and camera matrices using the SynSin codebase for direct comparisons with SynSin and other previous work.

Please install our modified SynSin codebase from synsin_for_data_and_eval branch of this repository to extract the Matterport3D and Replica image frames:

git clone https://github.com/facebookresearch/worldsheet.git -b synsin_for_data_and_eval synsin && cd synsin

and install habitat-sim and habitat-api as additional SynSin dependencies following the official SynSin installation instructions. For convenience, we provide the corresponding versions of habitat-sim and habitat-api for SynSin in habitat-sim-for-synsin and habitat-sim-for-synsin branches of this repository.

After installing the SynSin codebase from synsin_for_data_and_eval branch, set up Matterport3D and Replica datasets following the instructions in the SynSin codebase, and run the following to save the image frames to disk (you can change MP3D_SAVE_IMAGE_DIR to a location on your machine).

# this is where Matterport3D and Replica image frames will be extracted
export MP3D_SAVE_IMAGE_DIR=/checkpoint/ronghanghu/neural_rendering_datasets

# clone the SynSin repo from `synsin_for_data_and_eval` branch
git clone https://github.com/facebookresearch/worldsheet.git -b synsin_for_data_and_eval ../synsin
cd ../synsin

# Matterport3D train
DEBUG="" python evaluation/dump_train_to_mmf.py \
     --result_folder ${MP3D_SAVE_IMAGE_DIR}/synsin_mp3d/train \
     --old_model modelcheckpoints/mp3d/synsin.pth \
     --batch_size 8 --num_workers 10  --images_before_reset 1000
# Matterport3D val
DEBUG="" python evaluation/dump_val_to_mmf.py \
     --result_folder ${MP3D_SAVE_IMAGE_DIR}/synsin_mp3d/val \
     --old_model modelcheckpoints/mp3d/synsin.pth \
     --batch_size 8 --num_workers 10  --images_before_reset 200
# Matterport3D test
DEBUG="" python evaluation/dump_test_to_mmf.py \
     --result_folder ${MP3D_SAVE_IMAGE_DIR}/synsin_mp3d/test \
     --old_model modelcheckpoints/mp3d/synsin.pth \
     --batch_size 8 --num_workers 10  --images_before_reset 200
# Replica test
DEBUG="" python evaluation/dump_test_to_mmf.py \
     --result_folder ${MP3D_SAVE_IMAGE_DIR}/synsin_replica/test \
     --old_model modelcheckpoints/mp3d/synsin.pth \
     --batch_size 8 --num_workers 10  --images_before_reset 200 \
     --dataset replica
# Matterport3D val with 20-degree angle change
DEBUG="" python evaluation/dump_val_to_mmf.py \
     --result_folder ${MP3D_SAVE_IMAGE_DIR}/synsin_mp3d/val_jitter_angle20 \
     --old_model modelcheckpoints/mp3d/synsin.pth \
     --batch_size 8 --num_workers 10 --images_before_reset 200 \
     --render_ids 0 --jitter_quaternions_angle 20
# Matterport3D test with 20-degree angle change
DEBUG="" python evaluation/dump_test_to_mmf.py \
     --result_folder ${MP3D_SAVE_IMAGE_DIR}/synsin_mp3d/test_jitter_angle20 \
     --old_model modelcheckpoints/mp3d/synsin.pth \
     --batch_size 8 --num_workers 10 --images_before_reset 200 \
     --render_ids 0 --jitter_quaternions_angle 20

cd ../worldsheet  # assuming `synsin` repo and `worldsheet` repo are under the same parent directory

Training

Run the following to perform training and evaluation. In our experiments, we use a single machine with 4 NVIDIA V100-32GB GPUs.

# set to the same path as in image frame extraction above
export MP3D_SAVE_IMAGE_DIR=/checkpoint/ronghanghu/neural_rendering_datasets

# train the scene mesh prediction in Worldsheet
./run_mp3d_and_replica/train_mp3d.sh mp3d_nodepth_perceptual_l1laplacian

# train the inpainter with frozen scene mesh prediction
./run_mp3d_and_replica/train_mp3d.sh mp3d_nodepth_perceptual_l1laplacian_inpaintGonly_freezemesh

Pretrained models

Instead of performing the training above, one can also directly download the pretrained models via

./run_mp3d_and_replica/download_pretrained_models.sh

and run the evaluation below.

Evaluation

The evaluation scripts below will print the performance (PSNR, SSIM, Perc-Sim) on different test data.

Evaluate on the default test sets with the same camera changes as the training data (Table 1):

# set to the same path as in image frame extraction above
export MP3D_SAVE_IMAGE_DIR=/checkpoint/ronghanghu/neural_rendering_datasets

# Matterport3D, without inpainter (Table 1 line 6)
./run_mp3d_and_replica/eval_mp3d_test_iter.sh mp3d_nodepth_perceptual_l1laplacian 40000

# Matterport3D, full model (Table 1 line 7)
./run_mp3d_and_replica/eval_mp3d_test_iter.sh mp3d_nodepth_perceptual_l1laplacian_inpaintGonly_freezemesh 40000

# Replica, full model (Table 1 line 7)
./run_mp3d_and_replica/eval_replica_test_iter.sh mp3d_nodepth_perceptual_l1laplacian_inpaintGonly_freezemesh 40000

Evaluate the full model on 2X camera changes (Table 2):

# set to the same path as in image frame extraction above
export MP3D_SAVE_IMAGE_DIR=/checkpoint/ronghanghu/neural_rendering_datasets

# Matterport3D, without inpainter (Table 2 line 4)
./run_mp3d_and_replica/eval_mp3d_test_jitter_angle20_iter.sh mp3d_nodepth_perceptual_l1laplacian 40000

# Matterport3D, full model (Table 2 line 5)
./run_mp3d_and_replica/eval_mp3d_test_jitter_angle20_iter.sh mp3d_nodepth_perceptual_l1laplacian_inpaintGonly_freezemesh 40000

# Replica, full model (Table 2 line 5)
./run_mp3d_and_replica/eval_replica_test_jitter_angle20_iter.sh mp3d_nodepth_perceptual_l1laplacian_inpaintGonly_freezemesh 40000

Visualization

One can visualize the model predictions using the script run_mp3d_and_replica/visualize_mp3d_val_iter.sh to visualize the Matterport3D validation set (and this script can be modified to visualize other splits). For example, run the following to visualize the predictions from the full model:

export MP3D_SAVE_IMAGE_DIR=/checkpoint/ronghanghu/neural_rendering_datasets

./run_mp3d_and_replica/visualize_mp3d_val_iter.sh mp3d_nodepth_perceptual_l1laplacian_inpaintGonly_freezemesh 40000

Then, you can inspect the predictions using the notebook run_mp3d_and_replica/visualize_predictions.ipynb.

Train and evaluate on the RealEstate10K dataset

In this work, we use the same RealEstate10K dataset as in SynSin.

Setting up the RealEstate10K dataset

Please set up the dataset following the instructions in SynSin. The scripts below assumes this dataset has been downloaded to /checkpoint/ronghanghu/neural_rendering_datasets/realestate10K/RealEstate10K/frames/. You can modify its path in mmf/configs/datasets/synsin_realestate10k/defaults.yaml.

Training

Run the following to perform the training and evaluation. In our experiments, we use a single machine with 4 NVIDIA V100-32GB GPUs.

# train 33x33 mesh
./run_realestate10k/train.sh realestate10k_dscale2_lowerL1_200

# initialize 65x65 mesh from trained 33x33 mesh
python ./run_realestate10k/init_65x65_from_33x33.py \
    --input ./save/synsin_realestate10k/realestate10k_dscale2_lowerL1_200/models/model_50000.ckpt \
    --output ./save/synsin_realestate10k/realestate10k_dscale2_stride4ft_lowerL1_200/init.ckpt

# train 65x65 mesh
./run_realestate10k/train.sh realestate10k_dscale2_stride4ft_lowerL1_200

Pretrained models

Instead of performing the training above, one can also directly download the pretrained models via

./run_realestate10k/download_pretrained_models.sh

and run the evaluation below.

Evaluation

Note: as mentioned in the paper, following the evaluation protocol of SynSin on RealEstate10K, the best metrics of two separate predictions based on each view were reported for single-view methods. We follow this evaluation protocol for consistency with SynSin on RealEstate10K in Table 3. We also report averaged metrics over all predictions in the supplemental.

The script below evaluates the performance on RealEstate10K with averaged metrics over all predictions, as reported in the supplemental Table C.1:

# Evaluate 33x33 mesh (Supplemental Table C.1 line 6)
./run_realestate10k/eval_test_iter.sh realestate10k_dscale2_lowerL1_200 50000

# Evaluate 65x65 mesh (Supplemental Table C.1 line 7)
./run_realestate10k/eval_test_iter.sh realestate10k_dscale2_stride4ft_lowerL1_200 50000

To evaluate with the SynSin protocol using the best metrics of two separate predictions as in Table 3, one needs to first save the predicted novel views as PNG files, and then use the SynSin codebase for evaluation. Please install our modified SynSin codebase from synsin_for_data_and_eval branch of this repository following the Matterport3D and Replica instructions above. Then evaluate as follows:

# Save prediction PNGs for 33x33 mesh
./run_realestate10k/write_pred_pngs_test_iter.sh realestate10k_dscale2_lowerL1_200 50000

# Save prediction PNGs for 65x65 mesh
./run_realestate10k/write_pred_pngs_test_iter.sh realestate10k_dscale2_stride4ft_lowerL1_200 50000

cd ../synsin  # assuming `synsin` repo and `worldsheet` repo under the same directory

# Evaluate 33x33 mesh (Table 3 line 9)
python evaluation/evaluate_realestate10k_all.py \
    --take_every_other \
    --folder ../worldsheet/save/prediction_synsin_realestate10k/realestate10k_dscale2_lowerL1_200/50000/realestate10k_test

# Evaluate 65x65 mesh (Table 3 line 10)
python evaluation/evaluate_realestate10k_all.py \
    --take_every_other \
    --folder ../worldsheet/save/prediction_synsin_realestate10k/realestate10k_dscale2_stride4ft_lowerL1_200/50000/realestate10k_test

(The --take_every_other flag above performs best-of-two-prediction evaluation; without this flag, it should give the average-over-all-prediction results as in Supplemental Table C.1.)

Visualization

One can visualize the model's predictions using the script run_realestate10k/eval_val_iter.sh for the RealEstate10K validation set (run_realestate10k/visualize_test_iter.sh for the test set). For example, run the following to visualize the predictions from the 65x65 mesh:

./run_realestate10k/visualize_val_iter.sh realestate10k_dscale2_stride4ft_lowerL1_200 50000

Then, you can inspect the predictions using notebook run_realestate10k/visualize_predictions.ipynb.

We also provide a notebook for interactive predictions in run_realestate10k/make_interactive_videos.ipynb, where one can walk through the scene and generate a continuous video of the predicted novel views.

Wrapping sheets with external depth prediction

In Sec. 4.3 of the paper, we test the limits of wrapping a mesh sheet over a large variety of images. We provide a notebook for this analysis in external_depth/make_interactive_videos_with_midas_depth.ipynb, where one can interactively generate a continuous video of the predicted novel views.

The structure of Worldsheet codebase

Worldsheet is implemented as a MMF model. This codebase largely follows the structure of typical MMF models and datasets.

The Worldsheet model is defined under the MMF model name mesh_renderer in the following files:

  • model definition: mmf/models/mesh_renderer.py
  • mesh and rendering utilities, losses, and metrics: mmf/neural_rendering/
  • config base: mmf/configs/models/mesh_renderer/defaults.yaml

The experimental config files for the Matterport and Replica experiments are in the following files:

  • Habitat dataset definition: mmf/datasets/builders/synsin_habitat/
  • Habitat dataset config base: mmf/configs/datasets/synsin_habitat/defaults.yaml
  • experimental configs: projects/neural_rendering/configs/synsin_habitat/

The experimental config files for the RealEstate10K experiments are in the following files:

  • RealEstate10K dataset definition: mmf/datasets/builders/synsin_realestate10k/
  • RealEstate10K dataset config base: mmf/configs/datasets/synsin_realestate10k/defaults.yaml
  • experimental configs: projects/neural_rendering/configs/synsin_realestate10k/

Acknowledgements

This repository is modified from the MMF library from Facebook AI Research. A large part of the codebase has been modified from the pix2pixHD codebase. Our PSNR, SSIM, and Perc-Sim evaluation scripts are modified from the SynSin codebase and we also use SynSin for image frame extraction on Matterport3D and Replica. A part of our differentiable rendering implementation is built upon the Softmax Splatting codebase. All appropriate licenses are included in the files in which the code is used.

Licence

Worldsheet is released under the BSD License.

App for identification of various objects. Based on YOLO v4 tiny architecture

Object_detection Repository containing trained model yolo v4 tiny, which is capable of identification 80 different classes Default feed is set to be a

Mateusz Kurdziel 0 Jun 22, 2022
Vehicles Counting using YOLOv4 + DeepSORT + Flask + Ngrok

A project for counting vehicles using YOLOv4 + DeepSORT + Flask + Ngrok

Duong Tran Thanh 37 Dec 16, 2022
Source codes for the paper "Local Additivity Based Data Augmentation for Semi-supervised NER"

LADA This repo contains codes for the following paper: Jiaao Chen*, Zhenghui Wang*, Ran Tian, Zichao Yang, Diyi Yang: Local Additivity Based Data Augm

GT-SALT 36 Dec 02, 2022
MARS: Learning Modality-Agnostic Representation for Scalable Cross-media Retrieva

Introduction This is the source code of our TCSVT 2021 paper "MARS: Learning Modality-Agnostic Representation for Scalable Cross-media Retrieval". Ple

7 Aug 24, 2022
Code for our NeurIPS 2021 paper: Sparsely Changing Latent States for Prediction and Planning in Partially Observable Domains

GateL0RD This is a lightweight PyTorch implementation of GateL0RD, our RNN presented in "Sparsely Changing Latent States for Prediction and Planning i

Autonomous Learning Group 16 Nov 03, 2022
Sequence Modeling with Structured State Spaces

Structured State Spaces for Sequence Modeling This repository provides implementations and experiments for the following papers. S4 Efficiently Modeli

HazyResearch 896 Jan 01, 2023
PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop.

VoiceLoop PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop. VoiceLoop is a n

Meta Archive 873 Dec 15, 2022
Federated_learning codes used for the the paper "Evaluation of Federated Learning Aggregation Algorithms" and "A Federated Learning Aggregation Algorithm for Pervasive Computing: Evaluation and Comparison"

Federated Distance (FedDist) This is the code accompanying the Percom2021 paper "A Federated Learning Aggregation Algorithm for Pervasive Computing: E

GETALP 8 Jan 03, 2023
A boosting-based Multiple Instance Learning (MIL) package that includes MIL-Boost and MCIL-Boost

A boosting-based Multiple Instance Learning (MIL) package that includes MIL-Boost and MCIL-Boost

Jun-Yan Zhu 27 Aug 08, 2022
Code for "Typilus: Neural Type Hints" PLDI 2020

Typilus A deep learning algorithm for predicting types in Python. Please find a preprint here. This repository contains its implementation (src/) and

47 Nov 08, 2022
CLNTM - Contrastive Learning for Neural Topic Model

Contrastive Learning for Neural Topic Model This repository contains the impleme

Thong Thanh Nguyen 25 Nov 24, 2022
Source code for Task-Aware Variational Adversarial Active Learning

Contrastive Coding for Active Learning under Class Distribution Mismatch Official PyTorch implementation of ["Contrastive Coding for Active Learning u

27 Nov 23, 2022
An Inverse Kinematics library aiming performance and modularity

IKPy Demo Live demos of what IKPy can do (click on the image below to see the video): Also, a presentation of IKPy: Presentation. Features With IKPy,

Pierre Manceron 481 Jan 02, 2023
Sample code from the Neural Networks from Scratch book.

Neural Networks from Scratch (NNFS) book code Code from the NNFS book (https://nnfs.io) separated by chapter.

Harrison 172 Dec 31, 2022
💡 Learnergy is a Python library for energy-based machine learning models.

Learnergy: Energy-based Machine Learners Welcome to Learnergy. Did you ever reach a bottleneck in your computational experiments? Are you tired of imp

Gustavo Rosa 57 Nov 17, 2022
Docker containers of baseline agents for the Crafter environment

Crafter Baselines This repository contains Docker containers for running various baselines on the Crafter environment. Reward Agents DreamerV2 based o

Danijar Hafner 17 Sep 25, 2022
PyElastica is the Python implementation of Elastica, an open-source software for the simulation of assemblies of slender, one-dimensional structures using Cosserat Rod theory.

PyElastica PyElastica is the python implementation of Elastica: an open-source project for simulating assemblies of slender, one-dimensional structure

Gazzola Lab 105 Jan 09, 2023
A python implementation of Yolov5 to detect fire or smoke in the wild in Jetson Xavier nx and Jetson nano

yolov5-fire-smoke-detect-python A python implementation of Yolov5 to detect fire or smoke in the wild in Jetson Xavier nx and Jetson nano You can see

20 Dec 15, 2022
使用OpenCV部署全景驾驶感知网络YOLOP,可同时处理交通目标检测、可驾驶区域分割、车道线检测,三项视觉感知任务,包含C++和Python两种版本的程序实现。本套程序只依赖opencv库就可以运行, 从而彻底摆脱对任何深度学习框架的依赖。

YOLOP-opencv-dnn 使用OpenCV部署全景驾驶感知网络YOLOP,可同时处理交通目标检测、可驾驶区域分割、车道线检测,三项视觉感知任务,依然是包含C++和Python两种版本的程序实现 onnx文件从百度云盘下载,链接:https://pan.baidu.com/s/1A_9cldU

178 Jan 07, 2023
Deduplicating Training Data Makes Language Models Better

Deduplicating Training Data Makes Language Models Better This repository contains code to deduplicate language model datasets as descrbed in the paper

Google Research 431 Dec 27, 2022