Code release for the paper “Worldsheet Wrapping the World in a 3D Sheet for View Synthesis from a Single Image”, ICCV 2021.

Overview

Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image

This repository contains the code for the following paper:

  • R. Hu, N. Ravi, A. Berg, D. Pathak, Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image. in ICCV, 2021. (PDF)
@inproceedings{hu2021worldsheet,
  title={Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image},
  author={Hu, Ronghang and Ravi, Nikhila and Berg, Alex and Pathak, Deepak},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
  year={2021}
}

Project Page: https://worldsheet.github.io/

Installation

Our Worldsheet implementation is based on MMF and PyTorch3D. This repository is adapted from the MMF repository (https://github.com/facebookresearch/mmf).

This code is designed to be run on GPU, CPU training/inference is not supported.

  1. Create a new conda environment:
conda create -n worldsheet python=3.8
conda activate worldsheet
  1. Download this repository or clone with Git, and then enter the root directory of the repository git clone https://github.com/facebookresearch/worldsheet.git && cd worldsheet

  2. Install the MMF dependencies: pip install -r requirements.txt

  3. Install PyTorch3D as follows (we used v0.2.5):

# Install using conda
conda install -c pytorch3d pytorch3d=0.2.5

# Or install from GitHub directly
git clone https://github.com/facebookresearch/pytorch3d.git && cd pytorch3d
git checkout v0.2.5
rm -rf build/ **/*.so
FORCE_CUDA=1 pip install -e .
cd ..

# or pip install from github 
pip install "git+https://github.com/facebookresearch/[email protected]"
  1. Extra dependencies
pip install scikit-image

Train and evaluate on the Matterport3D and Replica datasets

In this work, we use the same Matterport3D and Replica datasets as in SynSin, based on the Habitat environment. In our codebase and config files, these two datasets are referred to as synsin_habitat (synsin_mp3d and synsin_replica) (note that here the synsin_ prefix only refers to the datasets used in SynSin; the underlying model being trained and evaluated is our Worldsheet model, not SynSin).

Extract the image frames

In our project, we extract those training, validation, and test image frames and camera matrices using the SynSin codebase for direct comparisons with SynSin and other previous work.

Please install our modified SynSin codebase from synsin_for_data_and_eval branch of this repository to extract the Matterport3D and Replica image frames:

git clone https://github.com/facebookresearch/worldsheet.git -b synsin_for_data_and_eval synsin && cd synsin

and install habitat-sim and habitat-api as additional SynSin dependencies following the official SynSin installation instructions. For convenience, we provide the corresponding versions of habitat-sim and habitat-api for SynSin in habitat-sim-for-synsin and habitat-sim-for-synsin branches of this repository.

After installing the SynSin codebase from synsin_for_data_and_eval branch, set up Matterport3D and Replica datasets following the instructions in the SynSin codebase, and run the following to save the image frames to disk (you can change MP3D_SAVE_IMAGE_DIR to a location on your machine).

# this is where Matterport3D and Replica image frames will be extracted
export MP3D_SAVE_IMAGE_DIR=/checkpoint/ronghanghu/neural_rendering_datasets

# clone the SynSin repo from `synsin_for_data_and_eval` branch
git clone https://github.com/facebookresearch/worldsheet.git -b synsin_for_data_and_eval ../synsin
cd ../synsin

# Matterport3D train
DEBUG="" python evaluation/dump_train_to_mmf.py \
     --result_folder ${MP3D_SAVE_IMAGE_DIR}/synsin_mp3d/train \
     --old_model modelcheckpoints/mp3d/synsin.pth \
     --batch_size 8 --num_workers 10  --images_before_reset 1000
# Matterport3D val
DEBUG="" python evaluation/dump_val_to_mmf.py \
     --result_folder ${MP3D_SAVE_IMAGE_DIR}/synsin_mp3d/val \
     --old_model modelcheckpoints/mp3d/synsin.pth \
     --batch_size 8 --num_workers 10  --images_before_reset 200
# Matterport3D test
DEBUG="" python evaluation/dump_test_to_mmf.py \
     --result_folder ${MP3D_SAVE_IMAGE_DIR}/synsin_mp3d/test \
     --old_model modelcheckpoints/mp3d/synsin.pth \
     --batch_size 8 --num_workers 10  --images_before_reset 200
# Replica test
DEBUG="" python evaluation/dump_test_to_mmf.py \
     --result_folder ${MP3D_SAVE_IMAGE_DIR}/synsin_replica/test \
     --old_model modelcheckpoints/mp3d/synsin.pth \
     --batch_size 8 --num_workers 10  --images_before_reset 200 \
     --dataset replica
# Matterport3D val with 20-degree angle change
DEBUG="" python evaluation/dump_val_to_mmf.py \
     --result_folder ${MP3D_SAVE_IMAGE_DIR}/synsin_mp3d/val_jitter_angle20 \
     --old_model modelcheckpoints/mp3d/synsin.pth \
     --batch_size 8 --num_workers 10 --images_before_reset 200 \
     --render_ids 0 --jitter_quaternions_angle 20
# Matterport3D test with 20-degree angle change
DEBUG="" python evaluation/dump_test_to_mmf.py \
     --result_folder ${MP3D_SAVE_IMAGE_DIR}/synsin_mp3d/test_jitter_angle20 \
     --old_model modelcheckpoints/mp3d/synsin.pth \
     --batch_size 8 --num_workers 10 --images_before_reset 200 \
     --render_ids 0 --jitter_quaternions_angle 20

cd ../worldsheet  # assuming `synsin` repo and `worldsheet` repo are under the same parent directory

Training

Run the following to perform training and evaluation. In our experiments, we use a single machine with 4 NVIDIA V100-32GB GPUs.

# set to the same path as in image frame extraction above
export MP3D_SAVE_IMAGE_DIR=/checkpoint/ronghanghu/neural_rendering_datasets

# train the scene mesh prediction in Worldsheet
./run_mp3d_and_replica/train_mp3d.sh mp3d_nodepth_perceptual_l1laplacian

# train the inpainter with frozen scene mesh prediction
./run_mp3d_and_replica/train_mp3d.sh mp3d_nodepth_perceptual_l1laplacian_inpaintGonly_freezemesh

Pretrained models

Instead of performing the training above, one can also directly download the pretrained models via

./run_mp3d_and_replica/download_pretrained_models.sh

and run the evaluation below.

Evaluation

The evaluation scripts below will print the performance (PSNR, SSIM, Perc-Sim) on different test data.

Evaluate on the default test sets with the same camera changes as the training data (Table 1):

# set to the same path as in image frame extraction above
export MP3D_SAVE_IMAGE_DIR=/checkpoint/ronghanghu/neural_rendering_datasets

# Matterport3D, without inpainter (Table 1 line 6)
./run_mp3d_and_replica/eval_mp3d_test_iter.sh mp3d_nodepth_perceptual_l1laplacian 40000

# Matterport3D, full model (Table 1 line 7)
./run_mp3d_and_replica/eval_mp3d_test_iter.sh mp3d_nodepth_perceptual_l1laplacian_inpaintGonly_freezemesh 40000

# Replica, full model (Table 1 line 7)
./run_mp3d_and_replica/eval_replica_test_iter.sh mp3d_nodepth_perceptual_l1laplacian_inpaintGonly_freezemesh 40000

Evaluate the full model on 2X camera changes (Table 2):

# set to the same path as in image frame extraction above
export MP3D_SAVE_IMAGE_DIR=/checkpoint/ronghanghu/neural_rendering_datasets

# Matterport3D, without inpainter (Table 2 line 4)
./run_mp3d_and_replica/eval_mp3d_test_jitter_angle20_iter.sh mp3d_nodepth_perceptual_l1laplacian 40000

# Matterport3D, full model (Table 2 line 5)
./run_mp3d_and_replica/eval_mp3d_test_jitter_angle20_iter.sh mp3d_nodepth_perceptual_l1laplacian_inpaintGonly_freezemesh 40000

# Replica, full model (Table 2 line 5)
./run_mp3d_and_replica/eval_replica_test_jitter_angle20_iter.sh mp3d_nodepth_perceptual_l1laplacian_inpaintGonly_freezemesh 40000

Visualization

One can visualize the model predictions using the script run_mp3d_and_replica/visualize_mp3d_val_iter.sh to visualize the Matterport3D validation set (and this script can be modified to visualize other splits). For example, run the following to visualize the predictions from the full model:

export MP3D_SAVE_IMAGE_DIR=/checkpoint/ronghanghu/neural_rendering_datasets

./run_mp3d_and_replica/visualize_mp3d_val_iter.sh mp3d_nodepth_perceptual_l1laplacian_inpaintGonly_freezemesh 40000

Then, you can inspect the predictions using the notebook run_mp3d_and_replica/visualize_predictions.ipynb.

Train and evaluate on the RealEstate10K dataset

In this work, we use the same RealEstate10K dataset as in SynSin.

Setting up the RealEstate10K dataset

Please set up the dataset following the instructions in SynSin. The scripts below assumes this dataset has been downloaded to /checkpoint/ronghanghu/neural_rendering_datasets/realestate10K/RealEstate10K/frames/. You can modify its path in mmf/configs/datasets/synsin_realestate10k/defaults.yaml.

Training

Run the following to perform the training and evaluation. In our experiments, we use a single machine with 4 NVIDIA V100-32GB GPUs.

# train 33x33 mesh
./run_realestate10k/train.sh realestate10k_dscale2_lowerL1_200

# initialize 65x65 mesh from trained 33x33 mesh
python ./run_realestate10k/init_65x65_from_33x33.py \
    --input ./save/synsin_realestate10k/realestate10k_dscale2_lowerL1_200/models/model_50000.ckpt \
    --output ./save/synsin_realestate10k/realestate10k_dscale2_stride4ft_lowerL1_200/init.ckpt

# train 65x65 mesh
./run_realestate10k/train.sh realestate10k_dscale2_stride4ft_lowerL1_200

Pretrained models

Instead of performing the training above, one can also directly download the pretrained models via

./run_realestate10k/download_pretrained_models.sh

and run the evaluation below.

Evaluation

Note: as mentioned in the paper, following the evaluation protocol of SynSin on RealEstate10K, the best metrics of two separate predictions based on each view were reported for single-view methods. We follow this evaluation protocol for consistency with SynSin on RealEstate10K in Table 3. We also report averaged metrics over all predictions in the supplemental.

The script below evaluates the performance on RealEstate10K with averaged metrics over all predictions, as reported in the supplemental Table C.1:

# Evaluate 33x33 mesh (Supplemental Table C.1 line 6)
./run_realestate10k/eval_test_iter.sh realestate10k_dscale2_lowerL1_200 50000

# Evaluate 65x65 mesh (Supplemental Table C.1 line 7)
./run_realestate10k/eval_test_iter.sh realestate10k_dscale2_stride4ft_lowerL1_200 50000

To evaluate with the SynSin protocol using the best metrics of two separate predictions as in Table 3, one needs to first save the predicted novel views as PNG files, and then use the SynSin codebase for evaluation. Please install our modified SynSin codebase from synsin_for_data_and_eval branch of this repository following the Matterport3D and Replica instructions above. Then evaluate as follows:

# Save prediction PNGs for 33x33 mesh
./run_realestate10k/write_pred_pngs_test_iter.sh realestate10k_dscale2_lowerL1_200 50000

# Save prediction PNGs for 65x65 mesh
./run_realestate10k/write_pred_pngs_test_iter.sh realestate10k_dscale2_stride4ft_lowerL1_200 50000

cd ../synsin  # assuming `synsin` repo and `worldsheet` repo under the same directory

# Evaluate 33x33 mesh (Table 3 line 9)
python evaluation/evaluate_realestate10k_all.py \
    --take_every_other \
    --folder ../worldsheet/save/prediction_synsin_realestate10k/realestate10k_dscale2_lowerL1_200/50000/realestate10k_test

# Evaluate 65x65 mesh (Table 3 line 10)
python evaluation/evaluate_realestate10k_all.py \
    --take_every_other \
    --folder ../worldsheet/save/prediction_synsin_realestate10k/realestate10k_dscale2_stride4ft_lowerL1_200/50000/realestate10k_test

(The --take_every_other flag above performs best-of-two-prediction evaluation; without this flag, it should give the average-over-all-prediction results as in Supplemental Table C.1.)

Visualization

One can visualize the model's predictions using the script run_realestate10k/eval_val_iter.sh for the RealEstate10K validation set (run_realestate10k/visualize_test_iter.sh for the test set). For example, run the following to visualize the predictions from the 65x65 mesh:

./run_realestate10k/visualize_val_iter.sh realestate10k_dscale2_stride4ft_lowerL1_200 50000

Then, you can inspect the predictions using notebook run_realestate10k/visualize_predictions.ipynb.

We also provide a notebook for interactive predictions in run_realestate10k/make_interactive_videos.ipynb, where one can walk through the scene and generate a continuous video of the predicted novel views.

Wrapping sheets with external depth prediction

In Sec. 4.3 of the paper, we test the limits of wrapping a mesh sheet over a large variety of images. We provide a notebook for this analysis in external_depth/make_interactive_videos_with_midas_depth.ipynb, where one can interactively generate a continuous video of the predicted novel views.

The structure of Worldsheet codebase

Worldsheet is implemented as a MMF model. This codebase largely follows the structure of typical MMF models and datasets.

The Worldsheet model is defined under the MMF model name mesh_renderer in the following files:

  • model definition: mmf/models/mesh_renderer.py
  • mesh and rendering utilities, losses, and metrics: mmf/neural_rendering/
  • config base: mmf/configs/models/mesh_renderer/defaults.yaml

The experimental config files for the Matterport and Replica experiments are in the following files:

  • Habitat dataset definition: mmf/datasets/builders/synsin_habitat/
  • Habitat dataset config base: mmf/configs/datasets/synsin_habitat/defaults.yaml
  • experimental configs: projects/neural_rendering/configs/synsin_habitat/

The experimental config files for the RealEstate10K experiments are in the following files:

  • RealEstate10K dataset definition: mmf/datasets/builders/synsin_realestate10k/
  • RealEstate10K dataset config base: mmf/configs/datasets/synsin_realestate10k/defaults.yaml
  • experimental configs: projects/neural_rendering/configs/synsin_realestate10k/

Acknowledgements

This repository is modified from the MMF library from Facebook AI Research. A large part of the codebase has been modified from the pix2pixHD codebase. Our PSNR, SSIM, and Perc-Sim evaluation scripts are modified from the SynSin codebase and we also use SynSin for image frame extraction on Matterport3D and Replica. A part of our differentiable rendering implementation is built upon the Softmax Splatting codebase. All appropriate licenses are included in the files in which the code is used.

Licence

Worldsheet is released under the BSD License.

the code for paper "Energy-Based Open-World Uncertainty Modeling for Confidence Calibration"

EOW-Softmax This code is for the paper "Energy-Based Open-World Uncertainty Modeling for Confidence Calibration". Accepted by ICCV21. Usage Commnd exa

Yezhen Wang 36 Dec 02, 2022
Code of paper "CDFI: Compression-Driven Network Design for Frame Interpolation", CVPR 2021

CDFI (Compression-Driven-Frame-Interpolation) [Paper] (Coming soon...) | [arXiv] Tianyu Ding*, Luming Liang*, Zhihui Zhu, Ilya Zharkov IEEE Conference

Tianyu Ding 95 Dec 04, 2022
Numerai tournament example scripts using NN and optuna

numerai_NN_example Numerai tournament example scripts using pytorch NN, lightGBM and optuna https://numer.ai/tournament Performance of my model based

Takahiro Maeda 12 Oct 10, 2022
Mahadi-Now - This Is Pakistani Just Now Login Tools

PAKISTANI JUST NOW LOGIN TOOLS Install apt update apt upgrade apt install python

MAHADI HASAN AFRIDI 19 Apr 06, 2022
The official repository for "Revealing unforeseen diagnostic image features with deep learning by detecting cardiovascular diseases from apical four-chamber ultrasounds"

Revealing unforeseen diagnostic image features with deep learning by detecting cardiovascular diseases from apical four-chamber ultrasounds The why Im

3 Mar 29, 2022
Official implementation of Sparse Transformer-based Action Recognition

STAR Official implementation of S parse T ransformer-based A ction R ecognition Dataset download NTU RGB+D 60 action recognition of 2D/3D skeleton fro

Chonghan_Lee 15 Nov 02, 2022
A MNIST-like fashion product database. Benchmark

Fashion-MNIST Table of Contents Why we made Fashion-MNIST Get the Data Usage Benchmark Visualization Contributing Contact Citing Fashion-MNIST License

Zalando Research 10.5k Jan 08, 2023
Code release for SLIP Self-supervision meets Language-Image Pre-training

SLIP: Self-supervision meets Language-Image Pre-training What you can find in this repo: Pre-trained models (with ViT-Small, Base, Large) and code to

Meta Research 621 Dec 31, 2022
Low Complexity Channel estimation with Neural Network Solutions

Interpolation-ResNet Invited paper for WSA 2021, called 'Low Complexity Channel estimation with Neural Network Solutions'. Low complexity residual con

Dianxin 10 Dec 10, 2022
Boost learning for GNNs from the graph structure under challenging heterophily settings. (NeurIPS'20)

Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu,

GEMS Lab: Graph Exploration & Mining at Scale, University of Michigan 70 Dec 18, 2022
Revisiting Temporal Alignment for Video Restoration

Revisiting Temporal Alignment for Video Restoration [arXiv] Kun Zhou, Wenbo Li, Liying Lu, Xiaoguang Han, Jiangbo Lu We provide our results at Google

52 Dec 25, 2022
DeepLM: Large-scale Nonlinear Least Squares on Deep Learning Frameworks using Stochastic Domain Decomposition (CVPR 2021)

DeepLM DeepLM: Large-scale Nonlinear Least Squares on Deep Learning Frameworks using Stochastic Domain Decomposition (CVPR 2021) Run Please install th

Jingwei Huang 130 Dec 02, 2022
Official implementation of "Motif-based Graph Self-Supervised Learning forMolecular Property Prediction"

Motif-based Graph Self-Supervised Learning for Molecular Property Prediction Official Pytorch implementation of NeurIPS'21 paper "Motif-based Graph Se

zaixi 71 Dec 20, 2022
Training code and evaluation benchmarks for the "Self-Supervised Policy Adaptation during Deployment" paper.

Self-Supervised Policy Adaptation during Deployment PyTorch implementation of PAD and evaluation benchmarks from Self-Supervised Policy Adaptation dur

Nicklas Hansen 101 Nov 01, 2022
Official PyTorch implementation of "Improving Face Recognition with Large AgeGaps by Learning to Distinguish Children" (BMVC 2021)

Inter-Prototype (BMVC 2021): Official Project Webpage This repository provides the official PyTorch implementation of the following paper: Improving F

Jungsoo Lee 16 Jun 30, 2022
Official PyTorch implementation of paper: Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation (ICCV 2021 Oral Presentation)

SML (ICCV 2021, Oral) : Official Pytorch Implementation This repository provides the official PyTorch implementation of the following paper: Standardi

SangHun 61 Dec 27, 2022
Probabilistic Entity Representation Model for Reasoning over Knowledge Graphs

Implementation for the paper: Probabilistic Entity Representation Model for Reasoning over Knowledge Graphs, Nurendra Choudhary, Nikhil Rao, Sumeet Ka

Nurendra Choudhary 8 Nov 15, 2022
Code release for "Making a Bird AI Expert Work for You and Me".

Making-a-Bird-AI-Expert-Work-for-You-and-Me Code release for "Making a Bird AI Expert Work for You and Me". arxiv (Coming soon...) Changelog 2021/12/6

PRIS-CV: Computer Vision Group 11 Dec 11, 2022
LineBoard - Python+React+MySQL-白板即時系統改善人群行為

LineBoard-白板即時系統改善人群行為 即時顯示實驗室的使用狀況,並遠端預約排隊,以此來改善人們的工作效率 程式架構 運作流程 使用者先至該實驗室網站預約

Bo-Jyun Huang 1 Feb 22, 2022
EgoNN: Egocentric Neural Network for Point Cloud Based 6DoF Relocalization at the City Scale

EgonNN: Egocentric Neural Network for Point Cloud Based 6DoF Relocalization at the City Scale Paper: EgoNN: Egocentric Neural Network for Point Cloud

19 Sep 20, 2022