A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo

Last update: Nov 17, 2022

idn-solver

Paper | Project Page

This repository contains the code release of our ICCV 2021 paper:

A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo

Wang Zhao*, Shaohui Liu*, Yi Wei, Hengkai Guo, Yong-Jin Liu

Installation

We recommend using conda to set up a dedicated environment. Run:

conda env create -f environment.yml

Test on a sequence

First, download the pretrained model from here and put it under the ./pretrain/ folder.

Prepare the sequence data with color images, camera poses (4x4 cam2world transformations) and intrinsics. The sequence data should be structured as follows:

sequence_name
  | color
      | 00000.jpg
  | pose
      | 00000.txt
  | K.txt
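Before running inference, it can help to confirm that a folder matches the layout above. The following is a minimal sketch and not part of this repository: the function name `check_sequence` is hypothetical, and it assumes only what the tree shows (a `color` folder of `.jpg` files, a `pose` folder of `.txt` files with matching names, and a top-level `K.txt`). It does not inspect the contents of the pose or intrinsics files.

```python
from pathlib import Path


def check_sequence(seq_dir):
    """Return a list of layout problems in a sequence folder (empty if none).

    Hypothetical helper: checks file names only, not file contents.
    """
    seq = Path(seq_dir)
    problems = []

    if not (seq / "K.txt").is_file():
        problems.append("missing K.txt")

    # Frame ids are the file names without extension, e.g. "00000".
    color_ids = {p.stem for p in (seq / "color").glob("*.jpg")}
    pose_ids = {p.stem for p in (seq / "pose").glob("*.txt")}

    if not color_ids:
        problems.append("no .jpg files in color/")
    for frame_id in sorted(color_ids - pose_ids):
        problems.append(f"no pose for color/{frame_id}.jpg")
    for frame_id in sorted(pose_ids - color_ids):
        problems.append(f"no image for pose/{frame_id}.txt")

    return problems
```

An empty list means every color image has a pose file with the same name and `K.txt` is present.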

Run the following command to get the outputs:

python infer_folder.py --seq_dir /path/to/the/sequence/data --output_dir /path/to/save/outputs --config ./configs/test_folder.yaml

Tune the "reference gap" parameter so that each image pair has sufficient overlap and camera translation. For ScanNet-like sequences, we recommend a reference_gap of 20.
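One rough way to judge a candidate gap is to look at how far the camera moves between paired frames. The sketch below is illustrative only: `pair_baselines` is a hypothetical helper, and pairing frame i with frame i + reference_gap is an assumption about how the gap is applied, not a description of infer_folder.py. It relies on the poses being 4x4 cam2world matrices, so the camera center is the translation column.

```python
import math


def pair_baselines(poses, reference_gap):
    """Camera-center distance for every (i, i + reference_gap) frame pair.

    `poses` is a list of 4x4 cam2world matrices given as nested lists.
    The pairing rule is an assumption made for illustration.
    """
    baselines = []
    for i in range(len(poses) - reference_gap):
        a, b = poses[i], poses[i + reference_gap]
        # For a cam2world matrix, the camera center is the last column.
        center_a = [a[row][3] for row in range(3)]
        center_b = [b[row][3] for row in range(3)]
        baselines.append(math.dist(center_a, center_b))
    return baselines
```

The distances are in whatever units the pose files use. Values close to zero suggest the gap is too small to give useful translation; overlap between the two views still needs to be checked separately.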

Test on ScanNet

Prepare ScanNet test split data

Download the ScanNet test split data from the official site and pre-process the data using:

python ./data/preprocess.py --data_dir /path/to/scannet/test/split/ --output_dir /path/to/save/pre-processed/scannet/test/data

This step (1) resizes the color images to 480x640 resolution and (2) samples the data at an interval of 20.
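The two steps can be sketched as follows. This is not the code in ./data/preprocess.py: both function names are hypothetical, sampling is assumed to start at the first frame, and 480x640 is read as height x width. Resizing an image also changes its pinhole intrinsics; whether preprocess.py rewrites them is not stated here, so the second function only shows the standard scaling for a plain resize without cropping.

```python
def sample_frames(frame_names, interval=20):
    """Keep every `interval`-th frame, starting from the first one.

    `frame_names` is assumed to be in temporal order already.
    """
    return list(frame_names)[::interval]


def scale_intrinsics(fx, fy, cx, cy, old_size, new_size=(640, 480)):
    """Rescale pinhole intrinsics for a resized image.

    Sizes are (width, height). Assumes a plain resize with no cropping
    and ignores half-pixel center conventions.
    """
    sx = new_size[0] / old_size[0]
    sy = new_size[1] / old_size[1]
    return fx * sx, fy * sy, cx * sx, cy * sy
```

For example, halving a 1280x960 image to 640x480 halves all four intrinsic values.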

Run evaluation

python eval_scannet.py --data_dir /path/to/processed/scannet/test/split/ --config ./configs/test_scannet.yaml

Train

Prepare ScanNet training data

We use the pre-processed ScanNet data from NAS. You can download the data using this link. The data is structured as follows:

scannet
  | scannet_nas
    | train
      | scene0000_00
          | color
            | 0000.jpg
          | pose
            | 0000.txt
          | depth
            | 0000.npy
          | intrinsic
          | normal
            | 0000_normal.npy
    | val
  | scans_test_sample (preprocessed ScanNet test split)

Run training

Set the "dataset_path" variable in the config yaml to your own data path.

The network is trained with a two-stage strategy. The whole training process takes ~6 days with 4 NVIDIA V100 GPUs.

python train.py ./configs/scannet_stage1.yaml
python train.py ./configs/scannet_stage2.yaml

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{Zhao_2021_ICCV,
    author    = {Zhao, Wang and Liu, Shaohui and Wei, Yi and Guo, Hengkai and Liu, Yong-Jin},
    title     = {A Confidence-Based Iterative Solver of Depths and Surface Normals for Deep Multi-View Stereo},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {6168-6177}
}

Acknowledgement

This project relies heavily on code from NAS, and we thank the authors for releasing their code.

We also thank Xiaoxiao Long for kindly helping with ScanNet evaluations.

Owner

zhaowang