[ICCV 2021] Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation

Overview

EPCDepth

EPCDepth is a self-supervised monocular depth estimation model whose supervision comes from the other image in a stereo pair. Details are described in our paper:

Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation

Rui Peng, Ronggang Wang, Yawen Lai, Luyang Tang, Yangang Cai

ICCV 2021 (arxiv)

In our qualitative comparisons, EPCDepth produces the most accurate and sharpest results. In the last example, the person in the second red box should be farther away than the road sign, because the road sign occludes the person. Only our model captures this occlusion cue correctly.
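As background for the stereo supervision mentioned above, the sketch below shows the generic idea behind training from a rectified stereo pair: the right image is resampled with a predicted disparity to reconstruct the left image, and the reconstruction error serves as the training signal. This is a minimal NumPy illustration on grayscale arrays, not the loss used in EPCDepth; the function names `warp_right_to_left` and `photometric_l1` are hypothetical and do not come from this repository.

```python
import numpy as np


def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at x - d(x, y).

    right and disparity are (H, W) arrays; disparity is in pixels.
    Sampling positions are clamped to the image and linearly interpolated.
    """
    h, w = disparity.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0, w - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]


def photometric_l1(left, right, disparity):
    """Mean absolute difference between the left image and its reconstruction."""
    return float(np.mean(np.abs(left - warp_right_to_left(right, disparity))))
```

A disparity map that explains the scene well yields a low photometric error, which is why no ground-truth depth is needed during training.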

Setup

1. Recommended environment

  • PyTorch 1.1
  • Python 3.6

2. KITTI data

You can download the raw KITTI dataset (about 175GB) by running:

wget -i dataset/kitti_archives_to_download.txt -P <your kitti path>/
cd <your kitti path>
unzip "*.zip"

Then, we recommend converting the png images to jpeg with this command:

find <your kitti path>/ -name '*.png' | parallel 'convert -quality 92 -sampling-factor 2x2,1x1,1x1 {.}.png {.}.jpg && rm {}'

Alternatively, you can skip the conversion and manually change the image suffix from .jpg to .png in dataset/kitti_dataset.py. Our pre-trained models were trained on jpg images, so test performance on png images will be slightly lower.
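Before training or evaluating, it can help to confirm which image suffix your KITTI copy actually uses. The following is a small sketch using only the standard library; the helper name `count_image_suffixes` is hypothetical and not part of this repository.

```python
from collections import Counter
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def count_image_suffixes(root):
    """Count image files under root (recursively), grouped by lowercase suffix."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        suffix = path.suffix.lower()
        if path.is_file() and suffix in IMAGE_SUFFIXES:
            counts[suffix] += 1
    return counts
```

If the counts show a mix of .png and .jpg files, the conversion was probably interrupted and is worth re-running before continuing.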

3. Prepare depth hint

Once you have downloaded the KITTI dataset as in the previous step, you need to prepare the depth hint by running:

python precompute_depth_hints.py --data_path <your kitti path>

The generated depth hints will be saved to <your kitti path>/depth_hints. As in the previous step, pay attention to the image suffix (.jpg or .png).

📊 Evaluation

1. Download models

Download our pretrained model and put it in <your model path>.

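| Pre-trained | PP | HxW | Backbone | Output Scale | Abs Rel | Sq Rel | RMSE | δ < 1.25 |
|---|---|---|---|---|---|---|---|---|
| model18_lr | | 192x640 | resnet18 (pt) | d0 | 0.0998 | 0.722 | 4.475 | 0.888 |
| model18_lr | | 192x640 | resnet18 (pt) | d2 | 0.1 | 0.712 | 4.462 | 0.886 |
| model18 | | 320x1024 | resnet18 (pt) | d0 | 0.0925 | 0.671 | 4.297 | 0.899 |
| model18 | | 320x1024 | resnet18 (pt) | d2 | 0.0920 | 0.655 | 4.268 | 0.898 |
| model50 | | 320x1024 | resnet50 (pt) | d0 | 0.0905 | 0.646 | 4.207 | 0.901 |
| model50 | | 320x1024 | resnet50 (pt) | d2 | 0.0905 | 0.629 | 4.187 | 0.900 |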

Note: pt means the backbone is pre-trained on ImageNet. The low-resolution results differ slightly from those in the paper.
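For readers unfamiliar with the column names, the sketch below computes Abs Rel, Sq Rel, RMSE, and δ < 1.25 using the definitions commonly used for KITTI depth benchmarks. It assumes `gt` and `pred` are already-masked arrays of positive depths in the same units; the evaluation code in this repository may additionally apply cropping, depth capping, or scaling that is not shown here. The function name `depth_metrics` is hypothetical.

```python
import numpy as np


def depth_metrics(gt, pred):
    """Standard depth error metrics for matching arrays of positive depths."""
    gt = np.asarray(gt, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    diff = gt - pred
    # Accuracy under threshold: fraction of pixels with max(gt/pred, pred/gt) < 1.25.
    ratio = np.maximum(gt / pred, pred / gt)
    return {
        "abs_rel": float(np.mean(np.abs(diff) / gt)),
        "sq_rel": float(np.mean(diff ** 2 / gt)),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "a1": float(np.mean(ratio < 1.25)),
    }
```

Lower is better for Abs Rel, Sq Rel, and RMSE, while higher is better for δ < 1.25.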

2. KITTI evaluation

This operation will save the estimated disparity map to <your disparity save path>. To recreate the results from our paper, run:

python main.py \
    --val --data_path <your kitti path> --resume <your model path>/model18.pth.tar \
    --use_full_scale --post_process --output_scale 0 --disps_path <your disparity save path>

The shape of saved disparities in numpy data format is (N, H, W).
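To inspect a saved file, something like the following sketch can be used. It only relies on the (N, H, W) layout stated above; the helper names `load_disparities` and `summarize_disparities` are hypothetical and not part of this repository.

```python
import numpy as np


def load_disparities(path):
    """Load a .npy file and check that it has the expected (N, H, W) layout."""
    disps = np.load(path)
    if disps.ndim != 3:
        raise ValueError("expected shape (N, H, W), got {}".format(disps.shape))
    return disps


def summarize_disparities(disps):
    """Return basic statistics for a stack of disparity maps."""
    n, h, w = disps.shape
    return {
        "count": n,
        "height": h,
        "width": w,
        "min": float(disps.min()),
        "max": float(disps.max()),
    }
```

For example, `summarize_disparities(load_disparities("<your disparity save path>/disps50.npy"))` reports how many maps were saved and their value range.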

3. NYUv2 evaluation

We validate the generalization ability on the NYU-Depth-V2 dataset using the model trained on the KITTI dataset. Download the testing data nyu_test.tar.gz, and extract it to <your nyuv2 testing data path>. All evaluation code is in the nyuv2Testing folder. Run:

python nyuv2_testing.py \
    --data_path <your nyuv2 testing data path> \
    --resume <your model path>/model50.pth.tar --post_process \
    --save_dir <your nyuv2 disparity save path>

By default, only the visualization results (png format) of the predicted disparity and the ground truth for the NYUv2 dataset will be saved to <your nyuv2 disparity save path>.

📦 KITTI Results

You can download our precomputed disparity predictions from the following links:

| Disparity | PP | HxW | Backbone | Output Scale | Abs Rel | Sq Rel | RMSE | δ < 1.25 |
|---|---|---|---|---|---|---|---|---|
| disps18_lr | | 192x640 | resnet18 (pt) | d0 | 0.0998 | 0.722 | 4.475 | 0.888 |
| disps18 | | 320x1024 | resnet18 (pt) | d0 | 0.0925 | 0.671 | 4.297 | 0.899 |
| disps50 | | 320x1024 | resnet50 (pt) | d0 | 0.0905 | 0.646 | 4.207 | 0.901 |

🖼 Visualization

To visualize the disparity map saved in the KITTI evaluation (or other disparities in numpy data format), run:

python main.py --vis --disps_path <your disparity save path>/disps50.npy

The visualized depth map will be saved to <your disparity save path>/disps_vis in png format.
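If you want to post-process the maps yourself, one common approach is to clip each disparity map at a high percentile and rescale it to 8-bit before writing an image. The sketch below illustrates that idea in grayscale only; it is an assumption about a reasonable normalization, not a description of what `main.py --vis` does, and the name `disparity_to_uint8` is hypothetical.

```python
import numpy as np


def disparity_to_uint8(disp, percentile=95.0):
    """Rescale one (H, W) disparity map to uint8 in [0, 255].

    Values above the given percentile are clipped so that a few very
    large disparities do not wash out the rest of the image.
    """
    disp = np.asarray(disp, dtype=np.float64)
    lo = float(disp.min())
    hi = float(np.percentile(disp, percentile))
    if hi <= lo:
        # Constant (or degenerate) map: nothing to stretch.
        return np.zeros(disp.shape, dtype=np.uint8)
    scaled = np.clip((disp - lo) / (hi - lo), 0.0, 1.0)
    return (scaled * 255.0).round().astype(np.uint8)
```

The resulting array can be passed to any image writer or colormap of your choice.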

Training

To train the model from scratch, run:

python main.py \
    --data_path <your kitti path> --model_dir <checkpoint save dir> \
    --logs_dir <tensorboard save dir> --pretrained --post_process \
    --use_depth_hint --use_spp_distillation --use_data_graft \
    --use_full_scale --gpu_ids 0

🔧 Suggestion

  1. The magnitude of performance improvement: Data Grafting > Full-Scale > Self-Distillation. We noticed that the improvement from self-distillation becomes insignificant when the model capacity is large. Therefore, more accurate self-distillation label extraction methods and better self-distillation strategies are worth exploring in the future.
  2. In our experience, the convergence of a self-supervised monocular depth estimation model with a larger backbone network is relatively unstable. You can verify your ideas on the small backbone first, and then adjust the learning rate appropriately to train on the big backbone.
  3. We found that a pure RSU encoder performs better than the traditional ResNet encoder, but unfortunately there is no RSU encoder pre-trained on ImageNet. We therefore believe that pre-training the RSU encoder on ImageNet and using it to replace the ResNet encoder of this model could bring a large performance improvement.
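To compare the contributions listed in the first suggestion on your own setup, you could generate one training command per combination of the three switches. The sketch below only builds command strings from the flags shown in the Training section. It assumes that `--use_data_graft`, `--use_full_scale`, and `--use_spp_distillation` correspond to Data Grafting, Full-Scale, and Self-Distillation respectively and that each can be omitted independently; the helper names and the output directory layout are hypothetical.

```python
import itertools
import shlex

# Assumed mapping: one flag per technique named in the first suggestion.
ABLATION_FLAGS = ["--use_data_graft", "--use_full_scale", "--use_spp_distillation"]


def build_command(data_path, model_dir, logs_dir, enabled_flags):
    """Build a shell-safe training command with the chosen optional flags."""
    args = [
        "python", "main.py",
        "--data_path", data_path,
        "--model_dir", model_dir,
        "--logs_dir", logs_dir,
        "--pretrained", "--post_process", "--use_depth_hint",
    ]
    args += list(enabled_flags)
    args += ["--gpu_ids", "0"]
    return " ".join(shlex.quote(a) for a in args)


def ablation_commands(data_path, out_root):
    """Return {run_name: command} for every subset of ABLATION_FLAGS."""
    commands = {}
    for r in range(len(ABLATION_FLAGS) + 1):
        for combo in itertools.combinations(ABLATION_FLAGS, r):
            name = "+".join(f.replace("--use_", "") for f in combo) or "baseline"
            commands[name] = build_command(
                data_path,
                "{}/{}/ckpt".format(out_root, name),
                "{}/{}/logs".format(out_root, name),
                combo,
            )
    return commands
```

Printing the values of `ablation_commands("<your kitti path>", "runs")` gives eight commands, from the baseline with none of the three flags to the full configuration.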

Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{epcdepth,
    title = {Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation},
    author = {Peng, Rui and Wang, Ronggang and Lai, Yawen and Tang, Luyang and Cai, Yangang},
    booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
    year = {2021}
}

Acknowledgements

Our depth hint module refers to DepthHints, the NYUv2 pre-processing refers to P2Net, and the RSU block refers to U2Net.
