[3DV 2021] Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

Last update: Dec 30, 2022

This is the official implementation for the method described in

Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

Jiaxing Yan, Hong Zhao, Penghui Bu and YuSheng Jin.

3DV 2021 (arXiv pdf)

Setup

Assuming a fresh Anaconda distribution, you can install the dependencies with:

conda install pytorch=1.7.0 torchvision=0.8.1 -c pytorch
pip install tensorboardX==2.1
pip install opencv-python==3.4.7.28
pip install albumentations==0.5.2   # we use albumentations for faster image preprocessing

This project uses Python 3.7.8 and CUDA 11.4. The experiments were conducted on a single NVIDIA RTX 3090 GPU with an Intel Core i9-9900KF CPU.

We recommend using a conda environment to avoid dependency conflicts.

Prediction for a single image

You can predict scaled disparity for a single image with:

python test_simple.py --image_path images/test_image.jpg --model_name MS_1024x320

On its first run, this command will download the MS_1024x320 pretrained model (272MB) into the models/ folder. We provide the following options for --model_name:

`--model_name`	Training modality	Resolution	Abs_Rel	Sq_Rel	$\delta<1.25$
`M_640x192`	Mono	640 x 192	0.105	0.769	0.892
`M_1024x320`	Mono	1024 x 320	0.102	0.734	0.898
`M_1280x384`	Mono	1280 x 384	0.102	0.715	0.900
`MS_640x192`	Mono + Stereo	640 x 192	0.102	0.752	0.894
`MS_1024x320`	Mono + Stereo	1024 x 320	0.096	0.694	0.908
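
The prediction command outputs scaled disparity rather than depth. As a minimal sketch, assuming this code follows the Monodepth2 convention it builds on (a sigmoid disparity in [0, 1] mapped into a fixed depth range), the conversion could look like the following. The function name and the default range of 0.1 to 100 are assumptions borrowed from Monodepth2 and are not confirmed by this README:

```python
def disp_to_depth(disp, min_depth=0.1, max_depth=100.0):
    """Map a disparity in [0, 1] to (scaled_disp, depth).

    Assumed Monodepth2-style convention; min_depth and max_depth are
    illustrative defaults, not values confirmed for this repository.
    Works on plain floats, and on NumPy arrays too, since it only uses
    elementwise arithmetic.
    """
    min_disp = 1.0 / max_depth   # disparity of the farthest point
    max_disp = 1.0 / min_depth   # disparity of the nearest point
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    depth = 1.0 / scaled_disp
    return scaled_disp, depth


if __name__ == "__main__":
    for d in (0.0, 0.5, 1.0):
        scaled, depth = disp_to_depth(d)
        print(f"disp={d:.1f} -> scaled_disp={scaled:.4f}, depth={depth:.4f}")
```

Under this convention a disparity of 0 maps to max_depth and a disparity of 1 maps to min_depth.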

KITTI training data

You can download the entire raw KITTI dataset by running:

wget -i splits/kitti_archives_to_download.txt -P kitti_data/

Then unzip with

cd kitti_data
unzip "*.zip"
cd ..
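
If the unzip tool is not available (for example on Windows), the same extraction step can be sketched with Python's standard library. The helper name extract_all is hypothetical and not part of this repository:

```python
import zipfile
from pathlib import Path


def extract_all(archive_dir="kitti_data"):
    """Extract every .zip file in archive_dir into archive_dir.

    Returns the names of the archives that were processed, in sorted order.
    This mirrors `cd kitti_data && unzip "*.zip"`; it is an illustrative
    helper, not a script shipped with the repository.
    """
    root = Path(archive_dir)
    processed = []
    for archive in sorted(root.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)
        processed.append(archive.name)
    return processed
```

Calling extract_all("kitti_data") from the repository root should leave the extracted data in kitti_data/, the default location the later commands expect.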

Splits

The train/test/validation splits are defined in the splits/ folder. By default, the code will train a depth model using Zhou's subset of the standard Eigen split of KITTI, which is designed for monocular training. You can also train a model using the new benchmark split or the odometry split by setting the --split flag.

Training

Monocular training:

python train.py --model_name mono_model

Stereo training:

Our code defaults to using Zhou's subsampled Eigen training data. For stereo-only training we have to specify that we want to use the full Eigen training set.

python train.py --model_name stereo_model \
  --frame_ids 0 --use_stereo --split eigen_full

Monocular + stereo training:

python train.py --model_name mono+stereo_model \
  --frame_ids 0 -1 1 --use_stereo

Note: For high-resolution inputs, e.g. 1024x320 and 1280x384, we use a lightweight pose encoder (ResNet18 at 640x192) during training to save memory. The following example command trains a model named M_1024x320:

python train.py --model_name M_1024x320 --num_layers 50 --height 320 --width 1024 --num_layers_pose 18 --height_pose 192 --width_pose 640
#             encoder     resolution                                     
# DepthNet   resnet50      1024x320
# PoseNet    resnet18       640x192
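
To launch several resolutions with the same lightweight pose setup, the command above can be assembled programmatically. This is a minimal sketch: build_train_command is a hypothetical helper, it only uses the flags shown in the example command, and the ResNet50 depth encoder default for other resolutions is an assumption carried over from that example:

```python
import shlex


def build_train_command(model_name, height, width, num_layers=50,
                        num_layers_pose=18, height_pose=192, width_pose=640):
    """Build the argument list for train.py using the flags documented above.

    Defaults mirror the M_1024x320 example (ResNet50 depth encoder,
    ResNet18 pose encoder at 640x192); other settings are assumptions.
    """
    return [
        "python", "train.py",
        "--model_name", model_name,
        "--num_layers", str(num_layers),
        "--height", str(height),
        "--width", str(width),
        "--num_layers_pose", str(num_layers_pose),
        "--height_pose", str(height_pose),
        "--width_pose", str(width_pose),
    ]


if __name__ == "__main__":
    cmd = build_train_command("M_1024x320", height=320, width=1024)
    # shlex.quote keeps the printed command safe to paste into a shell
    # (shlex.join would also work on Python 3.8+).
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues.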

Finetuning a pretrained model

Add the following to the training command to load an existing model for finetuning:

python train.py --model_name finetuned_mono --load_weights_folder ~/tmp/mono_model/models/weights_19

Other training options

Run python train.py -h (or look at options.py) to see the range of other training options, such as learning rates and ablation settings.

KITTI evaluation

To prepare the ground truth depth maps run:

python export_gt_depth.py --data_path kitti_data --split eigen
python export_gt_depth.py --data_path kitti_data --split eigen_benchmark

...assuming that you have placed the KITTI dataset in the default location of ./kitti_data/.

The following example command evaluates the weights of a model named MS_1024x320:

python evaluate_depth.py --load_weights_folder ./log/MS_1024x320 --eval_mono --data_path ./kitti_data --eval_split eigen
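
The Abs_Rel, Sq_Rel and $\delta<1.25$ columns in the model table are standard depth-estimation metrics. The sketch below shows their usual definitions in plain Python for illustration. It is not the repository's evaluate_depth.py, which may additionally apply cropping, depth capping or scale alignment before computing them:

```python
def depth_metrics(gt, pred, threshold=1.25):
    """Return (abs_rel, sq_rel, a1) for paired, strictly positive depths.

    abs_rel = mean(|gt - pred| / gt)
    sq_rel  = mean((gt - pred)^2 / gt)
    a1      = fraction of points with max(gt/pred, pred/gt) < threshold
    """
    if len(gt) != len(pred) or len(gt) == 0:
        raise ValueError("gt and pred must be non-empty and of equal length")
    n = len(gt)
    pairs = list(zip(gt, pred))
    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    a1 = sum(1 for g, p in pairs if max(g / p, p / g) < threshold) / n
    return abs_rel, sq_rel, a1


if __name__ == "__main__":
    gt = [1.0, 2.0, 4.0]
    pred = [1.0, 2.0, 2.0]
    print(depth_metrics(gt, pred))
```

Lower is better for Abs_Rel and Sq_Rel, while higher is better for $\delta<1.25$.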

Precomputed results

You can download our precomputed disparity predictions from the following links:

Training modality	Input size	`.npy` filesize	Eigen disparities
Mono	640 x 192	326M	Download 🔗
Mono	1024 x 320	871M	Download 🔗
Mono	1280 x 384	1.27G	Download 🔗
Mono + Stereo	640 x 192	326M	Download 🔗
Mono + Stereo	1024 x 320	871M	Download 🔗

References

Monodepth2 - https://github.com/nianticlabs/monodepth2
