Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue

Overview

Realtime Unsupervised Depth Estimation from an Image

This is the caffe implementation of our paper "Unsupervised CNN for single view depth estimation: Geometry to the rescue" published in ECCV 2016 with minor modifications. In this variant, we train the network end-to-end instead of in coarse to fine manner with deeper network (Resnet 50) and TVL1 loss instead of HS loss.

With the implementation we share the sample Resnet50by2 model trained on KITTI training set:

https://github.com/Ravi-Garg/Unsupervised_Depth_Estimation/blob/master/model/train_iter_40000.caffemodel

Shared model is a small variant of the 50 layer residual network from scratch on KITTI. Our model is <25 MB and predicts depths on 160x608 resolution images at over 30Hz on Nvidia Geforce GTX980 (50Hz on TITAN X). It can be used with caffe without any modification and we provide a simple matlab wrapper for testing.

Click on the image to watch preview of the results on youtube:

Screenshot

If you use our model or the code for your research please cite:

@inproceedings{garg2016unsupervised,
  title={Unsupervised CNN for single view depth estimation: Geometry to the rescue},
  author={Garg, Ravi and Kumar, BG Vijay and Carneiro, Gustavo and Reid, Ian},
  booktitle={European Conference on Computer Vision},
  pages={740--756},
  year={2016},
  organization={Springer}
}

Training Procedure

This model was trained on 23200 raw stereo pairs of KITTI taken from city, residential and road sequences. Images from other sequences of KITTI were left untouched. A subset of 697 images from 28 sequences froms the testset, leaving the remaining 33 sequences from these categories which can be used for training.

To use the same training data use the splits spacified in the file 'train_test_split.mat'.

Our model is trained end-to-end from scratch with adam solver (momentum1 = 0.9 , momentom2 = 0.999, learning rate =10e-3 ) for 40,000 iterations on 4 gpus with batchsize 14 per GPU. This model is a pre-release further tuning of hyperparameters should improve results. Only left-right flips as described in the paper were used to train the provided network. Other agumentations described in the paper and runtime shuffle were not used but should also lead to performance imrovement.

Here is the training loss recorded per 20 iterations:

loss per 20 iterations

Note: We have resized the KITTI images to 160x608 for training - which changes the aspect ratio of the images. Thus for proper evaluation on KITTI the images needs to be resized to this resolution and predicted disparities should be scaled by a factor of 608/width_of_input_image before computing depth. For ease in citing the results for further publications, we share the performance measures.

Our model gives following results on KITTI test-set without any post processing:

RMSE(linear): 4.400866

RMSE(log) : 0.233548

RMSE(log10) : 0.101441

Abs rel diff: 0.137796

Sq rel diff : 0.824861

accuracy THr 1.25 : 0.809765

accuracy THr 1.25 sq: 0.935108

accuracy THr 1.25 cube: 0.974739


The test-set consists of 697 images which was used in https://www.cs.nyu.edu/~deigen/depth/kitti_depth_predictions.mat Depth Predictions were first clipped to depth values between 0 and 50 meters and evaluated only in the region spacified in the given mask.

#Network Architecture

Architecture of our networks closely follow Residual networks scheme. We start from resnet 50 by 2 architecture and have replaced strided convolutions with 2x2 MAX pooling layers like VGG. The first 7x7 convolution with stride 2 is replaced with the 7x7 convolution with no stride and the max-pooled output at ½ resolution is passed through an extra 3x3 convolutional (128 features)->relu->2x2 pooling block. Rest of the network followes resnet50 with half the parameters every layer.

For dense prediction we have followed the skip-connections as specified in FCN and our ECCV paper. We have introduced a learnable scale layer with weight decay 0.01 before every 1x1 convolution of FCN skip-connections which allows us to merge mid-level features more efficiently by:

  • Adaptively selecting the mid-level features which are more correlated to depth of the scene.
  • Making 1x1 convolutions for projections more stable for end to end training.

Further analysis and visualizations of learned features will be released shortly on the arxiv: https://arxiv.org/pdf/1603.04992v2.pdf

Using the code

To train and finetune networks on your own data, you need to compile caffe with additional:

  • “AbsLoss” layer for L1 loss minimization,

  • “Warping” layer for image warpping given flow

  • and modified "filler.hpp" to compute image gradient with convolutions which we share here.

License

For academic usage, the code is released under the permissive BSD license. For any commercial purpose, please contact the authors.

Contact

Please report any known issues on this thread of to [email protected]

Owner
Ravi Garg
Ravi Garg
Detectron2 is FAIR's next-generation platform for object detection and segmentation.

Detectron2 is Facebook AI Research's next generation software system that implements state-of-the-art object detection algorithms. It is a ground-up r

Facebook Research 23.3k Jan 08, 2023
Rethinking of Pedestrian Attribute Recognition: A Reliable Evaluation under Zero-Shot Pedestrian Identity Setting

Pytorch Pedestrian Attribute Recognition: A strong PyTorch baseline of pedestrian attribute recognition and multi-label classification.

Jian 79 Dec 18, 2022
Data and extra materials for the food safety publications classifier

Data and extra materials for the food safety publications classifier The subdirectories contain detailed descriptions of their contents in the README.

1 Jan 20, 2022
Learning Synthetic Environments and Reward Networks for Reinforcement Learning

Learning Synthetic Environments and Reward Networks for Reinforcement Learning We explore meta-learning agent-agnostic neural Synthetic Environments (

AutoML-Freiburg-Hannover 16 Sep 02, 2022
NeWT: Natural World Tasks

NeWT: Natural World Tasks This repository contains resources for working with the NeWT dataset. ❗ At this time the binary tasks are not publicly avail

Visipedia 26 Oct 18, 2022
Deep Learning for Human Part Discovery in Images - Chainer implementation

Deep Learning for Human Part Discovery in Images - Chainer implementation NOTE: This is not official implementation. Original paper is Deep Learning f

Shintaro Shiba 63 Sep 25, 2022
Using Convolutional Neural Networks (CNN) for Semantic Segmentation of Breast Cancer Lesions (BRCA)

Using Convolutional Neural Networks (CNN) for Semantic Segmentation of Breast Cancer Lesions (BRCA). Master's thesis documents. Bibliography, experiments and reports.

Erick Cobos 73 Dec 04, 2022
This repo implements several applications of the proposed generalized Bures-Wasserstein (GBW) geometry on symmetric positive definite matrices.

GBW This repo implements several applications of the proposed generalized Bures-Wasserstein (GBW) geometry on symmetric positive definite matrices. Ap

Andi Han 0 Oct 22, 2021
My take on a practical implementation of Linformer for Pytorch.

Linformer Pytorch Implementation A practical implementation of the Linformer paper. This is attention with only linear complexity in n, allowing for v

Peter 349 Dec 25, 2022
The source code of CVPR17 'Generative Face Completion'.

GenerativeFaceCompletion Matcaffe implementation of our CVPR17 paper on face completion. In each panel from left to right: original face, masked input

Yijun Li 313 Oct 18, 2022
A Number Recognition algorithm

Paddle-VisualAttention Results_Compared SVHN Dataset Methods Steps GPU Batch Size Learning Rate Patience Decay Step Decay Rate Training Speed (FPS) Ac

1 Nov 12, 2021
Original Implementation of Prompt Tuning from Lester, et al, 2021

Prompt Tuning This is the code to reproduce the experiments from the EMNLP 2021 paper "The Power of Scale for Parameter-Efficient Prompt Tuning" (Lest

Google Research 282 Dec 28, 2022
Implementation of the Remixer Block from the Remixer paper, in Pytorch

Remixer - Pytorch Implementation of the Remixer Block from the Remixer paper, in Pytorch. It claims that substituting the feedforwards in transformers

Phil Wang 35 Aug 23, 2022
Official repository for Few-shot Image Generation via Cross-domain Correspondence (CVPR '21)

Few-shot Image Generation via Cross-domain Correspondence Utkarsh Ojha, Yijun Li, Jingwan Lu, Alexei A. Efros, Yong Jae Lee, Eli Shechtman, Richard Zh

Utkarsh Ojha 251 Dec 11, 2022
Official Repository for the ICCV 2021 paper "PixelSynth: Generating a 3D-Consistent Experience from a Single Image"

PixelSynth: Generating a 3D-Consistent Experience from a Single Image (ICCV 2021) Chris Rockwell, David F. Fouhey, and Justin Johnson [Project Website

Chris Rockwell 95 Nov 22, 2022
A BaSiC Tool for Background and Shading Correction of Optical Microscopy Images

BaSiC Matlab code accompanying A BaSiC Tool for Background and Shading Correction of Optical Microscopy Images by Tingying Peng, Kurt Thorn, Timm Schr

Marr Lab 34 Dec 18, 2022
Minimal implementation and experiments of "No-Transaction Band Network: A Neural Network Architecture for Efficient Deep Hedging".

No-Transaction Band Network: A Neural Network Architecture for Efficient Deep Hedging Minimal implementation and experiments of "No-Transaction Band N

19 Jan 03, 2023
Rust bindings for the C++ api of PyTorch.

tch-rs Rust bindings for the C++ api of PyTorch. The goal of the tch crate is to provide some thin wrappers around the C++ PyTorch api (a.k.a. libtorc

Laurent Mazare 2.3k Dec 30, 2022
With this package, you can generate mixed-integer linear programming (MIP) models of trained artificial neural networks (ANNs) using the rectified linear unit (ReLU) activation function

With this package, you can generate mixed-integer linear programming (MIP) models of trained artificial neural networks (ANNs) using the rectified linear unit (ReLU) activation function. At the momen

ChemEngAI 40 Dec 27, 2022
Improving adversarial robustness by a coupling rejection strategy

Adversarial Training with Rectified Rejection The code for the paper Adversarial Training with Rectified Rejection. Environment settings and libraries

Tianyu Pang 29 Jan 06, 2023