PyTorch DepthNet Training on Still Box dataset

Last update: Nov 21, 2022

Related tags

Overview

DepthNet training on Still Box

Project page

This code can replicate the results of our paper that was published in UAVg-17. If you use this repo in your work, please cite us with the following bibtex :

@Article{isprs-annals-IV-2-W3-67-2017,
AUTHOR = {Pinard, C. and Chevalley, L. and Manzanera, A. and Filliat, D.},
TITLE = {END-TO-END DEPTH FROM MOTION WITH STABILIZED MONOCULAR VIDEOS},
JOURNAL = {ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences},
VOLUME = {IV-2/W3},
YEAR = {2017},
PAGES = {67--74},
URL = {https://www.isprs-ann-photogramm-remote-sens-spatial-inf-sci.net/IV-2-W3/67/2017/},
DOI = {10.5194/isprs-annals-IV-2-W3-67-2017}
}

End-to-end depth from motion with stabilized monocular videos

This code shows how the only translational movement of the camera can be leveraged to compute a very precise depth map, even at more than 300 times the displacement.
Thus, for a camera movement of 30cm (nominal displacement used here), you can see as far as 100m.

See our second paper for information about using this code on real videos with speed estimation

Multi range Real-time depth inference from a monocular stabilized footage using a Fully Convolutional Neural Network

Click Below for video

DepthNet

DepthNet is a network designed to infer Depth Map directly from a pair of stabilized image.

No information is given about movement direction
DepthNet is Fully Convolutional, which means it is completely robust to optical center fault
This network only works for pinhole-like pictures

Still Box

Still box is a dataset created specifically for supervised training of depth map inference for stabilized aerial footage. It tries to mimic typical drone footages in static scenes, and depth is impossible to infer from a single image, as shapes get all kinds of sizes and positions.

You can download it here
The dataset webpage also provides a tutorial on how to read the data

Training

Requirements

[sudo] pip3 install -r requirements.txt

If you want to log some outputs from the validation set with the --log-output option, you need openCV python bindings to convert depth to RGB with a rainbow colormap.

If you don't have opencv, grayscales will be logged

Usage

Best results can be obtained by training on still box 64 and then finetuned successively up to the resolution you target. Here are the parameters used for the paper (please note how learning rate and batch size are changed, training was done a single GTX 980Ti).

python3 train.py -j8 --lr 0.01 /path/to/still_box/64/ --log-output --activation-function elu --bn

python3 train.py -j8 --lr 0.01 /path/to/still_box/128/ --log-output --activation-function elu --bn --pretrained /path/to/DepthNet64

python3 train.py -j8 --lr 0.001 /path/to/still_box/256/ --log-output --activation-function elu --bn -b64 --pretrained /path/to/DepthNet128

python3 train.py -j8 --lr 0.001 /path/to/still_box/512/ --log-output --activation-function elu --bn -b16 --pretrained /path/to/DepthNet256

Note: You can skip 128 and 256 training if you don't have time, results will be only slightly worse. However, you need to do 64 training first as stated by our first paper. This might has something to do with either the size of 64 dataset (in terms of scene numbers) or the fact that feature maps are reduced down to 1x1 making last convolution a FC equivalent operation

Pretrained networks

Best results were obtained with elu for depth activation (not mentionned in the original paper), along with BatchNorm.

Name	training set	Error (m)
`DepthNet_elu_bn_64.pth.tar`	64	4.65	Link
`DepthNet_elu_bn_128.pth.tar`	128	3.08	Link
`DepthNet_elu_bn_256.pth.tar`	256	2.29	Link
`DepthNet_elu_bn_512.pth.tar`	512	1.97	Link

All the networks have the same size and same structure.

Custom FOV and focal length

Every image in still box is 90° of FOV (field of view), focal length (in pixels) is then respectively

32px for 64x64 images
64px for 128x128 images
128px for 128x128 images
256px for 512x512 images

Training is not flexible to focal length, and for a custom focal length you will have to run a dedicated training.

If you need to use a custom focal length and FOV you can simply resize the pictures and crop them.

Say you have a picture of width w with an associated FOV fov. To get equivalent from one of the datasets you can first crop the still box pictures so that FOV will match fov (cropping doesn't affect focal length in pixels), and then resize it to w. Note that DepthNet can take rectangular pictures as input.

cropped_w = w/tan(pi*fov/360)

we naturally recommend to do this operation offline, metadata from metadata.json won't need to be altered.

with pretrained DepthNet

If you can resize your test pictures, thanks to its fully convolutional architecture, DepthNet is flexible to fov, as long as it stays below 90° (or max FOV encountered during training). Referring back to our witdh w and FOV fov we get with a network trained with a particular focal length f the following width to resize to:

resized_w = f/2*tan(pi*fov/360)

That way, you won't have to make a dedicated training or even download the still box dataset

/!\ These equations are only valid with pinhole equivalent cameras. Be sure to correct distortion before using DepthNet

Testing Inference

The run_inference.py lets you run an inference on a folder of images, and save the depth maps in different visualizations.

A simple still box scene of 512x512 pictures for testing can be downloaded here. Otherwise, any folder with a list of jpg images will do, provided you follow the guidelines above.

python3 run_inference.py --output-depth --no-resize --dataset-dir /path/to/stub_box --pretrained /path/to/DepthNet512 --frame-shift 3 --output-dir /path/to/save/outputs

Visualise training

Training can be visualized via tensorboard by launching this command in another terminal

tensorboard --logdir=/path/to/DepthNet/Results

You can then access the board from any computer in the local network by accessing machine_ip:6006 from a web browser, just as a regular tensorboard server. More info here

PyTorch DepthNet Training on Still Box dataset

Related tags

Overview

DepthNet training on Still Box

Project page

DepthNet

Still Box

Training

Requirements

Usage

Pretrained networks

Custom FOV and focal length

with pretrained DepthNet

Testing Inference

Visualise training

Owner

Clément Pinard

CCNet: Criss-Cross Attention for Semantic Segmentation (TPAMI 2020 & ICCV 2019).

Potato Disease Classification - Training, Rest APIs, and Frontend to test.

Meta-TTS: Meta-Learning for Few-shot SpeakerAdaptive Text-to-Speech

This is the repo for our work "Towards Persona-Based Empathetic Conversational Models" (EMNLP 2020)

Official implementation of SynthTIGER (Synthetic Text Image GEneratoR) ICDAR 2021

This is the implementation of GGHL (A General Gaussian Heatmap Labeling for Arbitrary-Oriented Object Detection)

Implementation of MA-Trace - a general-purpose multi-agent RL algorithm for cooperative environments.

K Closest Points and Maximum Clique Pruning for Efficient and Effective 3D Laser Scan Matching (To appear in RA-L 2022)

A simple AI that will give you si ple task and this is made with python

A program that uses computer vision to detect hand gestures, used for controlling movie players.

DIP-football - A football video analyse system based on Yolov5, alphapose, Qt6

Plenoxels: Radiance Fields without Neural Networks

Lama-cleaner: Image inpainting tool powered by LaMa

High-quality implementations of standard and SOTA methods on a variety of tasks.

Code for `BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery`, Neurips 2021

The Video-based Accident Detection System built in Python

GBK-GNN: Gated Bi-Kernel Graph Neural Networks for Modeling Both Homophily and Heterophily

Official Repository for the paper "Improving Baselines in the Wild".

Python implementation of NARS (Non-Axiomatic-Reasoning-System)

A keras implementation of ENet (abandoned for the foreseeable future)