Official implementation of the network presented in the paper "M4Depth: A motion-based approach for monocular depth estimation on video sequences"

Last update: Jan 03, 2023

Overview

M4Depth

This is the reference TensorFlow implementation for training and testing depth estimation models using the method described in

M4Depth: A motion-based approach for monocular depth estimation on video sequences

Michaël Fonder, Damien Ernst and Marc Van Droogenbroeck

arXiv pdf

Some samples produced by our method: the first line shows the RGB picture capured by the camera, the second the ground-truth depth map and the last one the results produced by our method.

If you find our work useful in your research please consider citing our paper:

@article{Fonder2021M4Depth,
  title     = {M4Depth: A motion-based approach for monocular depth estimation on video sequences},
  author    = {Michael Fonder and Damien Ernst and Marc Van Droogenbroeck},
  booktitle = {arXiv},
  month     = {May},
  year      = {2021}
}

If you use the Mid-Air dataset in your research, please consider citing the related paper:

@INPROCEEDINGS{Fonder2019MidAir,
  author    = {Michael Fonder and Marc Van Droogenbroeck},
  title     = {Mid-Air: A multi-modal dataset for extremely low altitude drone flights},
  booktitle = {Conference on Computer Vision and Pattern Recognition Workshop (CVPRW)},
  year      = {2019},
  month     = {June}
}

Dependencies

Assuming a fresh Anaconda distribution, you can install the dependencies with:

conda install tensorflow-gpu=1.15 h5py pyquaternion numpy

Formatting data

Our code works with tensorflow protobuffer files data for training and testing therefore need to be encoded properly before being passed to the network.

Mid-Air dataset

To reproduce the results of our paper, you can use the Mid-Air dataset for training and testing our network. For this, you will first need to download the required data on your computer. The procedure to get them is the following:

Go on the download page of the Mid-Air dataset

Select the "Left RGB" and "Stereo Disparity" image types

Move to the end of the page and enter your email to get the download links (the volume of selected data should be equal to 316.5Go)

Follow the procedure given at the begining of the download page to download and extract the dataset

Once the dataset is downloaded you can generate the required protobuffer files by running the following script:

python3 midair-protobuf_generation.py --db_path path/to/midair-root --output_dir desired/protobuf-location --write

This script generates trajectory sequences with a length of 8 frames and automatically creates the train and test splits for Mid-Air in separated subdirectories.

Custom data

You can also train or test our newtork on your own data. You can generate your own protobuffer files by repurpusing our midair-protobuf_generation.py script. When creating your own protobuffer files, you should pay attention to two major parameters; All sequences should have the same length and each element of a sequence should come with the following data:

"image/color_i" : the binary data of the jpeg picture encoding the color data of the frame
"Image/depth_i" : the binary data of the 16-bit png file encoding the stereo disparity map
"data/omega_i" : a list of three float32 numbers corresponding to the angular rotation between two consecutive frames
"data/trans_i" : a list of three float32 numbers corresponding to the translation between two consecutive frames

The subscript i has to be replaced by the index of the data within the trajectory. Translations and rotations are expressed in the standard camera frame of refence axis system.

Training

You can launch a training or a finetuning (if the log_dir already exists) by exectuting the following command line:

python3 m4depth_pipeline.py --train_datadir=path/to/protobuf/dir --log_dir=path/to/logdir --dataset=midair --arch_depth=6 --db_seq_len=8 --seq_len=6 --num_batches=200000 -b=3 -g=1 --summary_interval_secs=900 --save_interval_secs=1800

If needed, other options are available for the training phase and are described in pipeline_options.py and in m4depth_options.py files. Please note that the code can run on multiple GPUs to speedup the training.

Testing/Evaluation

You can launch the evaluation of your test samples by exectuting the following command line:

python3 m4depth_pipeline.py --test_datadir=path/to/protobuf/dir --log_dir=path/to/logdir --dataset=midair --arch_depth=6 --db_seq_len=8 --seq_len=8 --b=3 -g=1

If needed, other options are available for the evaluation phase and are described in pipeline_options.py and in m4depth_options.py files.

Pretrained model

We provide pretrained weights for our model in the "trained_weights" directory. Testing or evaluating a dataset from these weight can be done by executing the following command line:

python3 m4depth_pipeline.py --test_datadir=path/to/protobuf/dir --log_dir=trained_weights/M4Depth-d6 --dataset=midair --arch_depth=6 --db_seq_len=8 --seq_len=8 --b=3 -g=1

Official implementation of the network presented in the paper "M4Depth: A motion-based approach for monocular depth estimation on video sequences"

Related tags

Overview

M4Depth

Dependencies

Formatting data

Mid-Air dataset

Custom data

Training

Testing/Evaluation

Pretrained model

Owner

Michaël Fonder

This is the code of NeurIPS'21 paper "Towards Enabling Meta-Learning from Target Models".

Official codebase for running the small, filtered-data GLIDE model from GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models.

Pytorch implementation for the EMNLP 2020 (Findings) paper: Connecting the Dots: A Knowledgeable Path Generator for Commonsense Question Answering

This YoloV5 based model is fit to detect people and different types of land vehicles, and displaying their density on a fitted map, according to their coordinates and detected labels.

Light-Head R-CNN

OneShot Learning-based hotword detection.

Deep Q-network learning to play flappybird.

Advances in Neural Information Processing Systems (NeurIPS), 2020.

VD-BERT: A Unified Vision and Dialog Transformer with BERT

End-to-End Referring Video Object Segmentation with Multimodal Transformers

face_recognization (FaceNet) + TFHE (HNP) + hand_face_detection (Mediapipe)

A PyTorch Implementation of "SINE: Scalable Incomplete Network Embedding" (ICDM 2018).

Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation

PPLNN is a Primitive Library for Neural Network is a high-performance deep-learning inference engine for efficient AI inferencing

Iowa Project - My second project done at General Assembly, focused on feature engineering and understanding Linear Regression as a concept

Implementation of the method proposed in the paper "Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation"

Official Code for ICML 2021 paper "Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline"

Anonymize BLM Protest Images

Python library containing BART query generation and BERT-based Siamese models for neural retrieval.

Source code and Dataset creation for the paper "Neural Symbolic Regression That Scales"