Self-Supervised Multi-Frame Monocular Scene Flow (CVPR 2021)

Last update: Dec 22, 2022

3D visualization of estimated depth and scene flow (overlaid with the input image) from temporally consecutive images.
Trained on KITTI in a self-supervised manner, and tested on DAVIS.

This repository is the official PyTorch implementation of the paper:

   Self-Supervised Multi-Frame Monocular Scene Flow
   Junhwa Hur and Stefan Roth
   CVPR, 2021
   arXiv

Contact: junhwa.hur[at]gmail.com

Installation

The code has been tested with Anaconda (Python 3.8), PyTorch 1.8.1, and CUDA 10.1 (other PyTorch + CUDA versions are also compatible).
Please run the provided conda environment setup file:

conda env create -f environment.yml
conda activate multi-mono-sf

(Optional) Using the CUDA implementation of the correlation layer accelerates training (~50% faster):

./install_correlation.sh

After installing it, set the flag --correlation_cuda_enabled=True in the training/evaluation script files.

Dataset

Please download the following two datasets for the experiments:

KITTI Raw Data (synced+rectified data; please refer to MonoDepth2 for a more convenient way to download all data.)
KITTI Scene Flow 2015 and its Multi-view extension (merge both into the same folder.)

To save space, we convert the KITTI Raw png images to jpeg, following the convention from MonoDepth:

find (data_folder)/ -name '*.png' | parallel 'convert {.}.png {.}.jpg && rm {}'

We also converted the images in KITTI Scene Flow 2015. Please convert the png images in image_2 and image_3 into jpg and save them into the separate folders image_2_jpg and image_3_jpg.
To save further space, you can delete the Velodyne point data in KITTI Raw, as it is not needed.
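
The conversion of the KITTI Scene Flow 2015 images described above could be sketched as follows. This is a minimal sketch, not part of the repository: the helper names `plan_jpg_conversion` and `run_conversion` are hypothetical, the `split_dir` argument is an assumed folder that directly contains `image_2` and `image_3`, and the actual conversion is delegated to ImageMagick's `convert`, as in the command for KITTI Raw.

```python
import subprocess
from pathlib import Path


def plan_jpg_conversion(split_dir, folders=("image_2", "image_3")):
    """Map every png in image_2/image_3 to a jpg path in image_2_jpg/image_3_jpg."""
    split_dir = Path(split_dir)
    plan = []
    for folder in folders:
        src_dir = split_dir / folder
        if not src_dir.is_dir():
            continue  # skip folders that are not present
        dst_dir = split_dir / f"{folder}_jpg"
        for src in sorted(src_dir.glob("*.png")):
            plan.append((src, dst_dir / (src.stem + ".jpg")))
    return plan


def run_conversion(plan):
    """Convert each planned file with ImageMagick's `convert` (must be installed)."""
    for src, dst in plan:
        dst.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(["convert", str(src), str(dst)], check=True)
```

Calling `run_conversion(plan_jpg_conversion(split_dir))` writes the jpg files next to the original folders. Unlike the KITTI Raw command, this sketch keeps the png files, since the jpg copies go into separate folders.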

Training and Inference

The scripts folder contains training/inference scripts.

For self-supervised training, you can simply run the following script files:

Script	Training	Dataset
`./train_selfsup.sh`	Self-supervised	KITTI Split

Fine-tuning is done in two stages: (i) first finding the stopping point using the train/valid split, and then (ii) fine-tuning on all data for the number of iterations found in the first stage.
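
The two-stage procedure can be illustrated with a small sketch. It is not the repository's code: the function name `find_stopping_iteration` and the validation log are hypothetical (the numbers are made up for illustration), and picking the iteration with the lowest validation error is an assumption about how the stopping point is determined.

```python
def find_stopping_iteration(val_errors):
    """Return the iteration with the lowest validation error.

    val_errors maps an iteration count to the error measured on the
    validation split in stage (i). Ties go to the earliest iteration.
    """
    if not val_errors:
        raise ValueError("no validation results recorded")
    return min(sorted(val_errors), key=lambda it: val_errors[it])


# Stage (i): hypothetical validation errors logged while fine-tuning
# on the train split (made-up numbers, for illustration only).
stage1_log = {1000: 0.31, 2000: 0.27, 3000: 0.25, 4000: 0.26, 5000: 0.29}
stop_at = find_stopping_iteration(stage1_log)

# Stage (ii): fine-tune on all data (train + valid) for `stop_at` iterations.
print(f"fine-tune on all data for {stop_at} iterations")
```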

Script	Training	Dataset
`./ft_1st_stage.sh`	Semi-supervised finetuning	KITTI raw + KITTI 2015
`./ft_2nd_stage.sh`	Semi-supervised finetuning	KITTI raw + KITTI 2015

In the script files, please configure the following paths for your experiments:

DATA_HOME : the directory where the training or test data is located on your local system.
EXPERIMENTS_HOME : your own experiment directory where checkpoints and log files will be saved.
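
Before launching a script, it can help to verify both paths. The sketch below is not part of the repository: `check_experiment_paths` is a hypothetical helper, and the policy it applies (the data directory must already exist, the experiment directory is created if missing) is an assumption rather than what the scripts themselves do.

```python
from pathlib import Path


def check_experiment_paths(data_home, experiments_home):
    """Validate DATA_HOME and prepare EXPERIMENTS_HOME; return both as Paths."""
    data_home = Path(data_home).expanduser()
    experiments_home = Path(experiments_home).expanduser()
    # The datasets must already be downloaded, so a missing folder is an error.
    if not data_home.is_dir():
        raise FileNotFoundError(f"DATA_HOME does not exist: {data_home}")
    # Checkpoints and logs are written here, so create it on demand.
    experiments_home.mkdir(parents=True, exist_ok=True)
    return data_home, experiments_home
```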

To test pretrained models, you can simply run the following script files:

Script	Training	Dataset
`./eval_selfsup_train.sh`	self-supervised	KITTI 2015 Train
`./eval_ft_test.sh`	fine-tuned	KITTI 2015 Test
`./eval_davis.sh`	self-supervised	DAVIS (one scene)
`./eval_davis_all.sh`	self-supervised	DAVIS (all scenes)

To save visualizations of the outputs, please turn on --save_vis=True in the script.
To save output images for KITTI Scene Flow 2015 Benchmark submission, please turn on --save_out=True in the script.

Pretrained Models

The checkpoints folder contains the checkpoints of the pretrained models.

Acknowledgement

Please cite our paper if you use our source code.

@inproceedings{Hur:2021:SSM,  
  Author = {Junhwa Hur and Stefan Roth},  
  Booktitle = {CVPR},  
  Title = {Self-Supervised Multi-Frame Monocular Scene Flow},  
  Year = {2021}  
}

Portions of the source code (e.g., training pipeline, runtime, argument parser, and logger) are from Jochen Gast.

Owner

Visual Inference Lab @TU Darmstadt
