[CVPR'21] MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation

Overview

MonoRUn

MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation. CVPR 2021. [paper] Hansheng Chen, Yuyao Huang, Wei Tian*, Zhong Gao, Lu Xiong. (*Corresponding author: Wei Tian.)

This repository is the PyTorch implementation for MonoRUn. The codes are based on MMDetection and MMDetection3D, although we use our own data formats. The PnP C++ codes are modified from PVNet.

demo

Installation

Please refer to INSTALL.md.

Data preparation

Download the official KITTI 3D object dataset, including left color images, calibration files and training labels.

Download the train/val/test image lists [Google Drive | Baidu Pan, password: cj4u]. For training with LiDAR supervision, download the preprocessed object coordinate maps [Google Drive | Baidu Pan, password: fp3h].

Extract the downloaded archives according to the following folder structure. It is recommended to symlink the dataset root to $MonoRUn_ROOT/data. If your folder structure is different, you may need to change the corresponding paths in config files.

$MonoRUn_ROOT
├── configs
├── monorun
├── tools
├── data
│   ├── kitti
│   │   ├── testing
│   │   │   ├── calib
│   │   │   ├── image_2
│   │   │   └── test_list.txt
│   │   └── training
│   │       ├── calib
│   │       ├── image_2
│   │       ├── label_2
│   │       ├── obj_crd
│   │       ├── mono3dsplit_train_list.txt
│   │       ├── mono3dsplit_val_list.txt
│   │       └── trainval_list.txt

Run the preparation script to generate image metas:

cd $MonoRUn_ROOT
python tools/prepare_kitti.py

Train

cd $MonoRUn_ROOT

To train without LiDAR supervision:

python train.py configs/kitti_multiclass.py --gpu-ids 0 1

where --gpu-ids 0 1 specifies the GPU IDs. In the paper we use two GPUs for distributed training. The number of GPUs affects the mini-batch size. You may change the samples_per_gpu option in the config file to vary the number of images per GPU. If you encounter out of memory issue, add the argument --seed 0 --deterministic to save GPU memory.

To train with LiDAR supervision:

python train.py configs/kitti_multiclass_lidar_supv.py --gpu-ids 0 1

To view other training options:

python train.py -h

By default, logs and checkpoints will be saved to $MonoRUn_ROOT/work_dirs. You can run TensorBoard to plot the logs:

tensorboard --logdir $MonoRUn_ROOT/work_dirs

The above configs use the 3712-image split for training and the other split for validating. If you want to train on the full training set (train-val), use the config files with _trainval postfix.

Test

You can download the pretrained models:

  • kitti_multiclass.pth [Google Drive | Baidu Pan, password: 6bih] trained on KITTI training split
  • kitti_multiclass_lidar_supv.pth [Google Drive | Baidu Pan, password: nmdb] trained on KITTI training split
  • kitti_multiclass_lidar_supv_trainval.pth [Google Drive | Baidu Pan, password: hg2r] trained on KITTI train-val

To test and evaluate on the validation set using config at $CONFIG_PATH and checkpoint at $CPT_PATH:

python test.py $CONFIG_PATH $CPT_PATH --val-set --gpu-ids 0

To test on the test set and save detection results to $RESULT_DIR:

python test.py $CONFIG_PATH $CPT_PATH --result-dir $RESULT_DIR --gpu-ids 0

You can append the argument --show-dir $SHOW_DIR to save visualized results.

To view other testing options:

python test.py -h

Note: the training and testing scripts in the root directory are wrappers for the original scripts taken from MMDetection, which can be found in $MonoRUn_ROOT/tools. For advanced usage, please refer to the official MMDetection docs.

Demo

We provide a demo script to perform inference on images in a directory and save the visualized results. Example:

python demo/infer_imgs.py $KITTI_RAW_DIR/2011_09_30/2011_09_30_drive_0027_sync/image_02/data configs/kitti_multiclass_lidar_supv_trainval.py checkpoints/kitti_multiclass_lidar_supv_trainval.pth --calib demo/calib.csv --show-dir show/2011_09_30_drive_0027

Citation

If you find this project useful in your research, please consider citing:

@inproceedings{monorun2021, 
  author = {Hansheng Chen and Yuyao Huang and Wei Tian and Zhong Gao and Lu Xiong}, 
  title = {MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation}, 
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, 
  year = {2021}
}
Owner
同济大学智能汽车研究所综合感知研究组 ( Comprehensive Perception Research Group under Institute of Intelligent Vehicles, School of Automotive Studies, Tongji University)
同济大学智能汽车研究所综合感知研究组 ( Comprehensive Perception Research Group under Institute of Intelligent Vehicles, School of Automotive Studies, Tongji University)
A deep learning CNN model to identify and classify and check if a person is wearing a mask or not.

Face Mask Detection The Model is designed to check if any human is wearing a mask or not. Dataset Description The Dataset contains a total of 11,792 i

1 Mar 01, 2022
Try out deep learning models online on Google Colab

Try out deep learning models online on Google Colab

Erdene-Ochir Tuguldur 1.5k Dec 27, 2022
Code for Learning Manifold Patch-Based Representations of Man-Made Shapes, in ICLR 2021.

LearningPatches | Webpage | Paper | Video Learning Manifold Patch-Based Representations of Man-Made Shapes Dmitriy Smirnov, Mikhail Bessmeltsev, Justi

Dima Smirnov 22 Nov 14, 2022
Code of Classification Saliency-Based Rule for Visible and Infrared Image Fusion

CSF Code of Classification Saliency-Based Rule for Visible and Infrared Image Fusion Tips: For testing: CUDA_VISIBLE_DEVICES=0 python main.py For trai

Han Xu 14 Oct 31, 2022
Official repository of OFA. Paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Paper | Blog OFA is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image gene

OFA Sys 1.4k Jan 08, 2023
SGoLAM - Simultaneous Goal Localization and Mapping

SGoLAM - Simultaneous Goal Localization and Mapping PyTorch implementation of the MultiON runner-up entry, SGoLAM: Simultaneous Goal Localization and

10 Jan 05, 2023
A pre-trained language model for social media text in Spanish

RoBERTuito A pre-trained language model for social media text in Spanish READ THE FULL PAPER Github Repository RoBERTuito is a pre-trained language mo

25 Dec 29, 2022
Deep Learning Training Scripts With Python

Deep Learning Training Scripts DNN Frameworks Caffe PyTorch Tensorflow CNN Models VGG ResNet DenseNet Inception Language Modeling GatedCNN-LM Attentio

Multicore Computing Research Lab 16 Dec 15, 2022
Totally Versatile Miscellanea for Pytorch

Totally Versatile Miscellania for PyTorch Thomas Viehmann [email protected] Thi

Thomas Viehmann 428 Dec 28, 2022
OpenL3: Open-source deep audio and image embeddings

OpenL3 OpenL3 is an open-source Python library for computing deep audio and image embeddings. Please refer to the documentation for detailed instructi

Music and Audio Research Laboratory - NYU 326 Jan 02, 2023
Pytorch implementation for the EMNLP 2020 (Findings) paper: Connecting the Dots: A Knowledgeable Path Generator for Commonsense Question Answering

Path-Generator-QA This is a Pytorch implementation for the EMNLP 2020 (Findings) paper: Connecting the Dots: A Knowledgeable Path Generator for Common

Peifeng Wang 33 Dec 05, 2022
Contour-guided image completion with perceptual grouping (BMVC 2021 publication)

Contour-guided Image Completion with Perceptual Grouping Authors Morteza Rezanejad*, Sidharth Gupta*, Chandra Gummaluru, Ryan Marten, John Wilder, Mic

Sid Gupta 6 Dec 27, 2022
HiFT: Hierarchical Feature Transformer for Aerial Tracking (ICCV2021)

HiFT: Hierarchical Feature Transformer for Aerial Tracking Ziang Cao, Changhong Fu, Junjie Ye, Bowen Li, and Yiming Li Our paper is Accepted by ICCV 2

Intelligent Vision for Robotics in Complex Environment 55 Nov 23, 2022
This repository contains the segmentation user interface from the OpenSurfaces project, extracted as a lightweight tool

OpenSurfaces Segmentation UI This repository contains the segmentation user interface from the OpenSurfaces project, extracted as a lightweight tool.

Sean Bell 66 Jul 11, 2022
Gas detection for Raspberry Pi using ADS1x15 and MQ-2 sensors

Gas detection Gas detection for Raspberry Pi using ADS1x15 and MQ-2 sensors. Description The MQ-2 sensor can detect multiple gases (CO, H2, CH4, LPG,

Filip Š 15 Sep 30, 2022
Source code of CIKM2021 Long Paper "PSSL: Self-supervised Learning for Personalized Search with Contrastive Sampling".

PSSL Source code of CIKM2021 Long Paper "PSSL: Self-supervised Learning for Personalized Search with Contrastive Sampling". It consists of the pre-tra

2 Dec 21, 2021
Pose estimation for iOS and android using TensorFlow 2.0

💃 Mobile 2D Single Person (Or Your Own Object) Pose Estimation for TensorFlow 2.0 This repository is forked from edvardHua/PoseEstimationForMobile wh

tucan9389 165 Nov 16, 2022
Implementation of Basic Machine Learning Algorithms on small datasets using Scikit Learn.

Basic Machine Learning Algorithms All the basic Machine Learning Algorithms are implemented in Python using libraries Acknowledgements Machine Learnin

Piyal Banik 47 Oct 16, 2022
Multitask Learning Strengthens Adversarial Robustness

Multitask Learning Strengthens Adversarial Robustness

Columbia University 15 Jun 10, 2022
Tidy interface to polars

tidypolars tidypolars is a data frame library built on top of the blazingly fast polars library that gives access to methods and functions familiar to

Mark Fairbanks 144 Jan 08, 2023