Turning pixels into virtual points for multimodal 3D object detection.

Last update: Jan 08, 2023

Multimodal Virtual Point 3D Detection

Multimodal Virtual Point 3D Detection,
Tianwei Yin, Xingyi Zhou, Philipp Krähenbühl,
arXiv technical report (arXiv 2111.06881)

@article{yin2021multimodal,
  title={Multimodal Virtual Point 3D Detection},
  author={Yin, Tianwei and Zhou, Xingyi and Kr{\"a}henb{\"u}hl, Philipp},
  journal={NeurIPS},
  year={2021},
}

Contact

Any questions or suggestions are welcome!

Tianwei Yin [email protected]
Xingyi Zhou [email protected]

Abstract

Lidar-based sensing drives current autonomous vehicles. Despite rapid progress, current Lidar sensors still lag two decades behind traditional color cameras in terms of resolution and cost. For autonomous driving, this means that large objects close to the sensors are easily visible, but far-away or small objects comprise only one measurement or two. This is an issue, especially when these objects turn out to be driving hazards. On the other hand, these same objects are clearly visible in onboard RGB sensors. In this work, we present an approach to seamlessly fuse RGB sensors into Lidar-based 3D recognition. Our approach takes a set of 2D detections to generate dense 3D virtual points to augment an otherwise sparse 3D point-cloud. These virtual points naturally integrate into any standard Lidar-based 3D detectors along with regular Lidar measurements. The resulting multi-modal detector is simple and effective. Experimental results on the large-scale nuScenes dataset show that our framework improves a strong CenterPoint baseline by a significant 6.6 mAP, and outperforms competing fusion approaches.
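
The idea of lifting 2D detections into virtual 3D points can be illustrated with a small NumPy sketch. It is not the code in this repository: the function name, the argument layout, the random sampling, and the nearest-neighbor depth assignment are all assumptions made for illustration. Lidar points are assumed to be already expressed in the camera frame, and the instance mask is assumed to come from a 2D detector.

```python
import numpy as np

def make_virtual_points(lidar_xyz, K, mask, num_samples=50, rng=None):
    """Illustrative sketch: lift pixels inside a 2D instance mask to 3D.

    lidar_xyz: (N, 3) points in the camera frame (z points forward).
    K:         (3, 3) camera intrinsics.
    mask:      (H, W) boolean instance mask from a 2D detector.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    H, W = mask.shape

    # Project the lidar points in front of the camera onto the image plane.
    pts = lidar_xyz[lidar_xyz[:, 2] > 0]
    uvw = pts @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]
    depth = pts[:, 2]

    # Keep only the projections that land on the instance mask.
    cols = np.floor(uv[:, 0]).astype(int)
    rows = np.floor(uv[:, 1]).astype(int)
    inside = (cols >= 0) & (cols < W) & (rows >= 0) & (rows < H)
    on_mask = np.zeros(len(pts), dtype=bool)
    on_mask[inside] = mask[rows[inside], cols[inside]]
    if not on_mask.any():
        return np.empty((0, 3))
    uv, depth = uv[on_mask], depth[on_mask]

    # Sample random pixel centers inside the mask.
    ys, xs = np.nonzero(mask)
    pick = rng.integers(0, len(xs), size=num_samples)
    samples = np.stack([xs[pick] + 0.5, ys[pick] + 0.5], axis=1)

    # Assumption: each sample borrows the depth of the nearest projected
    # lidar point, measured by 2D pixel distance.
    d2 = ((samples[:, None, :] - uv[None, :, :]) ** 2).sum(axis=-1)
    sample_depth = depth[d2.argmin(axis=1)]

    # Unproject (u, v, depth) back into 3D camera coordinates.
    homo = np.concatenate([samples, np.ones((num_samples, 1))], axis=1)
    rays = homo @ np.linalg.inv(K).T
    return rays * sample_depth[:, None]
```

The returned array can be concatenated with the real Lidar points before they are passed to a 3D detector, which is how virtual points can plug into an existing Lidar pipeline.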

Main results

3D detection on nuScenes validation set

	mAP ↑	NDS ↑
CenterPoint-Voxel	59.5	66.7
CenterPoint-Voxel + MVP	66.0	69.9
CenterPoint-Pillar	52.4	61.5
CenterPoint-Pillar + MVP	62.8	66.2
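
As a quick check of the arithmetic, the gains from adding MVP can be computed directly from the validation numbers. The snippet below is only a convenience sketch. The last row is taken to be the PointPillars variant, since its numbers match the PointPillars entry in the model zoo.

```python
# Validation numbers copied from the table: (mAP, NDS).
results = {
    "CenterPoint-Voxel": (59.5, 66.7),
    "CenterPoint-Voxel + MVP": (66.0, 69.9),
    "CenterPoint-Pillar": (52.4, 61.5),
    "CenterPoint-Pillar + MVP": (62.8, 66.2),
}

def gains(results, baseline):
    """Return the (mAP, NDS) improvement of the MVP variant over its baseline."""
    base_map, base_nds = results[baseline]
    mvp_map, mvp_nds = results[baseline + " + MVP"]
    return round(mvp_map - base_map, 1), round(mvp_nds - base_nds, 1)

for name in ("CenterPoint-Voxel", "CenterPoint-Pillar"):
    d_map, d_nds = gains(results, name)
    print(f"{name}: +{d_map} mAP, +{d_nds} NDS")
# CenterPoint-Voxel: +6.5 mAP, +3.2 NDS
# CenterPoint-Pillar: +10.4 mAP, +4.7 NDS
```

On the validation set the voxel model gains 6.5 mAP, close to the 6.6 mAP quoted in the abstract.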

3D detection on nuScenes test set

	mAP ↑	NDS ↑	PKL ↓
MVP	66.4	70.5	0.603

Use MVP

Installation

Please install CenterPoint and CenterNet2, and make sure the CenterNet2 folder is on your Python path. MVP uses CenterNet2 for 2D instance segmentation and CenterPoint for 3D detection.
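
If you prefer not to change your shell environment, one option is to extend the Python path from inside a script. The helper below is a hypothetical sketch, not part of MVP or CenterNet2, and the folder you pass in is wherever you cloned CenterNet2.

```python
import sys
from pathlib import Path

def add_to_python_path(folder):
    """Prepend a folder to sys.path for the current process if it is missing."""
    resolved = str(Path(folder).expanduser().resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved
```

This only affects the running process. Setting the PYTHONPATH environment variable is the usual choice when the path should apply to every command.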

Getting Started

Download the nuScenes data and organize it as follows

# For nuScenes Dataset         
└── NUSCENES_DATASET_ROOT
       ├── samples       <-- key frames
       ├── sweeps        <-- frames without annotation
       ├── maps          <-- unused
       └── v1.0-trainval <-- metadata

Create a symlink to the dataset root in both CenterPoint and MVP's root directories.

mkdir data && cd data
ln -s DATA_ROOT nuScenes

Remember to replace DATA_ROOT with the actual path on your system.
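
The same step can be scripted. The sketch below checks that the dataset root contains the four folders listed earlier and then creates data/nuScenes as a symlink inside a project directory. The function name and arguments are hypothetical. Call it once for CenterPoint's root and once for MVP's root.

```python
import os
from pathlib import Path

EXPECTED = ("samples", "sweeps", "maps", "v1.0-trainval")

def link_nuscenes(dataset_root, project_root):
    """Create project_root/data/nuScenes as a symlink to dataset_root."""
    dataset_root = Path(dataset_root).resolve()

    # Refuse to link a folder that does not look like a nuScenes root.
    missing = [name for name in EXPECTED if not (dataset_root / name).is_dir()]
    if missing:
        raise FileNotFoundError(f"missing under {dataset_root}: {missing}")

    data_dir = Path(project_root) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    link = data_dir / "nuScenes"

    # Leave an existing link or folder untouched.
    if not (link.is_symlink() or link.exists()):
        os.symlink(dataset_root, link, target_is_directory=True)
    return link
```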

Generate Virtual Points

Download the centernet2 model from here and place it in the root directory.

Run the following command from the current directory to generate virtual points for the nuScenes training and validation sets. The points will be saved to the LIDAR_TOP_VIRTUAL folders under data/nuScenes/samples and data/nuScenes/sweeps.

python virtual_gen.py --info_path data/nuScenes/infos_train_10sweeps_withvelo_filter_True.pkl

You will need about 80 GB of disk space, and the whole process takes 10 to 20 hours on a single GPU. You can also download the precomputed virtual points from here.

Create Data

Go to CenterPoint's root directory and run

# nuScenes
python tools/create_data.py nuscenes_data_prep --root_path=NUSCENES_TRAINVAL_DATASET_ROOT --version="v1.0-trainval" --nsweeps=10 --virtual True

If you want to reproduce the CenterPoint baseline's results, also run the following command

# nuScenes
python tools/create_data.py nuscenes_data_prep --root_path=NUSCENES_TRAINVAL_DATASET_ROOT --version="v1.0-trainval" --nsweeps=10 --virtual False

In the end, the data and info files should be organized as follows

# For nuScenes Dataset
└── CenterPoint
       └── data
              └── nuScenes
                     ├── maps          <-- unused
                     ├── v1.0-trainval <-- metadata and annotations
                     ├── infos_train_10sweeps_withvelo_filter_True.pkl <-- train annotations
                     ├── infos_val_10sweeps_withvelo_filter_True.pkl <-- val annotations
                     ├── dbinfos_train_10sweeps_withvelo_virtual.pkl <-- GT database info files
                     ├── gt_database_10sweeps_withvelo_virtual <-- GT database
                     ├── samples       <-- key frames
                     │      ├── LIDAR_TOP
                     │      └── LIDAR_TOP_VIRTUAL
                     └── sweeps        <-- frames without annotation
                            ├── LIDAR_TOP
                            └── LIDAR_TOP_VIRTUAL
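
Before starting a long training run, it can help to confirm that the layout above is complete. The following sketch only checks that each listed entry exists. The function name is hypothetical and nothing is validated about the contents of the files.

```python
from pathlib import Path

# Entries taken from the directory listing above, relative to data/nuScenes.
EXPECTED_ENTRIES = [
    "maps",
    "v1.0-trainval",
    "infos_train_10sweeps_withvelo_filter_True.pkl",
    "infos_val_10sweeps_withvelo_filter_True.pkl",
    "dbinfos_train_10sweeps_withvelo_virtual.pkl",
    "gt_database_10sweeps_withvelo_virtual",
    "samples/LIDAR_TOP",
    "samples/LIDAR_TOP_VIRTUAL",
    "sweeps/LIDAR_TOP",
    "sweeps/LIDAR_TOP_VIRTUAL",
]

def missing_entries(nuscenes_dir):
    """Return the expected entries that are absent from nuscenes_dir."""
    root = Path(nuscenes_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]
```

For example, `missing_entries("data/nuScenes")` run from CenterPoint's root directory returns an empty list when everything is in place.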

Train & Evaluate in Command Line

Go to CenterPoint's root directory and use the following command to start distributed training on 4 GPUs. The models and logs will be saved to work_dirs/CONFIG_NAME.

python -m torch.distributed.launch --nproc_per_node=4 ./tools/train.py CONFIG_PATH

For distributed testing with 4 GPUs,

python -m torch.distributed.launch --nproc_per_node=4 ./tools/dist_test.py CONFIG_PATH --work_dir work_dirs/CONFIG_NAME --checkpoint work_dirs/CONFIG_NAME/latest.pth

To test on a single GPU and measure the inference time,

python ./tools/dist_test.py CONFIG_PATH --work_dir work_dirs/CONFIG_NAME --checkpoint work_dirs/CONFIG_NAME/latest.pth --speed_test

MODEL ZOO

We experiment with VoxelNet and PointPillars architectures on nuScenes.

VoxelNet

Model	Validation mAP	Validation NDS	Link
centerpoint_baseline	59.5	66.7	URL
Ours	66.0	69.9	URL

PointPillars

Model	Validation mAP	Validation NDS	Link
centerpoint_baseline	52.4	61.5	URL
Ours	62.8	66.2	URL

Test set models and predictions will be updated soon.

License

MIT License.
