Turning pixels into virtual points for multimodal 3D object detection.

Related tags

Deep LearningMVP
Overview

Multimodal Virtual Point 3D Detection

Turning pixels into virtual points for multimodal 3D object detection.

Multimodal Virtual Point 3D Detection,
Tianwei Yin, Xingyi Zhou, Philipp Krähenbühl,
arXiv technical report (arXiv 2111.06881 )

@article{yin2021multimodal,
  title={Multimodal Virtual Point 3D Detection},
  author={Yin, Tianwei and Zhou, Xingyi and Kr{\"a}henb{\"u}hl, Philipp},
  journal={NeurIPS},
  year={2021},
}

Contact

Any questions or suggestions are welcome!

Tianwei Yin [email protected] Xingyi Zhou [email protected]

Abstract

Lidar-based sensing drives current autonomous vehicles. Despite rapid progress, current Lidar sensors still lag two decades behind traditional color cameras in terms of resolution and cost. For autonomous driving, this means that large objects close to the sensors are easily visible, but far-away or small objects comprise only one measurement or two. This is an issue, especially when these objects turn out to be driving hazards. On the other hand, these same objects are clearly visible in onboard RGB sensors. In this work, we present an approach to seamlessly fuse RGB sensors into Lidar-based 3D recognition. Our approach takes a set of 2D detections to generate dense 3D virtual points to augment an otherwise sparse 3D point-cloud. These virtual points naturally integrate into any standard Lidar-based 3D detectors along with regular Lidar measurements. The resulting multi-modal detector is simple and effective. Experimental results on the large-scale nuScenes dataset show that our framework improves a strong CenterPoint baseline by a significant 6.6 mAP, and outperforms competing fusion approaches.

Main results

3D detection on nuScenes validation set

MAP ↑ NDS ↑
CenterPoint-Voxel 59.5 66.7
CenterPoint-Voxel + MVP 66.0 69.9
CenterPoint-Pillar 52.4 61.5
CenterPoint-Voxel + MVP 62.8 66.2

3D detection on nuScenes test set

MAP ↑ NDS ↑ PKL ↓
MVP 66.4 70.5 0.603

Use MVP

Installation

Please install CenterPoint and CenterNet2. Make sure to add a link to CenterNet2 folder in your python path. We will use CenterNet2 for 2D instance segmentation and CenterPoint for 3D detection.

Getting Started

Download nuscenes data and organise as follows

# For nuScenes Dataset         
└── NUSCENES_DATASET_ROOT
       ├── samples       <-- key frames
       ├── sweeps        <-- frames without annotation
       ├── maps          <-- unused
       ├── v1.0-trainval <-- metadata

Create a symlink to the dataset root in both CenterPoint and MVP's root directories.

mkdir data && cd data
ln -s DATA_ROOT nuScenes

Remember to change the DATA_ROOT to the actual path in your system.

Generate Virtual Points

Download the centernet2 model from here and place it in the root directory.

Use the following command in the current directory to generate virtual points for nuscenes training and validation sets. The points will be saved to data/nuScenes/samples or sweeps/LIDAR_TOP_VIRTUAL.

python virtual_gen.py --info_path data/nuScenes/infos_train_10sweeps_withvelo_filter_True.pkl  

You will need about 80GB space and the whole process will take 10 to 20 hours using a single GPU. You can also download the precomputed virtual points from here.

Create Data

Go to the CenterPoint's root directory and run

# nuScenes
python tools/create_data.py nuscenes_data_prep --root_path=NUSCENES_TRAINVAL_DATASET_ROOT --version="v1.0-trainval" --nsweeps=10 --virtual True 

if you want to reproduce CenterPoint baseline's results, then also run the following command

# nuScenes
python tools/create_data.py nuscenes_data_prep --root_path=NUSCENES_TRAINVAL_DATASET_ROOT --version="v1.0-trainval" --nsweeps=10 --virtual False 

In the end, the data and info files should be organized as follows

# For nuScenes Dataset 
└── CenterPoint
       └── data    
              └── nuScenes 
                     ├── maps          <-- unused
                     |── v1.0-trainval <-- metadata and annotations
                     |── infos_train_10sweeps_withvelo_filter_True.pkl <-- train annotations
                     |── infos_val_10sweeps_withvelo_filter_True.pkl <-- val annotations
                     |── dbinfos_train_10sweeps_withvelo_virtual.pkl <-- GT database info files
                     |── gt_database_10sweeps_withvelo_virtual <-- GT database 
                     |── samples       <-- key frames
                        |── LIDAR_TOP
                        |── LIDAR_TOP_VIRTUAL
                     └── sweeps       <-- frames without annotation
                        |── LIDAR_TOP
                        |── LIDAR_TOP_VIRTUAL

Train & Evaluate in Command Line

Go to CenterPoint's root directory and use the following command to start a distributed training using 4 GPUs. The models and logs will be saved to work_dirs/CONFIG_NAME

python -m torch.distributed.launch --nproc_per_node=4 ./tools/train.py CONFIG_PATH

For distributed testing with 4 gpus,

python -m torch.distributed.launch --nproc_per_node=4 ./tools/dist_test.py CONFIG_PATH --work_dir work_dirs/CONFIG_NAME --checkpoint work_dirs/CONFIG_NAME/latest.pth 

For testing with one gpu and see the inference time,

python ./tools/dist_test.py CONFIG_PATH --work_dir work_dirs/CONFIG_NAME --checkpoint work_dirs/CONFIG_NAME/latest.pth --speed_test 

MODEL ZOO

We experiment with VoxelNet and PointPillars architectures on nuScenes.

VoxelNet

Model Validation MAP Validation NDS Link
centerpoint_baseline 59.5 66.7 URL
Ours 66.0 69.9 URL

PointPillars

Model Validation MAP Validation NDS Link
centerpoint_baseline 52.4 61.5 URL
Ours 62.8 66.2 URL

Test set models and predictions will be updated soon.

License

MIT License.

Owner
Tianwei Yin
Tianwei Yin
Weakly Supervised Posture Mining with Reverse Cross-entropy for Fine-grained Classification

Fine-grainedImageClassification Weakly Supervised Posture Mining with Reverse Cross-entropy for Fine-grained Classification We trained model here: lin

ZhenchaoTang 14 Oct 21, 2022
Code for the SIGIR 2022 paper "Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion"

MKGFormer Code for the SIGIR 2022 paper "Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion" Model Architecture Illu

ZJUNLP 68 Dec 28, 2022
Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized

VQGAN-CLIP-Docker About Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized This is a stripped and minimal dependency repository for running loca

Kevin Costa 73 Sep 11, 2022
FedJAX is a library for developing custom Federated Learning (FL) algorithms in JAX.

FedJAX: Federated learning with JAX What is FedJAX? FedJAX is a library for developing custom Federated Learning (FL) algorithms in JAX. FedJAX priori

Google 208 Dec 14, 2022
Denoising images with Fourier Ring Correlation loss

Denoising images with Fourier Ring Correlation loss The python code accompanies the working manuscript Image quality measurements and denoising using

2 Mar 12, 2022
Self Governing Neural Networks (SGNN): the Projection Layer

Self Governing Neural Networks (SGNN): the Projection Layer A SGNN's word projections preprocessing pipeline in scikit-learn In this notebook, we'll u

Guillaume Chevalier 22 Nov 06, 2022
Differentiable Quantum Chemistry (only Differentiable Density Functional Theory and Hartree Fock at the moment)

DQC: Differentiable Quantum Chemistry Differentiable quantum chemistry package. Currently only support differentiable density functional theory (DFT)

75 Dec 02, 2022
Model Zoo for AI Model Efficiency Toolkit

We provide a collection of popular neural network models and compare their floating point and quantized performance.

Qualcomm Innovation Center 137 Jan 03, 2023
Repo for "Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions" https://arxiv.org/abs/2201.12296

Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions This repo contains the dataset and code for the paper Benchmarking Ro

Jiachen Sun 168 Dec 29, 2022
LWCC: A LightWeight Crowd Counting library for Python that includes several pretrained state-of-the-art models.

LWCC: A LightWeight Crowd Counting library for Python LWCC is a lightweight crowd counting framework for Python. It wraps four state-of-the-art models

Matija Teršek 39 Dec 28, 2022
Awesome AI Learning with +100 AI Cheat-Sheets, Free online Books, Top Courses, Best Videos and Lectures, Papers, Tutorials, +99 Researchers, Premium Websites, +121 Datasets, Conferences, Frameworks, Tools

All about AI with Cheat-Sheets(+100 Cheat-sheets), Free Online Books, Courses, Videos and Lectures, Papers, Tutorials, Researchers, Websites, Datasets

Niraj Lunavat 1.2k Jan 01, 2023
Code for a real-time distributed cooperative slam(RDC-SLAM) system for ROS compatible platforms.

RDC-SLAM This repository contains code for a real-time distributed cooperative slam(RDC-SLAM) system for ROS compatible platforms. The system takes in

40 Nov 19, 2022
Code for the ICCV 2021 paper "Pixel Difference Networks for Efficient Edge Detection" (Oral).

Microsoft365_devicePhish Abusing Microsoft 365 OAuth Authorization Flow for Phishing Attack This is a simple proof-of-concept script that allows an at

Alex 236 Dec 21, 2022
Implementation of: "Exploring Randomly Wired Neural Networks for Image Recognition"

RandWireNN Unofficial PyTorch Implementation of: Exploring Randomly Wired Neural Networks for Image Recognition. Results Validation result on Imagenet

Seung-won Park 684 Nov 02, 2022
Official implementation of Influence-balanced Loss for Imbalanced Visual Classification in PyTorch.

Official implementation of Influence-balanced Loss for Imbalanced Visual Classification in PyTorch.

Seulki Park 70 Jan 03, 2023
MoveNet Single Pose on OpenVINO

MoveNet Single Pose tracking on OpenVINO Running Google MoveNet Single Pose models on OpenVINO. A convolutional neural network model that runs on RGB

35 Nov 11, 2022
MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity

MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity Introduction The 3D LiDAR place recognition aim

16 Dec 08, 2022
Use VITS and Opencpop to develop singing voice synthesis; Maybe it will VISinger.

Init Use VITS and Opencpop to develop singing voice synthesis; Maybe it will VISinger. 本项目基于 https://github.com/jaywalnut310/vits https://github.com/S

AmorTX 107 Dec 23, 2022
Numerai tournament example scripts using NN and optuna

numerai_NN_example Numerai tournament example scripts using pytorch NN, lightGBM and optuna https://numer.ai/tournament Performance of my model based

Takahiro Maeda 12 Oct 10, 2022
Source code for our CVPR 2019 paper - PPGNet: Learning Point-Pair Graph for Line Segment Detection

PPGNet: Learning Point-Pair Graph for Line Segment Detection PyTorch implementation of our CVPR 2019 paper: PPGNet: Learning Point-Pair Graph for Line

SVIP Lab 170 Oct 25, 2022