Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds (CVPR 2022)

Related tags

Deep LearningVoxSeT
Overview

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds (CVPR2022)[paper]

Authors: Chenhang He, Ruihuang Li, Shuai Li, Lei Zhang.

This project is built on OpenPCDet.

Updates

2022-04-09: Add waymo config and multi-frame input.

The performance of VoxSeT (single-stage, single-frame) on Waymo valdation split are as follows.

% Training Car AP/APH Ped AP/APH Cyc AP/APH Log file
Level 1 20% 72.10/71.59 77.94/69.58 69.88/68.54 Download
Level 2 20% 63.62/63.17 70.20/62.51 67.31/66.02
Level 1 100% 74.50/74.03 80.03/72.42 71.56/70.29 Download
Level 2 100% 65.99/65.56 72.45/65.39 68.95/67.73

Introduction

drawing

Transformer has demonstrated promising performance in many 2D vision tasks. However, it is cumbersome to compute the self-attention on large-scale point cloud data because point cloud is a long sequence and unevenly distributed in 3D space. To solve this issue, existing methods usually compute self-attention locally by grouping the points into clusters of the same size, or perform convolutional self-attention on a discretized representation. However, the former results in stochastic point dropout, while the latter typically has narrow attention fields. In this paper, we propose a novel voxel-based architecture, namely Voxel Set Transformer (VoxSeT), to detect 3D objects from point clouds by means of set-to-set translation. VoxSeT is built upon a voxel-based set attention (VSA) module, which reduces the self-attention in each voxel by two cross attentions and models features in a hidden space induced by a group of latent codes. With the VSA module, VoxSeT can manage voxelized point clusters with arbitrary size in a wide range, and process them in parallel with linear complexity. The proposed VoxSeT integrates the high performance of transformer with the efficiency of voxel-based model, which can be used as a good alternative to the convolutional and point-based backbones.

1. Recommended Environment

  • Linux (tested on Ubuntu 16.04)
  • Python 3.7
  • PyTorch 1.9 or higher (tested on PyTorch 1.10.1)
  • CUDA 9.0 or higher (tested on CUDA 10.2)

2. Set the Environment

pip install -r requirement.txt
python setup.py build_ext --inplace 

The torch_scatter package is required

3. Data Preparation

# Download KITTI and organize it into the following form:
├── data
│   ├── kitti
│   │   │── ImageSets
│   │   │── training
│   │   │   ├──calib & velodyne & label_2 & image_2 & (optional: planes)
│   │   │── testing
│   │   │   ├──calib & velodyne & image_2

# Generatedata infos:
python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml

4. Pretrain model

You can download the pretrain model here and the log file here.

The performance (using 11 recall poisitions) on KITTI validation set is as follows:

Car  [email protected], 0.70, 0.70:
bev  AP:90.1572, 88.0972, 86.8397
3d   AP:88.8694, 78.7660, 77.5758

Pedestrian [email protected], 0.50, 0.50:
bev  AP:63.1125, 58.5591, 55.1318
3d   AP:60.2515, 55.5535, 50.1888

Cyclist [email protected], 0.50, 0.50:
bev  AP:85.6768, 71.9008, 67.1551
3d   AP:85.4238, 70.2774, 64.9804

The runtime is about 33 ms per sample.

5. Train

  • Train with a single GPU
python train.py --cfg_file tools/cfgs/kitti_models/voxset.yaml
  • Train with multiple GPUs
cd VoxSeT/tools
bash scripts/dist_train.sh --cfg_file ./cfgs/kitti_models/voxset.yaml

6. Test with a pretrained model

cd VoxSeT/tools
python test.py --cfg_file --cfg_file ./cfgs/kitti_models/voxset.yaml --ckpt ${CKPT_FILE}

Citation

@inproceedings{he2022voxset,
  title={Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds},
  author={Chenhang He, Ruihuang Li, Shuai Li and Lei Zhang},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2022}
}
Owner
Billy HE
PhD candidate of The Hong Kong Polytechnic University
Billy HE
Lazy, a tool for running things in idle time

Lazy, a tool for running things in idle time Mostly used to stop CUDA ML model training from making my desktop unusable. Simply monitors keyboard/mous

N Shepperd 46 Nov 06, 2022
Course content and resources for the AIAIART course.

AIAIART course This repo will house the notebooks used for the AIAIART course. Part 1 (first four lessons) ran via Discord in September/October 2021.

Jonathan Whitaker 492 Jan 06, 2023
NanoDet-Plus⚡Super fast and lightweight anchor-free object detection model. 🔥Only 980 KB(int8) / 1.8MB (fp16) and run 97FPS on cellphone🔥

NanoDet-Plus⚡Super fast and lightweight anchor-free object detection model. 🔥Only 980 KB(int8) / 1.8MB (fp16) and run 97FPS on cellphone🔥

4.8k Jan 07, 2023
Training a deep learning model on the noisy CIFAR dataset

Training-a-deep-learning-model-on-the-noisy-CIFAR-dataset This repository contai

1 Jun 14, 2022
Pairwise learning neural link prediction for ogb link prediction

Pairwise Learning for Neural Link Prediction for OGB (PLNLP-OGB) This repository provides evaluation codes of PLNLP for OGB link property prediction t

Zhitao WANG 31 Oct 10, 2022
Code for the paper SphereRPN: Learning Spheres for High-Quality Region Proposals on 3D Point Clouds Object Detection, ICIP 2021.

SphereRPN Code for the paper SphereRPN: Learning Spheres for High-Quality Region Proposals on 3D Point Clouds Object Detection, ICIP 2021. Authors: Th

Thang Vu 15 Dec 02, 2022
UV matrix decompostion using movielens dataset

UV-matrix-decompostion-with-kfold UV matrix decompostion using movielens dataset upload the 'ratings.dat' file install the following python libraries

2 Oct 18, 2022
Deepfake Scanner by Deepware.

Deepware Scanner (CLI) This repository contains the command-line deepfake scanner tool with the pre-trained models that are currently used at deepware

deepware 110 Jan 02, 2023
Open source person re-identification library in python

Open-ReID Open-ReID is a lightweight library of person re-identification for research purpose. It aims to provide a uniform interface for different da

Tong Xiao 1.3k Jan 01, 2023
A comprehensive list of published machine learning applications to cosmology

ml-in-cosmology This github attempts to maintain a comprehensive list of published machine learning applications to cosmology, organized by subject ma

George Stein 290 Dec 29, 2022
DeepLearning Anomalies Detection with Bluetooth Sensor Data

Final Year Project. Constructing models to create offline anomalies detection using Travel Time Data collected from Bluetooth sensors along the route.

1 Jan 10, 2022
A Python 3 package for state-of-the-art statistical dimension reduction methods

direpack: a Python 3 library for state-of-the-art statistical dimension reduction techniques This package delivers a scikit-learn compatible Python 3

Sven Serneels 32 Dec 14, 2022
Stock-Prediction - prediction of stock market movements using sentiment analysis and deep learning.

Stock-Prediction- In this project, we aim to enhance the prediction of stock market movements using sentiment analysis and deep learning. We divide th

5 Jan 25, 2022
Confidence Propagation Cluster aims to replace NMS-based methods as a better box fusion framework in 2D/3D Object detection

CP-Cluster Confidence Propagation Cluster aims to replace NMS-based methods as a better box fusion framework in 2D/3D Object detection, Instance Segme

Yichun Shen 41 Dec 08, 2022
Proof of concept GnuCash Webinterface

Proof of Concept GnuCash Webinterface This may one day be a something truly great. Milestones [ ] Browse accounts and view transactions [ ] Record sim

Josh 14 Dec 28, 2022
Official code for our CVPR '22 paper "Dataset Distillation by Matching Training Trajectories"

Dataset Distillation by Matching Training Trajectories Project Page | Paper This repo contains code for training expert trajectories and distilling sy

George Cazenavette 256 Jan 05, 2023
atmaCup #11 の Public 4th / Pricvate 5th Solution のリポジトリです。

#11 atmaCup 2021-07-09 ~ 2020-07-21 に行われた #11 [初心者歓迎! / 画像編] atmaCup のリポジトリです。結果は Public 4th / Private 5th でした。 フレームワークは PyTorch で、実装は pytorch-image-m

Tawara 12 Apr 07, 2022
3D mesh stylization driven by a text input in PyTorch

Text2Mesh [Project Page] Text2Mesh is a method for text-driven stylization of a 3D mesh, as described in "Text2Mesh: Text-Driven Neural Stylization fo

Threedle (University of Chicago) 649 Dec 27, 2022
Natural Posterior Network: Deep Bayesian Predictive Uncertainty for Exponential Family Distributions

Natural Posterior Network This repository provides the official implementation o

Oliver Borchert 54 Dec 06, 2022
RoboDesk A Multi-Task Reinforcement Learning Benchmark

RoboDesk A Multi-Task Reinforcement Learning Benchmark If you find this open source release useful, please reference in your paper: @misc{kannan2021ro

Google Research 66 Oct 07, 2022