Official source code of Fast Point Transformer, CVPR 2022

Last update: Dec 23, 2022

Overview

Fast Point Transformer

Project Page | Paper

This repository contains the official source code and data for our paper:

Fast Point Transformer
Chunghyun Park, Yoonwoo Jeong, Minsu Cho, and Jaesik Park
POSTECH GSAI & CSE
CVPR, 2022, New Orleans.

Overview

This work introduces Fast Point Transformer, which consists of a new lightweight self-attention layer. Our approach encodes continuous 3D coordinates, and the voxel hashing-based architecture boosts computational efficiency. The proposed method is demonstrated on 3D semantic segmentation and 3D detection. The accuracy of our approach is competitive with the best voxel-based method, and our network achieves 129 times faster inference than the state-of-the-art Point Transformer, with a reasonable accuracy trade-off, in 3D semantic segmentation on the S3DIS dataset.
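To make the voxel-hashing idea concrete, here is a minimal sketch in plain Python. It is a generic illustration, not the implementation in this repository: `voxelize` is a hypothetical helper that hashes each point into an integer voxel key and keeps the centroid of every occupied voxel, so the continuous coordinates are not discarded by quantization.

```python
import math
from collections import defaultdict

def voxelize(points, voxel_size):
    """Group points by integer voxel key and return each voxel's centroid.

    Illustrative only: `voxelize` is a hypothetical helper, not part of the
    Fast Point Transformer code base.
    """
    buckets = defaultdict(list)  # hash map: voxel key -> points inside it
    for x, y, z in points:
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        buckets[key].append((x, y, z))
    centroids = {}
    for key, members in buckets.items():
        n = len(members)
        # The centroid keeps the continuous position of the points in the voxel.
        centroids[key] = tuple(sum(p[i] for p in members) / n for i in range(3))
    return centroids

# Toy input (made-up coordinates in meters) with a 4cm voxel size.
points = [(0.01, 0.02, 0.03), (0.03, 0.02, 0.01), (0.09, 0.00, 0.00)]
centroids = voxelize(points, voxel_size=0.04)
```

The first two points share the voxel key `(0, 0, 0)`, so lookups by key take constant time on average regardless of how sparse the scene is.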

Citation

If you find our code or paper useful, please consider citing our paper:

@inproceedings{park2022fast,
 title={{Fast Point Transformer}},
 author={Chunghyun Park and Yoonwoo Jeong and Minsu Cho and Jaesik Park},
 booktitle={Proceedings of the {IEEE/CVF} Conference on Computer Vision and Pattern Recognition (CVPR)},
 year={2022}
}

Experiments

1. S3DIS Area 5 test

We denote MinkowskiNet42 trained with this repository as MinkowskiNet42^†. We use a voxel size of 4cm for both MinkowskiNet42^† and our Fast Point Transformer.

Model	Latency (sec)	mAcc (%)	mIoU (%)	Reference
PointTransformer	18.07	76.5	70.4	Codes from the authors
MinkowskiNet42^†	0.08	74.1	67.2	Checkpoint
+ rotation average	0.66	75.1	69.0	-
FastPointTransformer	0.14	76.6	69.2	Checkpoint
+ rotation average	1.13	77.6	71.0	-
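The speed-up quoted in the Overview can be checked against the latencies in this table. The short sketch below does the arithmetic; the variable names are ours.

```python
# Latencies in seconds, copied from the S3DIS Area 5 table.
latency = {
    "PointTransformer": 18.07,
    "MinkowskiNet42": 0.08,
    "FastPointTransformer": 0.14,
}

speedup = latency["PointTransformer"] / latency["FastPointTransformer"]
print(f"Speed-up over Point Transformer: {speedup:.0f}x")  # prints 129x
```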

2. ScanNetV2 validation

Model	Voxel Size	mAcc (%)	mIoU (%)	Reference
MinkowskiNet42	2cm	-	72.2	Official GitHub
MinkowskiNet42^†	2cm	81.4	72.1	Checkpoint
FastPointTransformer	2cm	81.2	72.5	Checkpoint
MinkowskiNet42^†	5cm	76.3	67.0	Checkpoint
FastPointTransformer	5cm	78.9	70.0	Checkpoint
MinkowskiNet42^†	10cm	70.8	60.7	Checkpoint
FastPointTransformer	10cm	76.1	66.5	Checkpoint
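For readers unfamiliar with the two metrics in these tables, the sketch below computes mean class accuracy (mAcc) and mean intersection-over-union (mIoU) from a confusion matrix, using their standard definitions. It is an illustration in plain Python, not the evaluation code of this repository; the function name `macc_miou` and the toy matrix are assumptions.

```python
def macc_miou(confusion):
    """Return (mAcc, mIoU) from a square confusion matrix.

    confusion[i][j] counts points of ground-truth class i predicted as class j.
    A class with no ground-truth points is skipped for accuracy; a class with
    neither ground-truth nor predicted points is skipped for IoU.
    """
    num_classes = len(confusion)
    accs, ious = [], []
    for c in range(num_classes):
        tp = confusion[c][c]
        gt = sum(confusion[c])                                   # ground-truth points of class c
        pred = sum(confusion[r][c] for r in range(num_classes))  # points predicted as class c
        if gt > 0:
            accs.append(tp / gt)
        union = gt + pred - tp
        if union > 0:
            ious.append(tp / union)
    return sum(accs) / len(accs), sum(ious) / len(ious)

# Toy 2-class example with made-up counts.
macc, miou = macc_miou([[8, 2], [1, 9]])
print(f"mAcc = {100 * macc:.1f}%, mIoU = {100 * miou:.1f}%")
```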

Installation

This repository was developed and tested with:

Ubuntu 18.04 and 20.04
Conda 4.11.0
CUDA 11.1
Python 3.8.13
PyTorch 1.7.1 and 1.10.0
MinkowskiEngine 0.5.4

Environment Setup

You can install the environment by using the provided shell script:

~$ git clone --recursive git@github.com:POSTECH-CVLab/FastPointTransformer.git
~$ cd FastPointTransformer
~/FastPointTransformer$ bash setup.sh fpt
~/FastPointTransformer$ conda activate fpt

Training & Evaluation

First, download the datasets (ScanNetV2 and S3DIS) and preprocess them as follows:

(fpt) ~/FastPointTransformer$ python src/data/preprocess_scannet.py # you need to modify the data path
(fpt) ~/FastPointTransformer$ python src/data/preprocess_s3dis.py # you need to modify the data path

Then place the provided metadata of each dataset (src/data/meta_data) next to the preprocessed data, following the structure below:

${data_dir}
├── scannetv2
│   ├── meta_data
│   │   ├── scannetv2_train.txt
│   │   ├── scannetv2_val.txt
│   │   └── ...
│   └── scannet_processed
│       ├── train
│       │   ├── scene0000_00.ply
│       │   ├── scene0000_01.ply
│       │   └── ...
│       └── test
└── s3dis
    ├── meta_data
    │   ├── area1.txt
    │   ├── area2.txt
    │   └── ...
    └── s3dis_processed
        ├── Area_1
        │   ├── conferenceRoom_1.ply
        │   ├── conferenceRoom_2.ply
        │   └── ...
        ├── Area_2
        └── ...
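Before training, it can help to confirm that the data directory matches this tree. The sketch below is a small, hypothetical checker (it is not shipped with the repository); the directory names come from the tree above, while `missing_dirs` and the example path are assumptions.

```python
from pathlib import Path

# Expected sub-directories, taken from the tree above.
EXPECTED = [
    "scannetv2/meta_data",
    "scannetv2/scannet_processed/train",
    "scannetv2/scannet_processed/test",
    "s3dis/meta_data",
    "s3dis/s3dis_processed",
]

def missing_dirs(data_dir):
    """Return the expected sub-directories that do not exist under data_dir."""
    root = Path(data_dir)
    return [rel for rel in EXPECTED if not (root / rel).is_dir()]

# Replace the placeholder path with your own ${data_dir}.
for rel in missing_dirs("/path/to/data_dir"):
    print(f"missing: {rel}")
```

An empty result means every expected directory is present; the check does not inspect the individual .ply or .txt files.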

After that, you can train and evaluate a model using the provided Python scripts (train.py and eval.py) with the configuration files in the config directory. For example, you can train and evaluate Fast Point Transformer with a voxel size of 4cm on the S3DIS dataset via the following commands:

(fpt) ~/FastPointTransformer$ python train.py config/s3dis/train_fpt.gin
(fpt) ~/FastPointTransformer$ python eval.py config/s3dis/eval_fpt.gin {checkpoint_file} # use -r option for rotation averaging.
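The rotation averaging enabled by the -r option is a form of test-time augmentation. The sketch below shows the general idea in plain Python and should not be read as what eval.py does internally: `predict_fn` is a hypothetical callable standing in for the network, and both the rotation axis (z) and the number of rotations (8) are assumptions.

```python
import math

def rotate_z(points, angle):
    """Rotate (x, y, z) points about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def rotation_average(predict_fn, points, num_rotations=8):
    """Average per-point class scores over evenly spaced z-axis rotations.

    `predict_fn` is a hypothetical callable mapping a list of points to a
    list of per-point score lists; `num_rotations=8` is an assumed default.
    """
    total = None
    for k in range(num_rotations):
        scores = predict_fn(rotate_z(points, 2.0 * math.pi * k / num_rotations))
        if total is None:
            total = [list(row) for row in scores]  # copy the first prediction
        else:
            for acc_row, row in zip(total, scores):
                for i, value in enumerate(row):
                    acc_row[i] += value
    return [[v / num_rotations for v in row] for row in total]

# Dummy predictor that returns the same two-class scores for every point.
dummy = lambda pts: [[0.25, 0.75] for _ in pts]
averaged = rotation_average(dummy, [(1.0, 0.0, 0.0), (0.0, 2.0, 1.0)])
```

Each extra rotation costs one more forward pass, so accuracy gains from averaging come at a proportional increase in latency.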

Consistency Score

First, generate predictions via the following command:

(fpt) ~/FastPointTransformer$ python -m src.cscore.prepare {checkpoint_file} -m {model_name} -v {voxel_size} # This takes hours.

Then, you can calculate the consistency score (CScore) with:

(fpt) ~/FastPointTransformer$ python -m src.cscore.calculate {prediction_dir} # This takes seconds.

3D Object Detection using VoteNet

Please refer to this repository.

Acknowledgement

Our code is based on the MinkowskiEngine. We also thank Hengshuang Zhao for providing the code of Point Transformer. If you use our model, please consider citing them as well.
