Unofficial PyTorch implementation of "RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving" (ECCV 2020)

Overview

RTM3D-PyTorch

The PyTorch Implementation of the paper: RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving (ECCV 2020)


Features

  • Real-time 3D object detection based on a monocular RGB image
  • Distributed data parallel training
  • TensorBoard logging
  • ResNet-based Keypoint Feature Pyramid Network (KFPN) (enabled by setting --arch fpn_resnet_18)
  • Images from both the left and right cameras (controlled by the --use_left_cam_prob argument)
  • Pre-trained models (to be released)
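
As a rough illustration of the left/right camera option above, the sketch below picks an image folder for one sample with probability use_left_cam_prob. The helper name pick_camera_dir is hypothetical and the repository's actual sampling code may differ; image_2 and image_3 are the left and right camera folders of the KITTI layout.

```python
import random


def pick_camera_dir(use_left_cam_prob, rng=random):
    """Return the KITTI image folder to read for one training sample.

    Hypothetical helper: assumes the left camera is chosen with
    probability use_left_cam_prob and the right camera otherwise.
    """
    # rng.random() is uniform in [0.0, 1.0), so a probability of 1.0
    # always selects the left camera and 0.0 always selects the right one.
    if rng.random() < use_left_cam_prob:
        return "image_2"  # left color camera
    return "image_3"  # right color camera


rng = random.Random(0)  # fixed seed so the run is repeatable
picks = [pick_camera_dir(0.5, rng) for _ in range(1000)]
print(picks.count("image_2"), "left /", picks.count("image_3"), "right")
```

When the right image is used, the matching right-camera calibration has to be used as well; that step is not shown here.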

Some modifications from the paper

  • Formula (3):

    • The log operator cannot take a negative input, and the normalized dim values described in the paper may be less than 0. Instead of normalizing dim, I regress directly to the absolute dimension values in meters.
    • Use L1 loss for depth estimation (after applying the sigmoid activation to the depth output).
  • Formula (5): I use the relative values of the ground truth instead of its absolute values. The code is here

  • Formula (7): argmin instead of argmax

  • Heatmaps for the centers and vertices of objects are generated as in the CenterNet paper [1]. To use the strategy from the RTM3D paper instead, pass the --dynamic-sigma argument to the train.py script.
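
To make the heatmap item above more concrete, here is a minimal pure-Python sketch of splatting a Gaussian peak onto a keypoint heatmap. It assumes a fixed, caller-supplied sigma; neither the CenterNet radius rule nor the RTM3D dynamic-sigma rule is reproduced, and draw_gaussian is an illustrative name rather than a function from this repository.

```python
import math


def draw_gaussian(heatmap, center, sigma):
    """Splat a 2D Gaussian peak onto heatmap (a list of rows) in place.

    Overlapping peaks are merged with an element-wise maximum, so a
    nearby object never lowers the score of an existing peak.
    """
    cx, cy = center
    radius = int(3 * sigma)  # values beyond 3 sigma are close to zero
    height, width = len(heatmap), len(heatmap[0])
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            dist_sq = (x - cx) ** 2 + (y - cy) ** 2
            value = math.exp(-dist_sq / (2 * sigma ** 2))
            heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap


# Usage: one object center at (x=4, y=4) on a 9x9 heatmap.
heatmap = [[0.0] * 9 for _ in range(9)]
draw_gaussian(heatmap, center=(4, 4), sigma=1.0)
print(heatmap[4][4])  # peak value at the keypoint location
```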

2. Getting Started

2.1. Requirements

pip install -U -r requirements.txt

2.2. Data Preparation

Download the 3D KITTI detection dataset from here.

The downloaded data includes:

  • Training labels of object data set (5 MB)
  • Camera calibration matrices of object data set (16 MB)
  • Left color images of object data set (12 GB)
  • Right color images of object data set (12 GB)

Please make sure that the source code and dataset directories are organized as shown in the Folder structure section.
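
A quick way to check the dataset layout before training is sketched below. The helper missing_kitti_dirs is hypothetical (it is not part of this repository), and the list of expected folders is copied from the folder structure given in this README.

```python
from pathlib import Path

# Sub-directories expected under dataset/kitti/ (from the folder structure
# in this README).
EXPECTED_DIRS = [
    "ImageSets",
    "training/image_2",
    "training/image_3",
    "training/calib",
    "training/label_2",
    "testing/image_2",
    "testing/image_3",
    "testing/calib",
]


def missing_kitti_dirs(kitti_root):
    """Return the expected sub-directories that do not exist under kitti_root."""
    root = Path(kitti_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


# Usage: run from ${ROOT}; adjust the path if the dataset lives elsewhere.
missing = missing_kitti_dirs("dataset/kitti")
print("Layout OK" if not missing else "Missing: " + ", ".join(missing))
```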

2.3. RTM3D architecture

The model takes only RGB images as input and outputs the main center heatmap, the vertex heatmap, and the vertex coordinates, which form the base module for estimating the 3D bounding box.

2.4. How to run

2.4.1. Visualize the dataset

  • To visualize camera images with 3D boxes, execute:

cd src/data_process
python kitti_dataset.py

Then press n to see the next sample, or press Esc to quit.

2.4.2. Inference

Download the trained model from here (will be released), then put it in ${ROOT}/checkpoints/ and execute:

python test.py --gpu_idx 0 --arch resnet_18 --pretrained_path ../checkpoints/rtm3d_resnet_18.pth

2.4.3. Evaluation

python evaluate.py --gpu_idx 0 --arch resnet_18 --pretrained_path <PATH>

2.4.4. Training

2.4.4.1. Single machine, single GPU

python train.py --gpu_idx 0 --arch <ARCH> --batch_size <N> --num_workers <N>...

2.4.4.2. Multi-processing Distributed Data Parallel Training

Use the nccl backend for multi-processing distributed training on GPUs, since it currently provides the best distributed training performance.

  • Single machine (node), multiple GPUs

python train.py --dist-url 'tcp://127.0.0.1:29500' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0

  • Two machines (two nodes), multiple GPUs

First machine

python train.py --dist-url 'tcp://IP_OF_NODE1:FREEPORT' --dist-backend 'nccl' --multiprocessing-distributed --world-size 2 --rank 0

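Second machine (--dist-url must be the same address and port as on the first machine, because every process connects to the rank 0 node)

python train.py --dist-url 'tcp://IP_OF_NODE1:FREEPORT' --dist-backend 'nccl' --multiprocessing-distributed --world-size 2 --rank 1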
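
The sketch below shows how node-level --world-size and --rank values usually translate into per-process values when one process is launched per GPU. It assumes train.py follows the common PyTorch multi-processing pattern, which this README does not confirm; the function names are illustrative only.

```python
def global_world_size(num_nodes, gpus_per_node):
    """Total number of processes when one process is spawned per GPU."""
    return num_nodes * gpus_per_node


def global_rank(node_rank, gpus_per_node, local_gpu):
    """Unique rank of the process driving local_gpu on node node_rank."""
    return node_rank * gpus_per_node + local_gpu


# Usage: two nodes (--world-size 2) with 4 GPUs each.
for node_rank in range(2):
    for gpu in range(4):
        print("node", node_rank, "gpu", gpu, "-> rank", global_rank(node_rank, 4, gpu))
```

Under this assumption the process with global rank 0 runs on the first machine, which is why both machines are given that machine's address.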

To reproduce the results, run the bash script:

./train.sh

Tensorboard

  • To track the training progress, go to the tensorboard folder of the run under logs/ and launch TensorBoard:

cd logs/<saved_fn>/tensorboard/
tensorboard --logdir=./

Contact

If you think this work is useful, please give me a star!
If you find any errors or have any suggestions, please contact me (Email: [email protected]).
Thank you!

Citation

@inproceedings{RTM3D,
  author = {Peixuan Li and Huaici Zhao and Pengfei Liu and Feidao Cao},
  title = {RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving},
  booktitle = {ECCV 2020},
  year = {2020},
}
@misc{RTM3D-PyTorch,
  author =       {Nguyen Mau Dung},
  title =        {{RTM3D-PyTorch: PyTorch Implementation of the RTM3D paper}},
  howpublished = {\url{https://github.com/maudzung/RTM3D-PyTorch}},
  year =         {2020}
}

References

[1] CenterNet: Objects as Points paper, PyTorch Implementation

Folder structure

${ROOT}
├── checkpoints/
│   ├── rtm3d_resnet_18.pth
│   └── rtm3d_fpn_resnet_18.pth
├── dataset/
│   └── kitti/
│       ├── ImageSets/
│       │   ├── test.txt
│       │   ├── train.txt
│       │   └── val.txt
│       ├── training/
│       │   ├── image_2/ (left color camera)
│       │   ├── image_3/ (right color camera)
│       │   ├── calib/
│       │   └── label_2/
│       ├── testing/
│       │   ├── image_2/ (left color camera)
│       │   ├── image_3/ (right color camera)
│       │   └── calib/
│       └── classes_names.txt
├── src/
│   ├── config/
│   │   ├── train_config.py
│   │   └── kitti_config.py
│   ├── data_process/
│   │   ├── kitti_dataloader.py
│   │   ├── kitti_dataset.py
│   │   └── kitti_data_utils.py
│   ├── models/
│   │   ├── fpn_resnet.py
│   │   ├── resnet.py
│   │   └── model_utils.py
│   ├── utils/
│   │   ├── evaluation_utils.py
│   │   ├── logger.py
│   │   ├── misc.py
│   │   ├── torch_utils.py
│   │   └── train_utils.py
│   ├── evaluate.py
│   ├── test.py
│   ├── train.py
│   └── train.sh
├── README.md
└── requirements.txt

Usage

usage: train.py [-h] [--seed SEED] [--saved_fn FN] [--root-dir PATH]
                [--arch ARCH] [--pretrained_path PATH] [--head_conv HEAD_CONV]
                [--hflip_prob HFLIP_PROB]
                [--use_left_cam_prob USE_LEFT_CAM_PROB] [--dynamic-sigma]
                [--no-val] [--num_samples NUM_SAMPLES]
                [--num_workers NUM_WORKERS] [--batch_size BATCH_SIZE]
                [--print_freq N] [--tensorboard_freq N] [--checkpoint_freq N]
                [--start_epoch N] [--num_epochs N] [--lr_type LR_TYPE]
                [--lr LR] [--minimum_lr MIN_LR] [--momentum M] [-wd WD]
                [--optimizer_type OPTIMIZER] [--steps [STEPS [STEPS ...]]]
                [--world-size N] [--rank N] [--dist-url DIST_URL]
                [--dist-backend DIST_BACKEND] [--gpu_idx GPU_IDX] [--no_cuda]
                [--multiprocessing-distributed] [--evaluate]
                [--resume_path PATH] [--K K]

The Implementation of RTM3D using PyTorch

optional arguments:
  -h, --help            show this help message and exit
  --seed SEED           re-produce the results with seed random
  --saved_fn FN         The name using for saving logs, models,...
  --root-dir PATH       The ROOT working directory
  --arch ARCH           The name of the model architecture
  --pretrained_path PATH
                        the path of the pretrained checkpoint
  --head_conv HEAD_CONV
                        conv layer channels for output head. 0 for no conv
                        layer; -1 for default setting: 64 for resnets and 256
                        for dla.
  --hflip_prob HFLIP_PROB
                        The probability of horizontal flip
  --use_left_cam_prob USE_LEFT_CAM_PROB
                        The probability of using the left camera
  --dynamic-sigma       If true, compute sigma based on Amax, Amin then
                        generate heatmap. If false, compute radius as
                        CenterNet did
  --no-val              If true, dont evaluate the model on the val set
  --num_samples NUM_SAMPLES
                        Take a subset of the dataset to run and debug
  --num_workers NUM_WORKERS
                        Number of threads for loading data
  --batch_size BATCH_SIZE
                        mini-batch size (default: 16), this is the total batch
                        size of all GPUs on the current node when using Data
                        Parallel or Distributed Data Parallel
  --print_freq N        print frequency (default: 50)
  --tensorboard_freq N  frequency of saving tensorboard (default: 50)
  --checkpoint_freq N   frequency of saving checkpoints (default: 5)
  --start_epoch N       the starting epoch
  --num_epochs N        number of total epochs to run
  --lr_type LR_TYPE     the type of learning rate scheduler (cosin or
                        multi_step)
  --lr LR               initial learning rate
  --minimum_lr MIN_LR   minimum learning rate during training
  --momentum M          momentum
  -wd WD, --weight_decay WD
                        weight decay (default: 1e-6)
  --optimizer_type OPTIMIZER
                        the type of optimizer, it can be sgd or adam
  --steps [STEPS [STEPS ...]]
                        number of burn in step
  --world-size N        number of nodes for distributed training
  --rank N              node rank for distributed training
  --dist-url DIST_URL   url used to set up distributed training
  --dist-backend DIST_BACKEND
                        distributed backend
  --gpu_idx GPU_IDX     GPU index to use.
  --no_cuda             If true, cuda is not used.
  --multiprocessing-distributed
                        Use multi-processing distributed training to launch N
                        processes per node, which has N GPUs. This is the
                        fastest way to use PyTorch for either single node or
                        multi node data parallel training
  --evaluate            only evaluate the model, not training
  --resume_path PATH    the path of the resumed checkpoint
  --K K                 the number of top K
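
For readers who want to see how a few of these options behave when parsed, here is a minimal argparse sketch. It is not the parser from train.py: only a handful of flags are reproduced, the defaults for --batch_size and --print_freq are taken from the help text above, and the default for --arch is a placeholder assumption.

```python
import argparse


def build_parser():
    """Build a reduced, illustrative version of the train.py argument parser."""
    parser = argparse.ArgumentParser(
        description="The Implementation of RTM3D using PyTorch")
    # Placeholder default (assumption); the real default is not stated above.
    parser.add_argument("--arch", type=str, default="resnet_18", metavar="ARCH",
                        help="The name of the model architecture")
    parser.add_argument("--batch_size", type=int, default=16,
                        help="mini-batch size (default: 16)")
    parser.add_argument("--print_freq", type=int, default=50, metavar="N",
                        help="print frequency (default: 50)")
    # Boolean switches: absent means False, present means True.
    parser.add_argument("--dynamic-sigma", action="store_true",
                        help="compute sigma based on Amax, Amin")
    parser.add_argument("--no-val", action="store_true",
                        help="do not evaluate the model on the val set")
    parser.add_argument("--gpu_idx", type=int, default=None,
                        help="GPU index to use.")
    return parser


# Usage: argparse stores "--dynamic-sigma" as args.dynamic_sigma.
args = build_parser().parse_args(
    ["--arch", "fpn_resnet_18", "--dynamic-sigma", "--gpu_idx", "0"])
print(args.arch, args.dynamic_sigma, args.no_val, args.batch_size)
```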