PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection?

Overview

DD3D: "Is Pseudo-Lidar needed for Monocular 3D Object detection?"

Install // Datasets // Experiments // Models // License // Reference

Full video

Official PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park*, Rares Ambrus*, Vitor Guizilini, Jie Li, and Adrien Gaidon.

Installation

We recommend using docker (see nvidia-docker2 instructions) to have a reproducible environment. To setup your environment, type in a terminal (only tested in Ubuntu 18.04):

git clone https://github.com/TRI-ML/dd3d.git
cd dd3d
# If you want to use docker (recommended)
make docker-build # CUDA 10.2
# Alternative docker image for cuda 11.1
# make docker-build DOCKERFILE=Dockerfile-cu111

Please check the version of your nvidia driver and cuda compatibility to determine which Dockerfile to use.

We will list below all commands as if run directly inside our container. To run any of the commands in a container, you can either start the container in interactive mode with make docker-dev to land in a shell where you can type those commands, or you can do it in one step:

" # multi GPU make docker-run-mpi COMMAND=" "">
# single GPU
make docker-run COMMAND="
     
      "
     
# multi GPU
make docker-run-mpi COMMAND="
     
      "
     

If you want to use features related to AWS (for caching the output directory) and Weights & Biases (for experiment management/visualization), then you should create associated accounts and configure your shell with the following environment variables before building the docker image:

" export AWS_ACCESS_KEY_ID=" " export AWS_DEFAULT_REGION=" " export WANDB_ENTITY=" " export WANDB_API_KEY=" "">
export AWS_SECRET_ACCESS_KEY="
        
         "
        
export AWS_ACCESS_KEY_ID="
        
         "
        
export AWS_DEFAULT_REGION="
        
         "
        
export WANDB_ENTITY="
        
         "
        
export WANDB_API_KEY="
        
         "
        

You should also enable these features in configuration, such as WANDB.ENABLED and SYNC_OUTPUT_DIR_S3.ENABLED.

Datasets

By default, datasets are assumed to be downloaded in /data/datasets/ (can be a symbolic link). The dataset root is configurable by DATASET_ROOT.

KITTI

The KITTI 3D dataset used in our experiments can be downloaded from the KITTI website. For convenience, we provide the standard splits used in 3DOP for training and evaluation:

# download a standard splits subset of KITTI
curl -s https://tri-ml-public.s3.amazonaws.com/github/dd3d/mv3d_kitti_splits.tar | sudo tar xv -C /data/datasets/KITTI3D

The dataset must be organized as follows:


   
    
    └── KITTI3D
        ├── mv3d_kitti_splits
        │   ├── test.txt
        │   ├── train.txt
        │   ├── trainval.txt
        │   └── val.txt
        ├── testing
        │   ├── calib
        |   │   ├── 000000.txt
        |   │   ├── 000001.txt
        |   │   └── ...
        │   └── image_2
        │       ├── 000000.png
        │       ├── 000001.png
        │       └── ...
        └── training
            ├── calib
            │   ├── 000000.txt
            │   ├── 000001.txt
            │   └── ...
            ├── image_2
            │   ├── 000000.png
            │   ├── 000001.png
            │   └── ...
            └── label_2
                ├── 000000.txt
                ├── 000001.txt
                └── ..

   

nuScenes

The nuScenes dataset (v1.0) can be downloaded from the nuScenes website. The dataset must be organized as follows:


   
    
    └── nuScenes
        ├── samples
        │   ├── CAM_FRONT
        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT__1526915243012465.jpg
        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT__1526915243512465.jpg
        │   │   ├── ...
        │   │  
        │   ├── CAM_FRONT_LEFT
        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT_LEFT__1526915243004917.jpg
        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT_LEFT__1526915243504917.jpg
        │   │   ├── ...
        │   │  
        │   ├── ...
        │  
        ├── v1.0-trainval
        │   ├── attribute.json
        │   ├── calibrated_sensor.json
        │   ├── category.json
        │   ├── ...
        │  
        ├── v1.0-test
        │   ├── attribute.json
        │   ├── calibrated_sensor.json
        │   ├── category.json
        │   ├── ...
        │  
        ├── v1.0-mini
        │   ├── attribute.json
        │   ├── calibrated_sensor.json
        │   ├── category.json
        │   ├── ...

   

Pre-trained DD3D models

The DD3D models pre-trained on dense depth estimation using DDAD15M can be downloaded here:

backbone download
DLA34 model
V2-99 model

(Optional) Eigen-clean subset of KITTI raw.

To train our Pseudo-Lidar detector, we curated a new subset of KITTI (raw) dataset and use it to fine-tune its depth network. This subset can be downloaded here. Each row contains left and right image pairs. The KITTI raw dataset can be download here.

Validating installation

To validate and visualize the dataloader (including data augmentation), run the following:

./scripts/visualize_dataloader.py +experiments=dd3d_kitti_dla34 SOLVER.IMS_PER_BATCH=4

To validate the entire training loop (including evaluation and visualization), run the overfit experiment (trained on test set):

./scripts/train.py +experiments=dd3d_kitti_dla34_overfit
experiment backbone train mem. (GB) train time (hr) train log Box AP (%) BEV AP (%) download
config DLA-34 6 0.25 log 84.54 88.83 model

Experiments

Configuration

We use hydra to configure experiments, specifically following this pattern to organize and compose configurations. The experiments under configs/experiments describe the delta from the default configuration, and can be run as follows:

# omit the '.yaml' extension from the experiment file.
./scripts/train.py +experiments=<experiment-file> <config-override>

The configuration is modularized by various components such as datasets, backbones, evaluators, and visualizers, etc.

Using multiple GPUs

The training script supports (single-node) multi-GPU for training and evaluation via mpirun. This is most conveniently executed by the make docker-run-mpi command (see above). Internally, IMS_PER_BATCH parameters of the optimizer and the evaluator denote the total size of batch that is sharded across available GPUs while training or evaluating. They are required to be set as a multuple of available GPUs.

Evaluation

One can run only evaluation using the pretrained models:

./scripts/train.py +experiments=<some-experiment> EVAL_ONLY=True MODEL.CKPT=<path-to-pretrained-model>
# use smaller batch size for single-gpu
./scripts/train.py +experiments=<some-experiment> EVAL_ONLY=True MODEL.CKPT=<path-to-pretrained-model> TEST.IMS_PER_BATCH=4

Gradient accumulation

If you have insufficient GPU memory for any experiment, you can use gradient accumulation by configuring ACCUMULATE_GRAD_BATCHES, at the cost of longer training time. For instance, if the experiment requires at least 400 of GPU memory (e.g. V2-99, KITTI) and you have only 128 (e.g., 8 x 16G GPUs), then you can update parameters at every 4th step:

# The original batch size is 64.
./scripts/train.py +experiments=dd3d_kitti_v99 SOLVER.IMS_PER_BATCH=16 SOLVER.ACCUMULATE_GRAD_BATCHES=4

Models

All experiments here use 8 A100 40G GPUs, and use gradient accumulation when more GPU memory is needed. We subsample nuScenes validation set by a factor of 8 (2Hz ⟶ 0.25Hz) to save training time.

KITTI

experiment backbone train mem. (GB) train time (hr) train log Box AP (%) BEV AP (%) download
config DLA-34 256 4.5 log 16.92 24.77 model
config V2-99 400 9.0 log 23.90 32.01 model

nuScenes

experiment backbone train mem. (GB) train time (hr) train log mAP (%) NDS download
config DLA-34 TBD TBD TBD) TBD TBD TBD
config V2-99 TBD TBD TBD TBD TBD TBD

License

The source code is released under the MIT license. We note that some code in this repository is adapted from the following repositories:

Reference

@inproceedings{park2021dd3d,
  author = {Dennis Park and Rares Ambrus and Vitor Guizilini and Jie Li and Adrien Gaidon},
  title = {Is Pseudo-Lidar needed for Monocular 3D Object detection?},
  booktitle = {IEEE/CVF International Conference on Computer Vision (ICCV)},
  primaryClass = {cs.CV},
  year = {2021},
}
Owner
Toyota Research Institute - Machine Learning
Toyota Research Institute - Machine Learning
Pytorch code for paper "Image Compressed Sensing Using Non-local Neural Network" TMM 2021.

NL-CSNet-Pytorch Pytorch code for paper "Image Compressed Sensing Using Non-local Neural Network" TMM 2021. Note: this repo only shows the strategy of

WenxueCui 7 Nov 07, 2022
Python PID Tuner - Based on a FOPDT model obtained using a Open Loop Process Reaction Curve

PythonPID_Tuner Step 1: Takes a Process Reaction Curve in csv format - assumes data at 100ms interval (column names CV and PV) Step 2: Makes a rough e

6 Jan 14, 2022
Fully Convlutional Neural Networks for state-of-the-art time series classification

Deep Learning for Time Series Classification As the simplest type of time series data, univariate time series provides a reasonably good starting poin

Stephen 572 Dec 23, 2022
This respository includes implementations on Manifoldron: Direct Space Partition via Manifold Discovery

Manifoldron: Direct Space Partition via Manifold Discovery This respository includes implementations on Manifoldron: Direct Space Partition via Manifo

dayang_wang 4 Apr 28, 2022
Code repo for "FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation" (ICCV 2021)

FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation (ICCV 2021) This repository contains the implementation of th

Yuhang Zang 21 Dec 17, 2022
RAANet: Range-Aware Attention Network for LiDAR-based 3D Object Detection with Auxiliary Density Level Estimation

RAANet: Range-Aware Attention Network for LiDAR-based 3D Object Detection with Auxiliary Density Level Estimation Anonymous submission Abstract 3D obj

30 Sep 16, 2022
A Pytorch implementation of MoveNet from Google. Include training code and pre-train model.

Movenet.Pytorch Intro MoveNet is an ultra fast and accurate model that detects 17 keypoints of a body. This is A Pytorch implementation of MoveNet fro

Mr.Fire 241 Dec 26, 2022
Official implementation of "Implicit Neural Representations with Periodic Activation Functions"

Implicit Neural Representations with Periodic Activation Functions Project Page | Paper | Data Vincent Sitzmann*, Julien N. P. Martel*, Alexander W. B

Vincent Sitzmann 1.4k Jan 06, 2023
Adaptable tools to make reinforcement learning and evolutionary computation algorithms.

Pearl The Parallel Evolutionary and Reinforcement Learning Library (Pearl) is a pytorch based package with the goal of being excellent for rapid proto

38 Jan 01, 2023
[ACL 2022] LinkBERT: A Knowledgeable Language Model 😎 Pretrained with Document Links

LinkBERT: A Knowledgeable Language Model Pretrained with Document Links This repo provides the model, code & data of our paper: LinkBERT: Pretraining

Michihiro Yasunaga 264 Jan 01, 2023
Lightweight library to build and train neural networks in Theano

Lasagne Lasagne is a lightweight library to build and train neural networks in Theano. Its main features are: Supports feed-forward networks such as C

Lasagne 3.8k Dec 29, 2022
Hub is a dataset format with a simple API for creating, storing, and collaborating on AI datasets of any size.

Hub is a dataset format with a simple API for creating, storing, and collaborating on AI datasets of any size. The hub data layout enables rapid transformations and streaming of data while training m

Activeloop 5.1k Jan 08, 2023
LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)

LV-BERT Introduction In this repo, we introduce LV-BERT by exploiting layer variety for BERT. For detailed description and experimental results, pleas

Weihao Yu 14 Aug 24, 2022
Starter kit for getting started in the Music Demixing Challenge.

Music Demixing Challenge - Starter Kit 👉 Challenge page This repository is the Music Demixing Challenge Submission template and Starter kit! Clone th

AIcrowd 106 Dec 20, 2022
Official PyTorch code for Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution (MANet, ICCV2021)

Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution (MANet, ICCV2021) This repository is the official PyTorc

Jingyun Liang 139 Dec 29, 2022
Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition in CVPR19

2s-AGCN Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition in CVPR19 Note PyTorch version should be 0.3! For PyTor

LShi 547 Dec 26, 2022
GestureSSD CBAM - A gesture recognition web system based on SSD and CBAM, using pytorch, flask and node.js

GestureSSD_CBAM A gesture recognition web system based on SSD and CBAM, using pytorch, flask and node.js SSD implementation is based on https://github

xue_senhua1999 2 Jan 06, 2022
The implemention of Video Depth Estimation by Fusing Flow-to-Depth Proposals

Flow-to-depth (FDNet) video-depth-estimation This is the implementation of paper Video Depth Estimation by Fusing Flow-to-Depth Proposals Jiaxin Xie,

32 Jun 14, 2022
MultiMix: Sparingly Supervised, Extreme Multitask Learning From Medical Images (ISBI 2021, MELBA 2021)

MultiMix This repository contains the implementation of MultiMix. Our publications for this project are listed below: "MultiMix: Sparingly Supervised,

Ayaan Haque 27 Dec 22, 2022
NFNets and Adaptive Gradient Clipping for SGD implemented in PyTorch

PyTorch implementation of Normalizer-Free Networks and SGD - Adaptive Gradient Clipping Paper: https://arxiv.org/abs/2102.06171.pdf Original code: htt

Vaibhav Balloli 320 Jan 02, 2023