Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation"

Overview

EgoNet

Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation". This repo includes an implementation that performs vehicle orientation estimation on the KITTI dataset from a single RGB image.

News:

(2021-06-21): v-0.9 (beta version) is released. The inference utility is here! For Q&A, go to discussions. If you believe there is a technical problem, submit to issues.

(2021-06-16): This repo is under final code cleaning and documentation preparation. Stay tuned and come back in a week!

Check our 5-min video (Youtube, 爱奇艺) for an introduction.

Run a demo with a one-line command!

Check instructions here.

Performance: APBEV@R40 on KITTI val set for Car (monocular RGB)

| Method | Reference | Easy | Moderate | Hard |
|---|---|---|---|---|
| M3D-RPN | ICCV 2019 | 20.85 | 15.62 | 11.88 |
| MonoDIS | ICCV 2019 | 18.45 | 12.58 | 10.66 |
| MonoPair | CVPR 2020 | 24.12 | 18.17 | 15.76 |
| D4LCN | CVPR 2020 | 31.53 | 22.58 | 17.87 |
| Kinematic3D | ECCV 2020 | 27.83 | 19.72 | 15.10 |
| GrooMeD-NMS | CVPR 2021 | 27.38 | 19.75 | 15.92 |
| MonoDLE | CVPR 2021 | 24.97 | 19.33 | 17.01 |
| Ours | CVPR 2021 | 33.60 | 25.38 | 22.80 |

Performance: AOS@R40 on KITTI test set for Car (RGB)

| Method | Reference | Configuration | Easy | Moderate | Hard |
|---|---|---|---|---|---|
| M3D-RPN | ICCV 2019 | Monocular | 88.38 | 82.81 | 67.08 |
| DSGN | CVPR 2020 | Stereo | 95.42 | 86.03 | 78.27 |
| Disp-RCNN | CVPR 2020 | Stereo | 93.02 | 81.70 | 67.16 |
| MonoPair | CVPR 2020 | Monocular | 91.65 | 86.11 | 76.45 |
| D4LCN | CVPR 2020 | Monocular | 90.01 | 82.08 | 63.98 |
| Kinematic3D | ECCV 2020 | Monocular | 58.33 | 45.50 | 34.81 |
| MonoDLE | CVPR 2021 | Monocular | 93.46 | 90.23 | 80.11 |
| Ours | CVPR 2021 | Monocular | 96.11 | 91.23 | 80.96 |

Inference/Deployment

Check instructions here to reproduce the above quantitative results.

Training

Check instructions here to train Ego-Net and learn how to prepare your own training dataset other than KITTI.

Citation

Please star this repository and cite the following paper in your publications if it helps your research:

@InProceedings{Li_2021_CVPR,
author    = {Li, Shichao and Yan, Zengqiang and Li, Hongyang and Cheng, Kwang-Ting},
title     = {Exploring intermediate representation for monocular vehicle pose estimation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month     = {June},
year      = {2021},
pages     = {1873-1883}
}

License

This repository can be used freely for non-commercial purposes. Contact me if you are interested in a commercial license.

Links

Link to the paper: Exploring intermediate representation for monocular vehicle pose estimation

Link to the presentation video: Youtube, 爱奇艺

Relevant ECCV 2020 work: GSNet

Comments
  • Inference on custom dataset does not give proper 3d bboxes

    Hi! I am trying to use the EgoNet on a custom dataset (such as Waymo Dataset) and generate 3D bboxes from the 2D predictions generated by 2D bbox detectors such as Faster R-CNN.

    A sample of KITTI-format predictions, with the 2D bboxes filled in but the 3D bboxes not yet predicted, is as follows:

    Car -1 -1 0.0 158.0 386.0 220.0 438.0 0.0 0.0 0.0 -1.0 -1.0 -1.0 0.0 0.95
    Car -1 -1 0.0 565.0 386.0 720.0 438.0 0.0 0.0 0.0 -1.0 -1.0 -1.0 0.0 0.95
    

    However, when I set these 2D predictions as additional input for predicting bboxes using EgoNet, the 3D bboxes are not properly output at all and are just a bunch of zeros and minus-ones.

    What should I do?
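For reference, the two sample lines above follow the 16-field KITTI label layout. The sketch below parses one such line and flags boxes whose 3D fields still hold placeholders. The names `KittiDetection`, `parse_kitti_line`, and `has_3d_box` are made up for this illustration and are not part of EgoNet, and the rule "non-positive dimensions means unfilled" is an assumption based only on the sample above.

```python
from typing import NamedTuple, Tuple


class KittiDetection(NamedTuple):
    obj_type: str
    truncated: float
    occluded: int
    alpha: float
    bbox: Tuple[float, ...]        # left, top, right, bottom (pixels)
    dimensions: Tuple[float, ...]  # height, width, length (metres)
    location: Tuple[float, ...]    # x, y, z in camera coordinates (metres)
    rotation_y: float
    score: float


def parse_kitti_line(line: str) -> KittiDetection:
    """Split one KITTI-format prediction line into named fields."""
    fields = line.split()
    if len(fields) != 16:
        raise ValueError(f"expected 16 fields, got {len(fields)}")
    v = [float(x) for x in fields[1:]]
    return KittiDetection(
        obj_type=fields[0],
        truncated=v[0],
        occluded=int(v[1]),
        alpha=v[2],
        bbox=tuple(v[3:7]),
        dimensions=tuple(v[7:10]),
        location=tuple(v[10:13]),
        rotation_y=v[13],
        score=v[14],
    )


def has_3d_box(det: KittiDetection) -> bool:
    """Assumed heuristic: a filled-in 3D box has strictly positive dimensions."""
    return all(d > 0 for d in det.dimensions)


sample = "Car -1 -1 0.0 158.0 386.0 220.0 438.0 0.0 0.0 0.0 -1.0 -1.0 -1.0 0.0 0.95"
det = parse_kitti_line(sample)
print(det.bbox, has_3d_box(det))
```

Running a check like this over an output directory is one way to tell whether any 3D fields were written at all or whether the input files were simply copied through.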

    opened by joonjeon 8
  • Stream from camera

    Hello, thanks for your amazing work.

    I would like to ask what changes I need to make in order to run on a camera stream and detect the pose of objects, e.g. pedestrians. Do I need to re-train your model as well?

    Thanks in advance for your answer.

    opened by adamanov 7
  • Running the model on a single image

    Hey Nicholasli1995,

    Thanks for putting together this awesome repo, I really appreciate how thoroughly documented the setup is.

    Would it be possible to get a few hints as to how to run the model on a single image as opposed to a directory?

    I see that the data_loader in 'inference' is nearly directly from PyTorch, so it's probably my lack of experience with PyTorch that has me confused.

    Separately, I also tried making my own directory and referencing it in my .yaml file, but the following command still gave me an error about referencing the KITTI dataset:

    python .\inference.py --cfg "../configs/single_inference_test.yml" --visualize True --batch_to_show 1

    [screenshot of the file-not-found error]

    I was a little bit surprised to see "training" in the file not found path as I'm not asking to train -- I would expect to have to place an analogous 'test.txt' in this directory with the single test image filename, but I don't think that's currently the issue.

    Thanks! Clayton

    opened by claytonkanderson 5
  • About the license for this model

    Thank you for sharing your great code. :smiley_cat:

    What is the license for this model? I'd like to cite it to the repository I'm working on if possible, but I want to post the license correctly. https://github.com/PINTO0309/PINTO_model_zoo

    Thank you.

    opened by PINTO0309 2
  • Reproduce results on the test split

    Thanks for your great work. However, I am confused by “Reproduce results on the test split”. If I understand correctly, the inputs to the model are 2D bounding boxes located in “../resources/test_boxes”, and the outputs should be 3D bounding boxes placed in “../output/submission/data”. However, the results I got were not 3D bounding boxes, as shown here: [image]. The results do not contain 3D information. Is it because I made a mistake somewhere?

    opened by Vncois-ZXJ 2
  • No arg_max in KITTI_train_IGRs.yml file

    Hello, first I would like to thank you for the amazing work, and for making it public with great documentation so that the network can be reproduced and retrained.

    Today I wanted to train your network following the training instructions for stage 2. However, running the code gave the following error:

    $ python train_IGRs.py --cfg "../configs/KITTI_train_IGRs.yml"
    
    => init weights from normal distribution
    => loading pretrained model ../resources/start_point.pth
    
    Total Parameters: 63,978,471
    ----------------------------------------------------------------------------------------------------------------------------------
    Total Multiply Adds (For Convolution and Linear Layers only): 19.573845863342285 GFLOPs
    ----------------------------------------------------------------------------------------------------------------------------------
    Number of Layers
    Conv2d : 306 layers   BatchNorm2d : 304 layers   ReLU : 269 layers   Bottleneck : 4 layers   BasicBlock : 108 layers   Upsample : 28 layers   HighResolutionModule : 8 layers   Sigmoid : 1 layers   
    Initializing KITTI train set, please wait...
    Found prepared keypoints at ../kitti_dataset/training/keypoints/bbox9[0.332, 0.667]_train_['Car'].npy
    Found prepared instance_ids at ../kitti_dataset/training/keypoints/bbox9[0.332, 0.667]_train_['Car']_ids.npy
    Found prepared rotations at ../kitti_dataset/training/keypoints/bbox9[0.332, 0.667]_train_['Car']_rots.npy
    Initialization finished for KITTI train set
    Initializing KITTI valid set, please wait...
    Found prepared keypoints at ../kitti_dataset/training/keypoints/bbox9[0.332, 0.667]_valid_['Car'].npy
    Found prepared instance_ids at ../kitti_dataset/training/keypoints/bbox9[0.332, 0.667]_valid_['Car']_ids.npy
    Found prepared rotations at ../kitti_dataset/training/keypoints/bbox9[0.332, 0.667]_valid_['Car']_rots.npy
    Initialization finished for KITTI valid set
    Traceback (most recent call last):
      File "train_IGRs.py", line 159, in <module>
        main()
      File "train_IGRs.py", line 154, in main
        train(model, model_settings, GPUs, cfgs, logger, final_output_dir)
      File "train_IGRs.py", line 89, in train
        trainer.train(train_dataset=train_dataset, 
      File "../libs/trainer/trainer.py", line 154, in train
        evaluator = Evaluator(cfgs['training_settings']['eval_metrics'], 
      File "../libs/metric/criterions.py", line 546, in __init__
        self.metrics.append(eval(metric + '(cfgs=cfgs, num_joints=num_joints)'))
      File "<string>", line 1, in <module>
      File "../libs/metric/criterions.py", line 183, in __init__
        self.arg_max = cfgs['testing_settings']['arg_max']
    KeyError: 'arg_max'
    

    It seems that the arg_max key is missing from KITTI_train_IGRs.yml.

    How should the arg_max parameter be defined?

    Thanks in advance for your answer.
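The traceback above reads `cfgs['training_settings']['eval_metrics']` and `cfgs['testing_settings']['arg_max']`, so a quick pre-flight check can report missing entries before training starts. This is a minimal sketch: `find_missing_keys` is a hypothetical helper, the `cfgs` dictionary is a stand-in for the parsed YAML file, and nothing here says what value `arg_max` should take.

```python
from collections.abc import Mapping
from typing import Any, List, Sequence


def find_missing_keys(cfgs: Any, required: Sequence[Sequence[str]]) -> List[str]:
    """Return the dotted paths from `required` that are absent in the nested dict `cfgs`."""
    missing = []
    for path in required:
        node = cfgs
        for key in path:
            if not isinstance(node, Mapping) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


# Stand-in for a parsed config; the real file has many more entries.
cfgs = {
    "training_settings": {"eval_metrics": ["placeholder_metric"]},
    "testing_settings": {},
}

# Key paths taken from the traceback above.
required = [
    ("training_settings", "eval_metrics"),
    ("testing_settings", "arg_max"),
]

print(find_missing_keys(cfgs, required))
```

With the stand-in config this reports `testing_settings.arg_max`, which matches the `KeyError` in the log.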

    opened by adamanov 1
  • Reproduce Result in Kitti dataset

    Hello @Nicholasli1995 thank you for your implementation. I have several questions.

    1. Which train/validation split did you use for this code? Is it the SubCNN split?
    2. During inference on the testing set, do you use ROIs from an object detection algorithm?
    3. Does it still use the calibration files, or can the system generate the calibration automatically?
    4. How can I produce a table similar to the one in the paper? I see that you use AOS and AP, as well as Easy, Moderate, and Hard levels. However, when I ran the evaluation on the validation set with kitti_eval_offline, I only got some plots, statistics in txt files, and the final result shown here:
    python inference.py --cfg "../configs/try_KITTI_inference:test_submission.yml"
    
    Wrote prediction file at ../result/gt_box_test/data/007025.txt
    Warning: 007025.png not included in detected images!
    PDFCROP 1.38, 2012/11/02 - Copyright (c) 2002-2012 by Heiko Oberdiek.
    ==> 1 page written on `car_detection.pdf'.
    PDFCROP 1.38, 2012/11/02 - Copyright (c) 2002-2012 by Heiko Oberdiek.
    ==> 1 page written on `car_orientation.pdf'.
    PDFCROP 1.38, 2012/11/02 - Copyright (c) 2002-2012 by Heiko Oberdiek.
    ==> 1 page written on `car_detection_ground.pdf'.
    PDFCROP 1.38, 2012/11/02 - Copyright (c) 2002-2012 by Heiko Oberdiek.
    ==> 1 page written on `car_detection_3d.pdf'.
    Thank you for participating in our evaluation!
    Loading detections...
    number of files for evaluation: 1634
      done.
    save ../result/submission/plot/car_detection.txt
    car_detection AP: 93.934082 84.951851 67.594185
    save ../result/submission/plot/car_orientation.txt
    car_orientation AP: 93.664360 84.683617 67.141930
    save ../result/submission/plot/car_detection_ground.txt
    car_detection_ground AP: 41.456226 31.880611 24.876181
    save ../result/submission/plot/car_detection_3d.txt
    car_detection_3d AP: 29.933027 24.017235 18.650723
    Your evaluation results are available at:
    ../result/submission
    

    It only produces AP, not AOS. Is that because you ran on the testing set and submitted to the KITTI evaluation server, which then provides the detailed results? If we want to evaluate on the validation set, which value should I use? There are several outputs: car_detection AP, car_orientation AP, car_detection_ground AP, and car_detection_3d AP. Specifically, how can Table 1 of the paper be produced using the validation set?
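A small sketch for collecting the summary lines of a log like the one above into a table. It assumes that the three numbers on each `AP:` line are the Easy, Moderate, and Hard difficulty levels, in that order, and that the `car_orientation` row is the orientation-aware score. Both are assumptions to verify against the evaluation tool in use. `parse_eval_log` is a hypothetical helper, not part of this repo.

```python
import re

LEVELS = ("Easy", "Moderate", "Hard")  # assumed column order
AP_LINE = re.compile(r"^\s*(\w+) AP:\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$")


def parse_eval_log(text):
    """Map each '<name> AP: a b c' line to {'Easy': a, 'Moderate': b, 'Hard': c}."""
    table = {}
    for line in text.splitlines():
        match = AP_LINE.match(line)
        if match:
            values = (float(x) for x in match.groups()[1:])
            table[match.group(1)] = dict(zip(LEVELS, values))
    return table


# Lines copied from the log above.
log = """
save ../result/submission/plot/car_detection.txt
car_detection AP: 93.934082 84.951851 67.594185
save ../result/submission/plot/car_orientation.txt
car_orientation AP: 93.664360 84.683617 67.141930
save ../result/submission/plot/car_detection_ground.txt
car_detection_ground AP: 41.456226 31.880611 24.876181
save ../result/submission/plot/car_detection_3d.txt
car_detection_3d AP: 29.933027 24.017235 18.650723
"""

for name, row in parse_eval_log(log).items():
    print(f"{name:22s} {row['Easy']:8.2f} {row['Moderate']:8.2f} {row['Hard']:8.2f}")
```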

    -Thank you-

    opened by ftlong6666 1
  • Visualization problem

    Thank you for sharing your great work! Can the output coordinates of the model be visualized directly, or is a transformation needed to display them in the original image?
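As a general illustration, and not a statement about what EgoNet outputs, 3D points expressed in the camera frame need a perspective projection before they can be drawn on the image. The sketch below applies a 3x4 projection matrix of the kind found in KITTI calibration files. The function name `project_points` and the matrix values are made up for this example.

```python
def project_points(points_cam, P):
    """Project 3D camera-frame points to pixel coordinates with a 3x4 matrix P."""
    pixels = []
    for x, y, z in points_cam:
        # Homogeneous image coordinates: P @ [x, y, z, 1]
        h = [P[r][0] * x + P[r][1] * y + P[r][2] * z + P[r][3] for r in range(3)]
        if h[2] <= 0:
            raise ValueError("point is behind the camera")
        pixels.append((h[0] / h[2], h[1] / h[2]))
    return pixels


# Illustrative intrinsics only: focal length 700 px, principal point (600, 180).
P = [
    [700.0, 0.0, 600.0, 0.0],
    [0.0, 700.0, 180.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

print(project_points([(0.0, 0.0, 10.0), (1.0, 0.0, 10.0)], P))
```

If the coordinates in question are already in pixels of a cropped and resized patch, the transformation needed would instead be the inverse of that crop and resize.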

    opened by xiaozhihao0 0
  • Relation between kpts_3d_pred and pose

    Hello, Thank you for open-sourcing this amazing project!

    I have a question about the convention for the transformation of the 3D box. EgoNet only produces an egocentric pose (i.e. in camera coordinates) corresponding to the rotation between the 3D box extracted from the keypoints and a template 3D box. We also have a translation corresponding to the first point in kpts_3d_pred, here.

    To better understand the coordinate systems involved I'm doing the following experiment:

    1. Create a template 3D bounding box following this, in the canonical pose.
    2. Rotate it with the rotation matrix given by EgoNet, this one

    After doing these two steps, I still need one translation to place the 3D box in space (in the camera system). The question is, what translation should I use? Is it the one corresponding to the first point in kpts_3d_pred?
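The two steps above, plus the open translation step, can be sketched as follows. This is an illustration under stated assumptions: the template is centred at the origin, so `t` places the box centre, and the rotation is a pure yaw about the camera Y axis. Whether the first point of kpts_3d_pred is that centre is exactly the open question and is not settled here. `template_box`, `rot_y`, and `transform` are hypothetical names, not functions from this repo.

```python
import math


def template_box(length, height, width):
    """Eight corners of a box centred at the origin (x: length, y: height, z: width)."""
    return [
        (sx * length / 2, sy * height / 2, sz * width / 2)
        for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)
    ]


def rot_y(yaw):
    """Rotation matrix for a rotation of `yaw` radians about the Y axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def transform(points, R, t):
    """Apply p' = R p + t to every point."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]


# Illustrative values: a 4.0 x 1.5 x 1.8 m box, 90 degree yaw, 20 m ahead.
t = (1.0, 1.5, 20.0)
corners = transform(template_box(4.0, 1.5, 1.8), rot_y(math.pi / 2), t)
centroid = tuple(sum(c[i] for c in corners) / len(corners) for i in range(3))
print(centroid)
```

Because the template is centred at the origin, the centroid of the transformed corners coincides with `t` up to floating-point error. This gives a simple way to test a candidate translation: place the box with it and compare the projected corners with the predicted keypoints.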

    Thank you for your time

    opened by nviolante25 2
  • Generate 3D rectangular coordinates using 2D rectangular boxes

    Thank you for the work you have done and for open-sourcing the code. I would like to ask how to use your code to generate 3D box coordinates for detection tasks when I only have images with labeled 2D annotation boxes. I also have a problem: is the file mentioned below necessary? I tried to modify it and got an error, and I would like to know what it does.

    {{{ Download the resources folder and unzip its contents. Place the resource folder at ${EgoNet_DIR}/resources }}}

    My English is machine-translated; please forgive any improper wording. Thank you.

    opened by yuan960426 4
  • Use EgoNet on custom data

    Thank you for sharing your work. I want to use EgoNet for 3D bbox and object orientation estimation on custom data. How should I proceed? Do I need another model's 2D/3D bbox predictions to start with, or can I get both 3D bbox and orientation predictions just by changing the input data at test time?

    opened by ratnam18 5
Releases(v-1.0)
Owner
Shichao Li
A PhD candidate @ HKUST working on computer vision and machine learning