Official implementation of EfficientPose

Overview

EfficientPose

This is the official implementation of EfficientPose. We based our work on the Keras EfficientDet implementation xuannianz/EfficientDet, which in turn builds on the great Keras RetinaNet implementation fizyr/keras-retinanet, the official EfficientDet implementation google/automl, and qubvel/efficientnet.


Installation

  1. Clone this repository
  2. Create a new environment with conda create -n EfficientPose python==3.6
  3. Activate that environment with conda activate EfficientPose
  4. Install Tensorflow 1.15.0 with conda install tensorflow-gpu==1.15.0
  5. Go to the repo dir and install the other dependencies using pip install -r requirements.txt
  6. Compile cython modules with python setup.py build_ext --inplace

Dataset and pretrained weights

You can download the Linemod and Occlusion datasets and the pretrained weights from here. Just unzip the Linemod_and_Occlusion.zip file and you can train or evaluate using these datasets as described below.

The datasets were originally downloaded from j96w/DenseFusion and chensong1995/HybridPose and were preprocessed using the generate_masks.py script. The EfficientDet COCO pretrained weights are from xuannianz/EfficientDet.

Training

Linemod

To train a phi = 0 EfficientPose model on object 8 of Linemod (driller) using COCO pretrained weights:

python train.py --phi 0 --weights /path_to_weights/file.h5 linemod /path_to_dataset/Linemod_preprocessed/ --object-id 8

Occlusion

To train a phi = 0 EfficientPose model on Occlusion using COCO pretrained weights:

python train.py --phi 0 --weights /path_to_weights/file.h5 occlusion /path_to_dataset/Linemod_preprocessed/

See train.py for more arguments.

Evaluating

Linemod

To evaluate a trained phi = 0 EfficientPose model on object 8 of Linemod (driller) and (optionally) save the predicted images:

python evaluate.py --phi 0 --weights /path_to_weights/file.h5 --validation-image-save-path /where_to_save_predicted_images/ linemod /path_to_dataset/Linemod_preprocessed/ --object-id 8

Occlusion

To evaluate a trained phi = 0 EfficientPose model on Occlusion and (optionally) save the predicted images:

python evaluate.py --phi 0 --weights /path_to_weights/file.h5 --validation-image-save-path /where_to_save_predicted_images/ occlusion /path_to_dataset/Linemod_preprocessed/

If you don't want to save the predicted images, just skip the --validation-image-save-path argument.

Inferencing

We also provide two basic scripts demonstrating the exemplary use of a trained EfficientPose model for inference. With python inference.py you can run EfficientPose on all images in a directory. The needed parameters, e.g. the path to the images and the model, can be modified in the inference.py script.
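As a rough illustration of the directory-inference pattern, the sketch below loops over the images in a folder and runs a loaded model on each one. The helper names build_model_and_load_weights and run_single_image are assumed stand-ins, not a confirmed API; the actual entry points and their parameters are defined in inference.py.

import os
import cv2

# Assumed helper names standing in for the functions defined in inference.py
from inference import build_model_and_load_weights, run_single_image

image_dir = "/path_to_images/"             # directory with the input images
weights_path = "/path_to_weights/file.h5"  # trained EfficientPose weights
phi = 0                                    # model scaling parameter

# Build the model once and reuse it for every image
model = build_model_and_load_weights(phi, weights_path)

for filename in sorted(os.listdir(image_dir)):
    image = cv2.imread(os.path.join(image_dir, filename))
    if image is None:
        continue  # skip files that are not readable images
    rotations, translations = run_single_image(model, image)  # assumed return values
    print(filename, rotations, translations)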

With python inference_webcam.py you can run EfficientPose live with your webcam. Please note that you have to replace the intrinsic camera parameters used in this script (Linemod) with those of your webcam. Since the Linemod and Occlusion datasets are too small to expect reasonable 6D pose estimation performance in the real world, and most people probably do not own the exact objects used in Linemod, you can instead display a Linemod image on your screen and film it with your webcam.
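For reference, intrinsic camera parameters are usually written as a 3x3 matrix built from the focal lengths and the principal point. The sketch below only shows that structure; the variable name camera_matrix and the numeric values are assumptions, and you should substitute the values obtained from calibrating your own webcam.

import numpy as np

# Placeholder values from an assumed webcam calibration, not the script's actual numbers
fx, fy = 615.0, 615.0   # focal lengths in pixels
cx, cy = 320.0, 240.0   # principal point in pixels

camera_matrix = np.array([[fx, 0., cx],
                          [0., fy, cy],
                          [0., 0., 1.]], dtype=np.float32)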

Benchmark

To measure the runtime of EfficientPose on your machine you can use python benchmark_runtime.py. The needed parameters, e.g. the path to the model, can be modified in the benchmark_runtime.py script. Similarly, you can measure the runtime of vanilla EfficientDet on your machine with the benchmark_runtime_vanilla_effdet.py script.
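If you just want a quick manual check, a simple timing loop around the model's predict call gives a rough per-image latency estimate. This is a generic sketch, not the benchmark script's code: model and input_batch stand for whatever loaded model and preprocessed input you already have, and a warm-up call is excluded so graph initialization is not counted.

import time

# Warm-up call so graph construction and memory allocation are not measured
model.predict(input_batch)

runs = 100
start = time.perf_counter()
for _ in range(runs):
    model.predict(input_batch)
elapsed = time.perf_counter() - start

print("average inference time: {:.1f} ms".format(1000.0 * elapsed / runs))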

Debugging Dataset and Generator

If you want to modify the generators or build a new custom dataset, it can be very helpful to display the dataset annotations loaded from your generator to make sure everything works as expected. With

python debug.py --phi 0 --annotations linemod /path_to_dataset/Linemod_preprocessed/ --object-id 8

you can display the loaded and augmented images as well as the annotations prepared for a phi = 0 model from object 8 of the Linemod dataset. Please see debug.py for more arguments.

Citation

Please cite EfficientPose if you use it in your research:

@misc{bukschat2020efficientpose,
      title={EfficientPose: An efficient, accurate and scalable end-to-end 6D multi object pose estimation approach}, 
      author={Yannick Bukschat and Marcus Vetter},
      year={2020},
      eprint={2011.04307},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

EfficientPose is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license and is freely available for non-commercial use. Please see the LICENSE file for further details. If you are interested in commercial use, please contact us at [email protected] or [email protected].
