[CVPR2021 Oral] FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation.

Overview

FFB6D

This is the official source code for the CVPR2021 Oral work, FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation. (Arxiv)

Introduction & Citation

FFB6D is a general framework for representation learning from a single RGBD image. We apply it to the 6D pose estimation task by cascading downstream prediction heads for instance semantic segmentation and 3D keypoint voting from PVN3D (Arxiv, Code, Video). At the representation learning stage, we build bidirectional fusion modules into the full flow of the two networks (a CNN on the RGB image and a point cloud network on the depth data), applying fusion at each encoding and decoding layer. In this way, each network can leverage local and global complementary information from the other to obtain better representations. Moreover, at the output representation stage, we design a simple but effective 3D keypoint selection algorithm that considers both the texture and the geometry of objects, which simplifies keypoint localization for precise pose estimation.
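The geometric part of the keypoint selection can be illustrated with farthest point sampling (FPS), which the code structure lists under ffb6d/utils/dataset_tools/fps. The following is a minimal pure-Python sketch, not the repository's implementation: the function name farthest_point_sampling, the start_index parameter, and the candidate points are assumptions made for illustration, and the texture-aware step (detecting SIFT/ORB features on rendered images) is left out.

```python
import math


def farthest_point_sampling(points, k, start_index=0):
    """Pick k indices so that each new pick is as far as possible
    from the points already selected (hypothetical helper)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start_index]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start_index]) for p in points]
    while len(selected) < k:
        next_index = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(next_index)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[next_index]))
    return selected


# Made-up candidate points in the object coordinate system.
candidates = [
    (0.0, 0.0, 0.0),
    (0.1, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 2.0),
]
keypoint_ids = farthest_point_sampling(candidates, k=3)
print(keypoint_ids)
```

In a texture-aware variant, the candidates would be the 3D locations of detected image features rather than arbitrary surface points, so the selected keypoints are both well spread and visually distinctive.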

Please cite FFB6D & PVN3D if you use this repository in your publications:

@InProceedings{He_2021_CVPR,
author = {He, Yisheng and Huang, Haibin and Fan, Haoqiang and Chen, Qifeng and Sun, Jian},
title = {FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021}
}

@InProceedings{He_2020_CVPR,
author = {He, Yisheng and Sun, Wei and Huang, Haibin and Liu, Jianran and Fan, Haoqiang and Sun, Jian},
title = {PVN3D: A Deep Point-Wise 3D Keypoints Voting Network for 6DoF Pose Estimation},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

Installation

  • Install CUDA 10.1 / 10.2

  • Set up python3 environment from requirement.txt:

    pip3 install -r requirement.txt 
  • Install apex:

    git clone https://github.com/NVIDIA/apex
    cd apex
    export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.5"  # set the target architecture manually, suggested in issue https://github.com/NVIDIA/apex/issues/605#issuecomment-554453001
    pip3 install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
    cd ..
  • Install normalSpeed, a fast and light-weight normal map estimator:

    git clone https://github.com/hfutcgncas/normalSpeed.git
    cd normalSpeed/normalSpeed
    python3 setup.py install --user
    cd ../..  # leave normalSpeed/normalSpeed and return to the starting directory
  • Install tkinter through sudo apt install python3-tk

  • Compile RandLA-Net operators:

    cd ffb6d/models/RandLA/
    sh compile_op.sh

Code Structure

  • ffb6d
    • ffb6d/common.py: Common configuration of dataset and models, e.g., dataset path, keypoints path, batch size and so on.
    • ffb6d/datasets
      • ffb6d/datasets/ycb
        • ffb6d/datasets/ycb/ycb_dataset.py: Data loader for YCB_Video dataset.
        • ffb6d/datasets/ycb/dataset_config
          • ffb6d/datasets/ycb/dataset_config/classes.txt: Object list of YCB_Video dataset.
          • ffb6d/datasets/ycb/dataset_config/radius.txt: Radius of each object in YCB_Video dataset.
          • ffb6d/datasets/ycb/dataset_config/train_data_list.txt: Training set of YCB_Video dataset.
          • ffb6d/datasets/ycb/dataset_config/test_data_list.txt: Testing set of YCB_Video dataset.
        • ffb6d/datasets/ycb/ycb_kps
          • ffb6d/datasets/ycb/ycb_kps/{obj_name}_8_kps.txt: ORB-FPS 3D keypoints of an object in the object coordinate system.
          • ffb6d/datasets/ycb/ycb_kps/{obj_name}_corners.txt: 8 corners of the 3D bounding box of an object in the object coordinate system.
    • ffb6d/models
      • ffb6d/models/ffb6d.py: Network architecture of the proposed FFB6D.
      • ffb6d/models/cnn
        • ffb6d/models/cnn/extractors.py: ResNet backbones.
        • ffb6d/models/cnn/pspnet.py: PSPNet decoder.
        • ffb6d/models/cnn/ResNet_pretrained_mdl: ResNet pretrained model weights.
      • ffb6d/models/loss.py: Loss calculation for training of FFB6D model.
      • ffb6d/models/pytorch_utils.py: PyTorch basic network modules.
      • ffb6d/models/RandLA/: PyTorch version of RandLA-Net from RandLA-Net-pytorch
    • ffb6d/utils
      • ffb6d/utils/basic_utils.py: Basic functions for data processing, visualization and so on.
      • ffb6d/utils/meanshift_pytorch.py: PyTorch version of meanshift algorithm for 3D center point and keypoints voting.
      • ffb6d/utils/pvn3d_eval_utils_kpls.py: Object pose estimation from predicted center/keypoints offset and evaluation metrics.
      • ffb6d/utils/ip_basic: Image Processing for Basic Depth Completion from ip_basic.
      • ffb6d/utils/dataset_tools
        • ffb6d/utils/dataset_tools/DSTOOL_README.md: README for dataset tools.
        • ffb6d/utils/dataset_tools/requirement.txt: Python3 requirement for dataset tools.
        • ffb6d/utils/dataset_tools/gen_obj_info.py: Generate object info, including SIFT-FPS 3D keypoints, radius etc.
        • ffb6d/utils/dataset_tools/rgbd_rnder_sift_kp3ds.py: Render RGBD images from mesh and extract textured 3D keypoints (SIFT/ORB).
        • ffb6d/utils/dataset_tools/utils.py: Basic utils for mesh, pose, image and system processing.
        • ffb6d/utils/dataset_tools/fps: Furthest point sampling algorithm.
        • ffb6d/utils/dataset_tools/example_mesh: Example mesh models.
    • ffb6d/train_ycb.py: Training & Evaluating code of FFB6D models for the YCB_Video dataset.
    • ffb6d/demo.py: Demo code for visualization.
    • ffb6d/train_ycb.sh: Bash script to start the training on the YCB_Video dataset.
    • ffb6d/test_ycb.sh: Bash script to start the testing on the YCB_Video dataset.
    • ffb6d/demo_ycb.sh: Bash script to start the demo on the YCB_Video dataset.
    • ffb6d/train_log
      • ffb6d/train_log/ycb
        • ffb6d/train_log/ycb/checkpoints/: Trained checkpoints on the YCB_Video dataset.
        • ffb6d/train_log/ycb/eval_results/: Evaluation results on the YCB_Video dataset.
        • ffb6d/train_log/ycb/train_info/: Training log on the YCB_Video dataset.
  • requirement.txt: python3 environment requirements for pip3 install.
  • figs/: Images shown in README.
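The file ffb6d/utils/meanshift_pytorch.py listed above clusters the per-point votes for the center point and keypoints. As a rough illustration of the idea only, here is a Gaussian-kernel mean shift in pure Python. The function name mean_shift_center, the median initialization, the bandwidth value, and the sample votes are all assumptions; the repository's PyTorch version may differ in kernel, initialization, and batching.

```python
import math
from statistics import median


def mean_shift_center(votes, bandwidth, max_iter=50, tol=1e-6):
    """Move an estimate toward the densest region of the 3D votes
    (hypothetical helper, Gaussian kernel)."""
    # Start from the per-axis median, which is robust to a few outliers.
    center = tuple(median(v[d] for v in votes) for d in range(3))
    for _ in range(max_iter):
        weights = [
            math.exp(-math.dist(v, center) ** 2 / (2 * bandwidth ** 2))
            for v in votes
        ]
        total = sum(weights)
        if total == 0.0:  # every vote is too far away to contribute
            break
        new_center = tuple(
            sum(w * v[d] for w, v in zip(weights, votes)) / total
            for d in range(3)
        )
        shift = math.dist(new_center, center)
        center = new_center
        if shift < tol:
            break
    return center


# Made-up votes: three agree on roughly (1, 2, 3), one is an outlier.
votes = [
    (1.00, 2.00, 3.00),
    (1.02, 1.98, 3.01),
    (0.98, 2.01, 2.99),
    (5.00, 5.00, 5.00),
]
center = mean_shift_center(votes, bandwidth=0.1)
print(center)
```

With a small bandwidth the outlier receives a negligible weight, so the estimate stays near the three agreeing votes instead of being pulled toward the plain average.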

Datasets

  • YCB-Video: Download the YCB-Video Dataset from PoseCNN. Unzip it and link the unzipped YCB_Video_Dataset to ffb6d/datasets/ycb/YCB_Video_Dataset:

    ln -s path_to_unzipped_YCB_Video_Dataset ffb6d/datasets/ycb/
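A quick way to confirm that the link resolves before starting a long training run is sketched below. The helper name find_ycb_dataset and its repo_root parameter are hypothetical and not part of this repository; the check only verifies that the path exists and is a directory, not that the dataset contents are complete.

```python
from pathlib import Path


def find_ycb_dataset(repo_root="."):
    """Return the resolved dataset directory, or None if the link
    is missing or dangling (hypothetical helper)."""
    link = Path(repo_root) / "ffb6d" / "datasets" / "ycb" / "YCB_Video_Dataset"
    # is_dir() follows symlinks, so a broken link yields False.
    return link.resolve() if link.is_dir() else None


dataset_dir = find_ycb_dataset(".")
if dataset_dir is None:
    print("YCB_Video_Dataset is not linked yet; re-run the ln -s command above.")
else:
    print(f"Dataset found at {dataset_dir}")
```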

Training and evaluating

Training on the YCB-Video Dataset

  • Start training on the YCB-Video Dataset by:

    # commands in train_ycb.sh
    n_gpu=8  # number of GPUs to use
    python3 -m torch.distributed.launch --nproc_per_node=$n_gpu train_ycb.py --gpus=$n_gpu

    The trained model checkpoints are stored in train_log/ycb/checkpoints/

    A tip for saving GPU memory: enable mixed precision mode by passing opt_level=O1 to train_ycb.py. The documentation for apex mixed precision training can be found here.

Evaluating on the YCB-Video Dataset

  • Start evaluating by:
    # commands in test_ycb.sh
    tst_mdl=train_log/ycb/checkpoints/FFB6D_best.pth.tar  # checkpoint to test.
    python3 -m torch.distributed.launch --nproc_per_node=1 train_ycb.py --gpu '0' -eval_net -checkpoint $tst_mdl -test -test_pose # -debug
    You can evaluate a different checkpoint by changing tst_mdl to the path of your target model.
  • Pretrained model: We provide our pre-trained models on OneDrive, here. Download the pre-trained model, move it to train_log/ycb/checkpoints/ and modify tst_mdl for testing.

Demo/visualization on the YCB-Video Dataset

  • After training your model or downloading the pre-trained model, you can start the demo by:
    # commands in demo_ycb.sh
    tst_mdl=train_log/ycb/checkpoints/FFB6D_best.pth.tar
    python3 -m demo -checkpoint $tst_mdl -dataset ycb
    The visualization results will be stored in train_log/ycb/eval_results/pose_vis.

Results

  • Evaluation results without any post refinement on the YCB-Video dataset:

           PoseCNN         PointFusion     DenseFusion     PVN3D           Our FFB6D
           ADDS   ADD(S)   ADDS   ADD(S)   ADDS   ADD(S)   ADDS   ADD(S)   ADDS   ADD(S)
    ALL    75.8   59.9     83.9   -        91.2   82.9     95.5   91.8     96.6   92.7
  • Evaluation results on the LineMOD dataset:

            RGB                     RGB-D
            PVNet   CDPN    DPOD    PointFusion   DenseFusion(iterative)   G2L-Net   PVN3D   FFB6D
    MEAN    86.3    89.9    95.2    73.7          94.3                     98.7      99.4    99.7
  • Robustness upon occlusion:

  • Model parameters and speed on the LineMOD dataset (one object / frame) with one 2080Ti GPU:

             Parameters   Network Forward   Pose Estimation   All time
    PVN3D    39.2M        170ms             20ms              190ms
    FFB6D    33.8M        57ms              18ms              75ms
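For readers unfamiliar with the ADD and ADD-S distances behind the scores above, the following pure-Python sketch shows how the two per-pose distances are commonly defined. It is an illustration under stated assumptions, not the evaluation code of this repository: the function names, the nested-list rotation format, and the toy model points are made up, and the aggregation of these distances into the reported scores follows the protocol described in the paper.

```python
import math


def transform(points, rotation, translation):
    """Apply x -> R x + t to each 3D point (rotation is a 3x3 nested list)."""
    return [
        tuple(
            sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
            for r in range(3)
        )
        for p in points
    ]


def add_distance(model_points, pose_pred, pose_gt):
    """ADD: mean distance between corresponding transformed model points."""
    pred = transform(model_points, *pose_pred)
    gt = transform(model_points, *pose_gt)
    return sum(math.dist(a, b) for a, b in zip(pred, gt)) / len(model_points)


def adds_distance(model_points, pose_pred, pose_gt):
    """ADD-S: mean distance from each ground-truth point to its closest
    predicted point, which tolerates object symmetries."""
    pred = transform(model_points, *pose_pred)
    gt = transform(model_points, *pose_gt)
    return sum(min(math.dist(g, p) for p in pred) for g in gt) / len(model_points)


# Toy object that looks the same after a 180-degree turn about the z axis.
model = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
identity = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], (0.0, 0.0, 0.0))
half_turn = ([[-1, 0, 0], [0, -1, 0], [0, 0, 1]], (0.0, 0.0, 0.0))

add_value = add_distance(model, half_turn, identity)
adds_value = adds_distance(model, half_turn, identity)
print(add_value, adds_value)
```

The half-turn pose is indistinguishable from the ground truth for this symmetric object, so ADD-S reports no error while ADD penalizes it. This is why ADD(S) uses ADD-S for symmetric objects and ADD for the others.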

Adaptation to New Dataset

  • Install and generate required mesh info following DSTOOL_README.

  • Modify info of your new dataset in FFB6D/ffb6d/common.py

  • Write your dataset preprocessing script following FFB6D/ffb6d/datasets/ycb/ycb_dataset.py. Note that you should properly modify or call the functions that get your model info, such as 3D keypoints, center points, and radius.

  • (Very Important!) Visualize and check that you process the data properly, e.g., the projected keypoints and center point, the semantic label of each point, etc. For example, you can visualize the projected center point (red point) and selected keypoints (orange points) by running python3 -m datasets.ycb.ycb_dataset.

  • For inference, make sure that you load the 3D keypoints, center point, and radius of your objects in the object coordinate system properly in FFB6D/ffb6d/utils/pvn3d_eval_utils.py.

  • Check that all settings are modified properly by using the ground truth information for evaluation. The result should be high and close to 100 if everything is correct. For example, testing ground truth on the YCB_Video dataset by passing the -test_gt parameter to train_ycb.py will get results higher than 99.99:

    tst_mdl=train_log/ycb/checkpoints/FFB6D_best.pth.tar
    python3 -m torch.distributed.launch --nproc_per_node=1 train_ycb.py --gpu '0' -eval_net -checkpoint $tst_mdl -test -test_pose -test_gt
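When checking a new dataset visually, the projected keypoints and center point are obtained with the standard pinhole camera model. The sketch below shows that projection in pure Python. The function name project_points, the intrinsics values, and the sample points are assumptions for illustration; use the intrinsics of your own camera, and transform keypoints from the object coordinate system into the camera frame with the ground truth pose first.

```python
def project_points(points_cam, fx, fy, cx, cy):
    """Project 3D points given in the camera frame to pixel
    coordinates (u, v) with a pinhole model (hypothetical helper)."""
    pixels = []
    for x, y, z in points_cam:
        if z <= 0:
            raise ValueError("point is not in front of the camera")
        pixels.append((fx * x / z + cx, fy * y / z + cy))
    return pixels


# Made-up intrinsics and camera-frame points (meters).
fx, fy, cx, cy = 1000.0, 1000.0, 320.0, 240.0
center_and_keypoint = [(0.0, 0.0, 1.0), (0.1, -0.05, 2.0)]
pixels = project_points(center_and_keypoint, fx, fy, cx, cy)
print(pixels)
```

If the projected points do not land on the object in the RGB image, the usual suspects are a wrong intrinsic matrix, a unit mismatch between mesh and depth (for example millimeters versus meters), or a pose convention that differs from the one the data loader expects.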
    

To Do

  • Scripts and pre-trained models for LineMOD dataset.

License

Licensed under the MIT License.

Owner
Yisheng (Ethan) He
Ph.D. student @ HKUST