Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-Pixel Part Segmentation [3DV 2021 Oral]

Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-Pixel Part Segmentation,
Zicong Fan, Adrian Spurr, Muhammed Kocabas, Siyu Tang, Michael J. Black, Otmar Hilliges
International Conference on 3D Vision (3DV), 2021


Features

DIGIT estimates the 3D poses of two interacting hands from a single RGB image. This repo provides the training, evaluation, and demo code for the project in PyTorch Lightning.

Updates

  • November 25, 2021: Initial repo with training and evaluation on PyTorch Lightning 0.9.

Setting up environment

DIGIT has been implemented and tested on Ubuntu 18.04 with Python >= 3.7, PyTorch 1.6, and PyTorch Lightning 0.9.

Clone the repo:

git clone https://github.com/zc-alexfan/digit-interacting

Create the required folders:

make folders

Install conda environment:

conda create -n digit python=3.7
conda deactivate
conda activate digit
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch
pip install -r requirements.txt
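
To verify that the environment matches the versions above, you can run a quick check (a minimal sketch; the exact patch versions and CUDA availability depend on your machine):

# check_env.py: confirm the versions listed above are installed
import torch
import pytorch_lightning as pl

print("PyTorch:", torch.__version__)           # expect 1.6.0
print("Lightning:", pl.__version__)            # expect 0.9.x
print("CUDA available:", torch.cuda.is_available())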

Downloading InterHand2.6M

  • Download the 5fps.v1 release of InterHand2.6M, following the instructions here.
  • Place annotations, images, and rootnet_output from InterHand2.6M under ./data/InterHand so that it looks like this:
./data/InterHand
|-- annotations
|-- images
|   |-- test
|   |-- train
|   `-- val
`-- rootnet_output
    |-- rootnet_interhand2.6m_output_test.json
    `-- rootnet_interhand2.6m_output_val.json
  • The folder ./data/InterHand/annotations should look like this (a verification sketch follows the tree):
./data/InterHand/annotations
|-- skeleton.txt
|-- subject.txt
|-- test
|   |-- InterHand2.6M_test_MANO_NeuralAnnot.json
|   |-- InterHand2.6M_test_camera.json
|   |-- InterHand2.6M_test_data.json
|   `-- InterHand2.6M_test_joint_3d.json
|-- train
|   |-- InterHand2.6M_train_MANO_NeuralAnnot.json
|   |-- InterHand2.6M_train_camera.json
|   |-- InterHand2.6M_train_data.json
|   `-- InterHand2.6M_train_joint_3d.json
`-- val
    |-- InterHand2.6M_val_MANO_NeuralAnnot.json
    |-- InterHand2.6M_val_camera.json
    |-- InterHand2.6M_val_data.json
    `-- InterHand2.6M_val_joint_3d.json
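
Before moving on, you can verify the layout with a short script (a minimal sketch derived from the trees above; adjust root if your data lives elsewhere):

# verify_layout.py: check that the files listed above are in place
import os

root = "./data/InterHand"
expected = ["annotations/skeleton.txt", "annotations/subject.txt"]
for split in ("train", "val", "test"):
    expected.append(f"images/{split}")
    for suffix in ("MANO_NeuralAnnot", "camera", "data", "joint_3d"):
        expected.append(f"annotations/{split}/InterHand2.6M_{split}_{suffix}.json")
missing = [p for p in expected if not os.path.exists(os.path.join(root, p))]
print("All files in place." if not missing else f"Missing: {missing}")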

Preparing data and backbone for training

Download the ImageNet-pretrained backbone from here and place it under:

./saved_models/pytorch/imagenet/hrnet_w32-36af842e.pt
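
To confirm the download is intact, the file should load as a regular PyTorch checkpoint (a quick check; whether it is a flat state dict or nested is an assumption here):

import torch

# load on CPU so no GPU is required for this check
ckpt = torch.load("./saved_models/pytorch/imagenet/hrnet_w32-36af842e.pt", map_location="cpu")
print(type(ckpt), len(ckpt))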

Package images into lmdb:

cd scripts
python package_images_lmdb.py
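
Conceptually, packaging images into an LMDB means writing each encoded image under a unique key so training can read them with fast random access. A minimal sketch of the idea (not the repo's exact script; the output path and key scheme are assumptions):

import glob
import lmdb

image_paths = sorted(glob.glob("./data/InterHand/images/**/*.jpg", recursive=True))
env = lmdb.open("./data/InterHand/images.lmdb", map_size=1 << 40)  # generous size upper bound in bytes
with env.begin(write=True) as txn:
    for path in image_paths:
        with open(path, "rb") as f:
            txn.put(path.encode("utf-8"), f.read())  # store raw JPEG bytes; decode at load time
env.close()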

Preprocess annotations:

python preprocess_annot.py
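
This step produces the InterHand2.6M_*_data.pkl caches that appear in the tree further below: the large per-split JSON is parsed once and pickled so later runs load faster. A sketch of the idea (not the repo's exact code):

import json
import pickle

split = "val"  # likewise for train and test
src = f"./data/InterHand/annotations/{split}/InterHand2.6M_{split}_data.json"
with open(src) as f:
    data = json.load(f)                    # slow one-time JSON parse
with open(src.replace(".json", ".pkl"), "wb") as f:
    pickle.dump(data, f)                   # fast to reload on subsequent runs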

Render part segmentation masks:

  • Follow the README.md of render_mano_ih to prepare an LMDB of part segmentation masks. For questions about preparing the segmentation masks, please open issues in that repo.

Place the image LMDB, the segmentation-mask LMDB, and the meta_dict_*.pkl files under ./data/InterHand so that it matches the structure below; a read-back sketch follows the tree. The cache files meta_dict_*.pkl are by-products of the step above.

|-- annotations
|   |-- skeleton.txt
|   |-- subject.txt
|   |-- test
|   |   |-- InterHand2.6M_test_MANO_NeuralAnnot.json
|   |   |-- InterHand2.6M_test_camera.json
|   |   |-- InterHand2.6M_test_data.json
|   |   |-- InterHand2.6M_test_data.pkl
|   |   `-- InterHand2.6M_test_joint_3d.json
|   |-- train
|   |   |-- InterHand2.6M_train_MANO_NeuralAnnot.json
|   |   |-- InterHand2.6M_train_camera.json
|   |   |-- InterHand2.6M_train_data.json
|   |   |-- InterHand2.6M_train_data.pkl
|   |   `-- InterHand2.6M_train_joint_3d.json
|   `-- val
|       |-- InterHand2.6M_val_MANO_NeuralAnnot.json
|       |-- InterHand2.6M_val_camera.json
|       |-- InterHand2.6M_val_data.json
|       |-- InterHand2.6M_val_data.pkl
|       `-- InterHand2.6M_val_joint_3d.json
|-- cache
|   |-- meta_dict_test.pkl
|   |-- meta_dict_train.pkl
|   `-- meta_dict_val.pkl
|-- images
|   |-- test
|   |-- train
|   `-- val
|-- rootnet_output
|   |-- rootnet_interhand2.6m_output_test.json
|   `-- rootnet_interhand2.6m_output_val.json
`-- segm_32.lmdb
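
To spot-check the packaged data, entries can be read back by key (a minimal sketch; the key format and mask encoding inside segm_32.lmdb depend on how they were written in the rendering step):

import lmdb

env = lmdb.open("./data/InterHand/segm_32.lmdb", readonly=True, lock=False)
with env.begin() as txn:
    key, value = next(iter(txn.cursor()))  # first stored entry
    print(key, len(value), "bytes")
env.close()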

Training and evaluating

To train DIGIT, run the command below. It trains at an effective batch size of 64 using gradient accumulation, where each forward/backward pass runs on a batch of 32 (see the Trainer sketch after the commands):

python train.py --iter_batch 32 --batch_size 64 --gpu_ids 0 --trainsplit train --precision 16 --eval_every_epoch 2 --lr_dec_epoch 40 --max_epoch 50 --min_epoch 50

Or, if you just want to do a sanity check, you can run:

python train.py --iter_batch 32 --batch_size 64 --gpu_ids 0 --trainsplit minitrain --valsplit minival --precision 16 --eval_every_epoch 1 --max_epoch 50 --min_epoch 50
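
In Lightning terms, the commands above rely on gradient accumulation: with --iter_batch 32 and --batch_size 64, each optimizer step accumulates 64 / 32 = 2 sub-batches. A minimal sketch of the equivalent Trainer settings (how train.py maps its flags onto the Trainer is an assumption here):

import pytorch_lightning as pl

iter_batch, batch_size = 32, 64
trainer = pl.Trainer(
    gpus=[0],                                          # --gpu_ids 0
    precision=16,                                      # --precision 16
    accumulate_grad_batches=batch_size // iter_batch,  # 2 sub-batches per optimizer step
    check_val_every_n_epoch=2,                         # --eval_every_epoch 2
    max_epochs=50,                                     # --max_epoch 50
    min_epochs=50,                                     # --min_epoch 50
)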

Each time you run train.py, it creates a new experiment under logs, and each experiment is assigned a key.

Suppose your experiment key is 2e8c5136b; you can then evaluate the last epoch of the model on the test set:

python test.py --eval_on minitest --load_ckpt logs/2e8c5136b/model_dump/last.ckpt

OR

python test.py --eval_on test --load_ckpt logs/2e8c5136b/model_dump/last.ckpt

The former evaluates on only 1000 images as a sanity check.

Similarly, you can evaluate on the validation set:

python test.py --eval_on val --load_ckpt logs/2e8c5136b/model_dump/last.ckpt
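
Evaluation on InterHand2.6M is typically reported as the mean per-joint position error (MPJPE) in millimeters; a minimal sketch of the metric itself, not the repo's evaluation code:

import numpy as np

def mpjpe(pred, gt):
    # pred, gt: (num_samples, num_joints, 3) joint positions in mm
    return np.linalg.norm(pred - gt, axis=-1).mean()

# toy example: 21 joints per hand, two hands
pred = np.random.randn(8, 42, 3)
gt = np.random.randn(8, 42, 3)
print(f"MPJPE: {mpjpe(pred, gt):.2f} mm")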

Visualizing and evaluating pre-trained DIGIT

Here we provide instructions for producing qualitative results with DIGIT.

Download pre-trained DIGIT:

wget https://dataset.ait.ethz.ch/downloads/dE6qPPePCV/db7cba8c1.pt
mv db7cba8c1.pt saved_models

Visualize results:

CUDA_VISIBLE_DEVICES=0 python demo.py --eval_on minival --load_from saved_models/db7cba8c1.pt  --num_workers 0

Evaluate the pre-trained DIGIT model:

CUDA_VISIBLE_DEVICES=0 python test.py --eval_on test --load_from saved_models/db7cba8c1.pt --precision 16
CUDA_VISIBLE_DEVICES=0 python test.py --eval_on val --load_from saved_models/db7cba8c1.pt --precision 16

You should obtain the same results as reported here.

The results will be dumped to ./visualization.

Citation

@inProceedings{fan2021digit,
  title={Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-pixel Part Segmentation},
  author={Fan, Zicong and Spurr, Adrian and Kocabas, Muhammed and Tang, Siyu and Black, Michael and Hilliges, Otmar},
  booktitle={International Conference on 3D Vision (3DV)},
  year={2021}
}

License

Since our code is developed based on InterHand2.6M, which is licensed under CC-BY-NC 4.0, the same license applies to DIGIT.

DIGIT is CC-BY-NC 4.0 licensed, as found in the LICENSE file.

References

Some code in our repo uses snippets from the following repositories. Please consider citing them if you find our code useful:

@inproceedings{Moon_2020_ECCV_InterHand2.6M,
  author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
  title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2020}
}

@inproceedings{sun2019deep,
  title={Deep High-Resolution Representation Learning for Human Pose Estimation},
  author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
  booktitle={CVPR},
  year={2019}
}

@inproceedings{xiao2018simple,
  author = {Xiao, Bin and Wu, Haiping and Wei, Yichen},
  title = {Simple Baselines for Human Pose Estimation and Tracking},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2018}
}

@misc{milesial2021unet,
  author = {milesial},
  title = {Pytorch-UNet},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/milesial/Pytorch-UNet}}
}

Contact

For any questions, you can contact [email protected].
