Minimal Hand

A minimal solution to hand motion capture from a single color camera at over 100 fps. Easy to use, plug and play.

[teaser image]

This project provides the core components for hand motion capture:

  1. estimating joint locations from a monocular RGB image (DetNet)
  2. estimating joint rotations from locations (IKNet); see the data-flow sketch below
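
Conceptually, the two stages compose into one pipeline. Below is a minimal data-flow sketch; the function bodies are placeholders, not this repo's implementation. The 21-joint layout and quaternion output follow the conventions used elsewhere in this project, while the 128x128 input size is an assumption to check against config.py:

    import numpy as np

    def detnet(image):
        # placeholder: DetNet maps an RGB crop to 21 root-relative 3D joint locations
        return np.zeros((21, 3), dtype=np.float32)

    def iknet(xyz):
        # placeholder: IKNet maps 21 joint locations to 21 per-joint rotations (quaternions)
        return np.zeros((21, 4), dtype=np.float32)

    image = np.zeros((128, 128, 3), dtype=np.float32)  # stand-in for a camera frame
    xyz = detnet(image)   # stage 1: joint locations from a monocular RGB image
    theta = iknet(xyz)    # stage 2: joint rotations from locations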

We focus on:

  1. ease of use (all you need is a webcam)
  2. time efficiency (on our 1080Ti, 8.9ms for DetNet, 0.9ms for IKNet)
  3. robustness to occlusion, hand-object interaction, fast motion, and changing scale and viewpoint

Some links: [video] [paper] [supp doc] [webpage]

The author does not have the bandwidth to prepare the training code for release. That said, the training part should not be difficult to implement. Feel free to open an issue for any problems you encounter.

Pytorch Version

Here is a PyTorch version implemented by @MengHao666. I haven't personally checked it, but I believe it is worth trying. Many thanks to @MengHao666!

With Unity

Here is a project that connects this repo to Unity. It looks very cool; many thanks to @vinnik-dmitry07!

Usage

Install dependencies

Please check requirements.txt. All dependencies are available via pip and conda.
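
Assuming a standard pip workflow, installation is typically a single command:

    pip install -r requirements.txt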

Prepare MANO hand model

  1. Download MANO model from here and unzip it.
  2. In config.py, set OFFICIAL_MANO_PATH to the left hand model.
  3. Run python prepare_mano.py; you will get a converted MANO model compatible with this project at config.HAND_MESH_MODEL_PATH.
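
For reference, the two entries in config.py might look like the sketch below. Both paths are illustrative assumptions; point OFFICIAL_MANO_PATH at wherever you unzipped the left-hand model:

    # config.py (illustrative paths, adjust to your setup)
    OFFICIAL_MANO_PATH = './mano_v1_2/models/MANO_LEFT.pkl'          # left-hand model from the MANO release
    HAND_MESH_MODEL_PATH = './model/hand_mesh/hand_mesh_model.pkl'   # written by prepare_mano.py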

Prepare pre-trained network models

  1. Download models from here.
  2. Put detnet.ckpt.* in model/detnet, and iknet.ckpt.* in model/iknet.
  3. Check config.py to make sure all required files are in place.
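
After these steps, the model directory should look roughly like this (the exact checkpoint suffixes depend on the files in the download):

    model/
      detnet/
        detnet.ckpt.*
      iknet/
        iknet.ckpt.*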

Run the demo for webcam input

  1. python app.py
  2. Put your right hand in front of the camera. The pre-trained model is for the left hand, but the input is flipped internally.
  3. Press ESC to quit.
  4. Although the model is robust to varying scales, ideally the image should be about 1.3x larger than the hand bounding box; a good bounding box usually yields better accuracy. You can track the bounding box using the model's 2D predictions, as sketched below.
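
As a rough illustration of that tracking idea, the next crop can be derived from the current 2D keypoint predictions. This is a minimal sketch, not code from this repo; uv is assumed to be a (21, 2) array of pixel coordinates:

    import numpy as np

    def next_crop_box(uv, margin=1.3):
        # square box around the keypoints, enlarged so the crop is ~1.3x the tight bounding box
        lo, hi = uv.min(axis=0), uv.max(axis=0)
        center = (lo + hi) / 2
        half = margin * (hi - lo).max() / 2
        return center - half, center + half  # top-left and bottom-right corners

    uv = np.random.rand(21, 2) * 480  # hypothetical 2D keypoints, in pixels
    top_left, bottom_right = next_crop_box(uv)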

We found that the model may fail on some "simple" poses. We believe this is because such poses were not present in the training data. We are working on a v2 version with further extended data to tackle this problem.

Use the models in your project

Please check wrappers.py.
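
For orientation, a standalone call might look like the sketch below. The class and function names come from app.py and this repo's modules, but the exact signatures and return values should be verified against wrappers.py:

    import numpy as np
    from wrappers import ModelPipeline
    from kinematics import mpii_to_mano

    model = ModelPipeline()                          # loads the DetNet and IKNet checkpoints from config.py
    frame = np.zeros((128, 128, 3), dtype=np.uint8)  # stand-in for a camera frame
    xyz, theta_mpii = model.process(frame)           # 3D joints and per-joint rotations (assumed outputs)
    theta_mano = mpii_to_mano(theta_mpii)            # convert joint order to the MANO convention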

IKNet Alternative

We also provide an optimization-based IK solver here.

Dataset

The detection model (DetNet) is trained on the CMU Panoptic Dataset (CMU), the Rendered Handpose Dataset (RHD), and the GANerated Hands Dataset (GAN).

The IK model (IKNet) is trained on the poses shipped with MANO.

Citation

This is the official implementation of the paper "Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data" (CVPR 2020).

The quantitative numbers reported in the paper can be found in plot.py.

If you find the project helpful, please consider citing us:

@inproceedings{zhou2020monocular,
  title={Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data},
  author={Zhou, Yuxiao and Habermann, Marc and Xu, Weipeng and Habibie, Ikhsanul and Theobalt, Christian and Xu, Feng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={0--0},
  year={2020}
}
Comments
  • About SMPL (MoSh) labels

    Hello, one more question: there are no MoSh labels (SMPL theta and beta) in the STB, RHD, FreiHAND datasets, etc. How do you translate 3D keypoints to a mesh (SMPL theta and beta)? Looking forward to your reply, thanks.

    opened by www516717402 13
  • How to use the right-hand model

    In config.py, I set OFFICIAL_MANO_PATH to the right-hand model and ran python prepare_mano.py, which gave me a converted right-hand MANO model. But when I use the converted right-hand model, the results are very bad. What is going wrong? I would like to know how to use (or how to convert) the right-hand MANO model. Looking forward to your reply! Thanks a lot.

    opened by huangfuts 12
  • Questions about training IKNet

    Thank you for the great project. I have a few questions about training IKNet:

    1. When converting the original 16 rotations of MANO into 21 rotations, do W, T0, I0, M0, R0, and L0 all share the rotation of W from the original MANO?
    2. I found that the joints_xyz computed from the MANO ref_pose and the transformed 21 rotation parameters, using the method in hand_mesh.py, is not equal to the 'J_transformed' saved in the MANO pkl file, even after adjusting the joint order according to kinematics.py. When training IKNet on the MANO dataset, how did you obtain the ground-truth 3D joint annotations in Lxyz? Is the calculation of FK(Q) the same as the calculation of joint_xyz in hand_mesh.py?
    opened by Gel-smile 9
  • How to mix and train the different datasets?

    The paper says that DetNet is trained on 3 datasets: the CMU Panoptic Dataset (CMU), the Rendered Handpose Dataset (RHD), and the GANerated Hands Dataset (GAN).

    Since the images of the three datasets differ from each other, could you please tell me how to preprocess the images?

    opened by LyazS 8
  • How to get beta in IKNet?

    You have done really great work!

    Reading your paper, I am a little confused about how the best beta is found in IKNet by minimizing E(beta). Is beta obtained directly by solving the function, or by using numerical methods like the Newton downhill method?

    Thank you. Best wishes.

    opened by Mrsirovo 6
  • Why is delta multiplied by length (delta = delta * length)?

    Hi, at line 166 of wrappers.py, delta = delta * length is used as one of the inputs to the IK model. May I ask why this is done? delta is the normalized joint direction vector, so why multiply it by the bone length?

    My understanding is that the inputs to the downstream IK model should include the hand mesh template parameters, the hand pose parameters theta (delta), the skinning weights, and the joint coordinates (xyz).

    So I am confused: shouldn't delta alone be enough as input? Why multiply it by length?

    opened by tonylin52 5
  • How to do "global alignment"?

    Hi, I am confused about another problem.

    In your paper, you said "As previous work, we perform a global alignment to better measure the local hand pose." How do you implement the "global alignment"? Is it just translating the root joint to the same location as the label (and is the label here also root-relative and normalized using the reference bone)? I get an AUC of only 0.1 using DetNet retrained on RHD.

    Could you point to the "previous work" that does a global alignment like yours? It would be even better if their code were publicly available. Thanks!

    opened by MengHao666 5
  • How can I use the model's output quaternions in Unity?

    Thank you for your great work! I am trying to use the model output to animate a virtual hand in Unity. I tried assigning the quaternion to Unity's localRotation, but it did not work. Could you share some insight on how to achieve this?

    opened by wangtss 5
  • IK using 3D joint coordinates

    Hello. First, I would like to congratulate you on the amazing paper. I have a question regarding the IK architecture: is there any comparison between the IK architecture you propose here and the algorithm you previously proposed based on Levenberg-Marquardt on the MANO hand? Additionally, could you guide me on applying the IK architecture without running the entire pipeline? I have some ground-truth 3D coordinates and want to obtain the IK parameters. Thanks a lot.

    opened by Amebradi 5
  • Obtaining MoCap from a two-hand video dataset

    Greetings and many thanks for the great work.

    I want to use your code to extract MoCap data from a first-person RGB video dataset that has a clear view of both hands during a task. Given that your model is restricted to predicting a single hand, will it consistently prefer the left hand when presented with videos that show both? If so, I suppose I could parse the dataset twice, flipping it the second time, to obtain both hands' coordinates, right?

    opened by Linardos 5
  • Any plans on evaluating on the FreiHAND dataset?

    I am curious, as it seems to be one of the better publicly available datasets: it includes very accurate 3D poses, all on real images, with challenging poses and object interactions, along with MANO hand shape ground truths. I would love to see how this model performs on it.

    It also allows evaluating without alignment, since both camera intrinsics and scale are included for each image.

    I am also curious whether it would be a good alternative for training IKNet instead of the MoCap data, since it includes hand shape ground truths. I am not sure if I should open a separate issue for that to make it easier for others to find.

    opened by pablovela5620 5
  • Keypoint representation as input to IKNet

    I am trying to use IKNet separately, starting from hand keypoints extracted with MediaPipe. For this to work, I need to make sure the MediaPipe hand coordinates are preprocessed to match the expected input format of IKNet (origin, scale, possibly rotation as well?).

    I ran into two questions here:

    1. I can see from your code that the keypoints have to be shifted to make 'M1' the origin. But what is the assumed scale? In the code you use IK_UNIT_LENGTH when rescaling from the MANO reference keypoints, but it is not clear what this relates to or where it comes from. Also, is there an assumption on the rotation of the hand (e.g. palm orientation)?

    2. I was assuming that the 'mpii_ref' keypoint set you pass as input to IKNet is some kind of "relaxed" reference hand (converted from the MANO code base). When I plot it, however, only the projection onto the xz plane matches this assumption; the y coordinates look very strange, so I assume I am misinterpreting something. Or maybe this incorporates some assumptions about the IKNet input that I also need to apply to my xyz keypoints, since it seems to be passed as a reference hand? Could you clarify?

    [Figures referenced: (1) mpii_ref hand in front view (looking fine); (2) mpii_ref hand in rotated xyz view, showing unnaturally curved fingers and a very long wrist-to-thumb connection; (3) for comparison, the MediaPipe hand in front view; (4) the MediaPipe hand in the same xyz view.]

    opened by jdambre 1
  • Project dependencies may have API risk issues

    Hi. In minimal-hand, inappropriate dependency versioning constraints can cause risks.

    Below are the dependencies and version constraints that the project is using:

    pygame==1.9.4
    open3d==0.9
    tensorflow_gpu==1.14.0
    transforms3d==0.3.1
    keyboard==0.13.4
    opencv_python==3.4.3.18
    numpy==1.18.1
    

    The version constraint == introduces a risk of dependency conflicts because the dependency scope is too strict, while no upper bound or * introduces a risk of missing-API errors, because the latest version of a dependency may remove some APIs.

    After further analysis, in this project the version constraint of the keyboard dependency can be changed to >=0.9.3,<=0.13.5, and the version constraint of numpy can be changed to >=1.8.0,<=1.23.0rc3.

    These suggested modifications reduce dependency conflicts as much as possible while allowing the latest versions that do not introduce calling errors in the project.

    The project currently invokes all of the following methods.

    The calling methods from the keyboard
    keyboard.is_pressed
    
    The calling methods from the numpy
    numpy.linalg.norm
    
    The calling methods from the all methods
    pygame.init
    open3d.visualization.Visualizer.update_renderer
    tensorflow.pad
    pickle.load
    open3d.visualization.Visualizer.update_geometry
    tensorflow.layers.dense
    zero_padding
    detnet
    open3d.geometry.TriangleMesh
    wrappers.ModelPipeline
    self.ik_model.process
    tf_hmap_to_uv
    pygame.display.set_mode.blit
    open3d.visualization.Visualizer.create_window
    tensorflow.nn.relu
    cv2.VideoCapture
    pickle.load.toarray
    load_pkl
    lmaps.append
    numpy.maximum
    tensorflow.ConfigProto
    dmaps.append
    tensorflow.norm
    tensorflow.reshape
    tensorflow.contrib.layers.xavier_initializer
    self.cap.read
    matplotlib.pyplot.show
    tensorflow.cast
    numpy.matmul
    dense
    open3d.geometry.TriangleMesh.compute_triangle_normals
    tensorflow.layers.batch_normalization
    numpy.sum
    viewer.get_view_control.set_constant_z_far
    capture.read
    transforms3d.quaternions.quat2mat
    pygame.time.Clock
    viewer.get_view_control.convert_to_pinhole_camera_parameters
    numpy.expand_dims
    tensorflow.contrib.layers.l2_regularizer
    tensorflow.concat.get_shape
    str
    utils.OneEuroFilter
    xyz_to_delta
    self.compute_alpha
    self.dx_filter.process
    open3d.geometry.TriangleMesh.compute_vertex_normals
    matplotlib.pyplot.plot
    pickle.dump
    conv_bn
    pygame.time.Clock.tick
    tensorflow.expand_dims
    features.get_shape.as_list
    tensorflow.nn.max_pool2d
    keyboard.is_pressed
    tensorflow.name_scope
    frame_large.np.flip.copy
    data.items
    numpy.abs
    net_2d
    open3d.visualization.Visualizer
    len
    utils.OneEuroFilter.process
    dense_bn
    matplotlib.pyplot.legend
    tensorflow.gather_nd
    tensorflow.argmax
    LowPassFilter
    tensorflow.train.Saver
    pygame.surfarray.make_surface
    tensorflow.stack
    numpy.linalg.norm
    MANOHandJoints.labels.index
    viewer.get_view_control.convert_from_pinhole_camera_parameters
    tensorflow.nn.sigmoid
    matplotlib.pyplot.xlabel
    plot_pck
    inputs.get_shape
    calculate_auc
    numpy.linspace.reshape
    pygame.display.update
    tensorflow.train.Saver.restore
    tensorflow.concat
    open3d.utility.Vector3dVector
    bottleneck
    open3d.visualization.Visualizer.poll_events
    self.det_model.process
    tensorflow.initializers.truncated_normal
    open3d.visualization.Visualizer.get_view_control
    numpy.transpose
    int
    xyz.get_shape.as_list
    hand_mesh.HandMesh.set_abs_quat
    numpy.tile
    cam_params.intrinsic.set_intrinsics
    self.ref_T.append
    open3d.utility.Vector3iVector
    numpy.array
    viewer.get_render_option.load_from_json
    self.graph.as_default
    open3d.visualization.Visualizer.get_render_option
    tensorflow.shape
    get_pose_tile
    mano_to_mpii
    xyz.get_shape
    tensorflow.tile
    ModelIK
    tensorflow.layers.conv2d
    numpy.stack
    tensorflow.transpose
    tensorflow.Session
    frame.np.flip.copy
    pygame.display.set_mode
    MANOHandJoints.mesh_mapping.items
    cv2.resize
    open3d.geometry.TriangleMesh.paint_uniform_color
    transforms3d.axangles.axangle2mat
    hmaps.append
    net_3d
    range
    pygame.display.set_caption
    hand_mesh.HandMesh
    MPIIHandJoints.labels.index
    self.verts.copy
    self.x_filter.process
    tensorflow.Graph
    ModelDet
    numpy.linspace
    wrappers.ModelPipeline.process
    matplotlib.pyplot.grid
    tensorflow.variable_scope
    numpy.concatenate
    tensorflow.constant
    tensorflow.maximum
    self.ref_pose.append
    conv_bn_relu
    capture.OpenCVCapture
    live_application
    matplotlib.pyplot.ylabel
    matplotlib.pyplot.tight_layout
    open3d.visualization.Visualizer.add_geometry
    kinematics.mpii_to_mano
    utils.imresize
    tensorflow.placeholder
    cam_params.extrinsic.copy
    numpy.stack.append
    self.sess.run
    resnet50
    open
    numpy.flip
    tensorflow.where
    prepare_mano
    numpy.finfo
    network_fn
    numpy.zeros
    inputs.get_shape.as_list
    

    @developer Could you please help me check this issue? May I open a pull request to fix it? Thank you very much.

    opened by PyDeps 0
  • About the shape of the generated hand

    Hello, thank you very much for sharing your work. While reading your code and experimenting with it, I have a few questions: 1. The code does not estimate the beta parameters, so the generated hand cannot preserve the original hand's shape (e.g. finger length, proportions, thickness), right? 2. The size of the generated hand model is fixed and does not change with the size of the hand in the image, correct? 3. The code takes video as input; if only a single image is given, will the results get worse?

    opened by ChaoYingYu 0