A PyTorch implementation of MoveNet from Google. Includes training code and a pre-trained model.

Overview

Movenet.Pytorch

Intro

MoveNet is an ultra-fast and accurate model that detects 17 keypoints of a body. This repo is a PyTorch implementation of MoveNet from Google, including training code and a pre-trained model.

Google has only released pre-trained models (tfjs or tflite). These cannot be converted to some CPU inference frameworks such as NCNN, Tengine, MNN, and TNN, and they cannot be fine-tuned on custom data. That is why this repo exists.

How To Run

1. Download the COCO 2017 dataset from https://cocodataset.org/. (You need train2017.zip, val2017.zip and the annotations.) Unzip everything into movenet.pytorch/data/ like this:

├── data
    ├── annotations (person_keypoints_train2017.json, person_keypoints_val2017.json, ...)
    ├── train2017   (xx.jpg, xx.jpg,...)
    └── val2017     (xx.jpg, xx.jpg,...)
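Before moving on, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of this repo: the helper name `missing_coco_paths` is hypothetical, and it only checks the directory and annotation file names listed above.

```python
from pathlib import Path

# Directory and file names taken from the layout shown above.
REQUIRED_DIRS = ("annotations", "train2017", "val2017")
REQUIRED_ANNOTATIONS = (
    "person_keypoints_train2017.json",
    "person_keypoints_val2017.json",
)


def missing_coco_paths(data_root):
    """Return the expected paths that do not exist under data_root."""
    root = Path(data_root)
    missing = [root / d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [
        root / "annotations" / name
        for name in REQUIRED_ANNOTATIONS
        if not (root / "annotations" / name).is_file()
    ]
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the repo root, next to data/.
    for path in missing_coco_paths("data"):
        print(f"missing: {path}")
```

An empty result means the three folders and the two keypoint annotation files are where the later steps expect them.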

2. Convert the data to our data format.

python scripts/make_coco_data_17keypooints.py
Our data format: JSON file
Keypoints order:['nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear', 
    'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow', 'left_wrist', 
    'right_wrist', 'left_hip', 'right_hip', 'left_knee', 'right_knee', 'left_ankle', 
    'right_ankle']

One item:
[{"img_name": "0.jpg",
  "keypoints": [x0,y0,z0,x1,y1,z1,...],
  #z: 0 for no label, 1 for labeled but invisible, 2 for labeled and visible
  "center": [x,y],
  "bbox":[x0,y0,x1,y1],
  "other_centers": [[x0,y0],[x1,y1],...],
  "other_keypoints": [[[x0,y0],[x1,y1],...],[[x0,y0],[x1,y1],...],...], #length = num_keypoints
 },
 ...
]
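To make the flat "keypoints" layout concrete, here is a small sketch that regroups one item into named triples. It is an illustration only: the helper names `split_keypoints` and `visible_names` are hypothetical, and the coordinate values in the example item are placeholders.

```python
# Keypoint order as listed above.
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def split_keypoints(item):
    """Group the flat [x0, y0, z0, x1, ...] list into (name, x, y, z) tuples."""
    flat = item["keypoints"]
    if len(flat) != 3 * len(KEYPOINT_NAMES):
        raise ValueError(f"expected {3 * len(KEYPOINT_NAMES)} values, got {len(flat)}")
    return [
        (name, flat[3 * i], flat[3 * i + 1], flat[3 * i + 2])
        for i, name in enumerate(KEYPOINT_NAMES)
    ]


def visible_names(item):
    """Names of keypoints flagged z == 2 (labeled and visible)."""
    return [name for name, _x, _y, z in split_keypoints(item) if z == 2]


if __name__ == "__main__":
    # Placeholder item: only the nose is labeled and visible.
    example = {"img_name": "0.jpg", "keypoints": [0.5, 0.5, 2] + [0, 0, 0] * 16}
    print(visible_names(example))
```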

3. You can add your own data in the same format.

4. After putting the data in the right place, you can start training:
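One way to add custom data is to append it to an existing label file. The sketch below assumes both files hold a JSON list of items in the format above; the function names and all file paths are hypothetical and not part of this repo.

```python
import json

# Keys from the "One item" example above.
REQUIRED_KEYS = (
    "img_name", "keypoints", "center", "bbox",
    "other_centers", "other_keypoints",
)


def merge_label_lists(base_items, extra_items):
    """Append extra_items to base_items, rejecting malformed or duplicate entries."""
    seen = {item["img_name"] for item in base_items}
    merged = list(base_items)
    for item in extra_items:
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            raise ValueError(f"{item.get('img_name')}: missing keys {missing}")
        if item["img_name"] in seen:
            raise ValueError(f"duplicate img_name: {item['img_name']}")
        seen.add(item["img_name"])
        merged.append(item)
    return merged


def merge_label_files(base_path, extra_path, out_path):
    """Read two label files, merge them, and write the result to out_path."""
    with open(base_path) as f:
        base_items = json.load(f)
    with open(extra_path) as f:
        extra_items = json.load(f)
    with open(out_path, "w") as f:
        json.dump(merge_label_lists(base_items, extra_items), f)
```

Rejecting duplicate image names keeps a custom image from silently shadowing a COCO crop with the same file name.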

python train.py

5. After training finishes, change the model path used for testing, for example this line in predict.py:

run_task.modelLoad("output/xxx.pth")
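Instead of typing the checkpoint name by hand, the path could be looked up. This is a sketch under two assumptions: checkpoints are saved as .pth files directly in output/, and the newest file is the one to test. The helper `latest_checkpoint` is hypothetical.

```python
from pathlib import Path


def latest_checkpoint(output_dir="output"):
    """Return the most recently modified .pth file in output_dir, or None."""
    candidates = sorted(
        Path(output_dir).glob("*.pth"),
        key=lambda path: path.stat().st_mtime,
    )
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    checkpoint = latest_checkpoint()
    # In predict.py the result could then be passed on, e.g.
    # run_task.modelLoad(str(checkpoint))
    print(checkpoint)
```

The newest file is not necessarily the best one, so picking a checkpoint by its validation score may be the better choice.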

6. Run predict.py to visualize the predictions, or run evaluate.py to compute the accuracy on the test dataset.

python predict.py

7. Convert to ONNX.

python pth2onnx.py

Training Results

Some good samples

good

Some bad cases

bad

Tips to improve

1. Focus on data

  • Add COCO2014. (As far as I know, it overlaps partly with COCO2017, and I don't know whether Google used it.)
  • Clean the cropped COCO2017 data. (Some images contain only a few keypoints, such as close-ups of a face or a large body. MoveNet is a small network, and the COCO data is a little hard for it.)
  • Add frames from yoga, fitness, and dance videos on YouTube. (Highly recommended! Google did this for their MoveNet and said 'Evaluations on the Active validation dataset show a significant performance boost relative to identical architectures trained using only COCO. ')
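The cleaning tip above can be sketched as a filter over the converted label list. This assumes the item format described earlier, where every third value in "keypoints" is the z flag. The threshold of 8 labeled keypoints is an arbitrary assumption, not a value used by this repo, and both function names are hypothetical.

```python
def count_labeled(item):
    """Count keypoints whose z flag is 1 or 2, i.e. labeled."""
    return sum(1 for z in item["keypoints"][2::3] if z > 0)


def drop_sparse_items(items, min_labeled=8):
    """Keep only items with at least min_labeled labeled keypoints."""
    return [item for item in items if count_labeled(item) >= min_labeled]


if __name__ == "__main__":
    # Placeholder items: a face-only crop and a fully labeled body.
    face_only = {"img_name": "face.jpg", "keypoints": [0.5, 0.5, 2] * 5 + [0, 0, 0] * 12}
    full_body = {"img_name": "body.jpg", "keypoints": [0.5, 0.5, 2] * 17}
    kept = drop_sparse_items([face_only, full_body])
    print([item["img_name"] for item in kept])
```

A count-based filter is only a first pass. Reviewing the dropped and kept images by eye is still worthwhile.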

2. Change backbone

Switching the backbone from MobileNetV2 (used by the original MoveNet) to MobileNetV3 or ShuffleNetV2 may give a small improvement. If you just want to reproduce the original MoveNet, you can ignore this.

3. More fancy loss

This is a multi-task learning problem, so adding extra losses to be learned jointly may improve performance (such as the BoneLoss I have added). We cannot know how Google trained the model, because that cannot be seen from the pre-trained tflite model file, so you can try any loss function you like.
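As an illustration of what an extra skeleton-aware loss can look like, here is a bone-length formulation in plain Python. This is a sketch of one common idea and not necessarily how the BoneLoss in this repo is implemented. The bone list is an assumption based on the keypoint order above, and in real training the same computation would be written with tensors so that gradients can flow.

```python
import math

# Hypothetical bone list: index pairs into the 17-keypoint order used above
# (arms, legs, shoulder line, hip line, and the two sides of the torso).
BONES = [
    (5, 7), (7, 9), (6, 8), (8, 10),
    (11, 13), (13, 15), (12, 14), (14, 16),
    (5, 6), (11, 12), (5, 11), (6, 12),
]


def bone_lengths(points):
    """Euclidean length of every bone for a list of 17 (x, y) points."""
    return [math.dist(points[a], points[b]) for a, b in BONES]


def bone_length_loss(pred_points, target_points):
    """Mean absolute difference between predicted and target bone lengths."""
    pred = bone_lengths(pred_points)
    target = bone_lengths(target_points)
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(BONES)


if __name__ == "__main__":
    # Placeholder skeletons: the prediction misplaces only the left wrist.
    target = [(float(i), 0.0) for i in range(17)]
    pred = list(target)
    pred[9] = (9.0, 3.0)
    print(bone_length_loss(pred, target))
```

A term like this penalizes implausible limb proportions even when each keypoint on its own is close to its label.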

4. Data Again

I just want you to know how important the data is. The more time you spend cleaning the data and adding new data, the better your model will perform! (Tips 2 and 3 may not help as much.)

Resource

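  1. Blog: Next-Generation Pose Detection with MoveNet and TensorFlow.js
  2. Model card
  3. TFHub: movenet/singlepose/lightning
  4. My article: 2021轻量级人体姿态估计模型修炼之路(附谷歌MoveNet复现经验) (The 2021 Road to Training a Lightweight Human Pose Estimation Model, with Notes on Reproducing Google's MoveNet)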
Owner
Mr.Fire