Simple Baselines for Human Pose Estimation and Tracking

Overview

Simple Baselines for Human Pose Estimation and Tracking

News

Introduction

This is an official pytorch implementation of Simple Baselines for Human Pose Estimation and Tracking. This work provides baseline methods that are surprisingly simple and effective, thus helpful for inspiring and evaluating new ideas for the field. State-of-the-art results are achieved on challenging benchmarks. On COCO keypoints valid dataset, our best single model achieves 74.3 of mAP. You can reproduce our results using this repo. All models are provided for research purpose.

Main Results

Results on MPII val

Arch Head Shoulder Elbow Wrist Hip Knee Ankle Mean [email protected]
256x256_pose_resnet_50_d256d256d256 96.351 95.329 88.989 83.176 88.420 83.960 79.594 88.532 33.911
384x384_pose_resnet_50_d256d256d256 96.658 95.754 89.790 84.614 88.523 84.666 79.287 89.066 38.046
256x256_pose_resnet_101_d256d256d256 96.862 95.873 89.518 84.376 88.437 84.486 80.703 89.131 34.020
384x384_pose_resnet_101_d256d256d256 96.965 95.907 90.268 85.780 89.597 85.935 82.098 90.003 38.860
256x256_pose_resnet_152_d256d256d256 97.033 95.941 90.046 84.976 89.164 85.311 81.271 89.620 35.025
384x384_pose_resnet_152_d256d256d256 96.794 95.618 90.080 86.225 89.700 86.862 82.853 90.200 39.433

Note:

  • Flip test is used.

Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset

Arch AP Ap .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
256x192_pose_resnet_50_d256d256d256 0.704 0.886 0.783 0.671 0.772 0.763 0.929 0.834 0.721 0.824
384x288_pose_resnet_50_d256d256d256 0.722 0.893 0.789 0.681 0.797 0.776 0.932 0.838 0.728 0.846
256x192_pose_resnet_101_d256d256d256 0.714 0.893 0.793 0.681 0.781 0.771 0.934 0.840 0.730 0.832
384x288_pose_resnet_101_d256d256d256 0.736 0.896 0.803 0.699 0.811 0.791 0.936 0.851 0.745 0.858
256x192_pose_resnet_152_d256d256d256 0.720 0.893 0.798 0.687 0.789 0.778 0.934 0.846 0.736 0.839
384x288_pose_resnet_152_d256d256d256 0.743 0.896 0.811 0.705 0.816 0.797 0.937 0.858 0.751 0.863

Results on Caffe-style ResNet

Arch AP Ap .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
256x192_pose_resnet_50_caffe_d256d256d256 0.704 0.914 0.782 0.677 0.744 0.735 0.921 0.805 0.704 0.783
256x192_pose_resnet_101_caffe_d256d256d256 0.720 0.915 0.803 0.693 0.764 0.753 0.928 0.821 0.720 0.802
256x192_pose_resnet_152_caffe_d256d256d256 0.728 0.925 0.804 0.702 0.766 0.760 0.931 0.828 0.729 0.806

Note:

  • Flip test is used.
  • Person detector has person AP of 56.4 on COCO val2017 dataset.
  • Difference between PyTorch-style and Caffe-style ResNet is the position of stride=2 convolution

Environment

The code is developed using python 3.6 on Ubuntu 16.04. NVIDIA GPUs are needed. The code is developed and tested using 4 NVIDIA P100 GPU cards. Other platforms or GPU cards are not fully tested.

Quick start

Installation

  1. Install pytorch >= v0.4.0 following official instruction.

  2. Disable cudnn for batch_norm:

    # PYTORCH=/path/to/pytorch
    # for pytorch v0.4.0
    sed -i "1194s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/functional.py
    # for pytorch v0.4.1
    sed -i "1254s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/functional.py
    

    Note that instructions like # PYTORCH=/path/to/pytorch indicate that you should pick a path where you'd like to have pytorch installed and then set an environment variable (PYTORCH in this case) accordingly.

  3. Clone this repo, and we'll call the directory that you cloned as ${POSE_ROOT}.

  4. Install dependencies:

    pip install -r requirements.txt
    
  5. Make libs:

    cd ${POSE_ROOT}/lib
    make
    
  6. Install COCOAPI:

    # COCOAPI=/path/to/clone/cocoapi
    git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
    cd $COCOAPI/PythonAPI
    # Install into global site-packages
    make install
    # Alternatively, if you do not have permissions or prefer
    # not to install the COCO API into global site-packages
    python3 setup.py install --user
    

    Note that instructions like # COCOAPI=/path/to/install/cocoapi indicate that you should pick a path where you'd like to have the software cloned and then set an environment variable (COCOAPI in this case) accordingly.

  7. Download pytorch imagenet pretrained models from pytorch model zoo and caffe-style pretrained models from GoogleDrive.

  8. Download mpii and coco pretrained models from OneDrive or GoogleDrive. Please download them under ${POSE_ROOT}/models/pytorch, and make them look like this:

    ${POSE_ROOT}
     `-- models
         `-- pytorch
             |-- imagenet
             |   |-- resnet50-19c8e357.pth
             |   |-- resnet50-caffe.pth.tar
             |   |-- resnet101-5d3b4d8f.pth
             |   |-- resnet101-caffe.pth.tar
             |   |-- resnet152-b121ed2d.pth
             |   `-- resnet152-caffe.pth.tar
             |-- pose_coco
             |   |-- pose_resnet_101_256x192.pth.tar
             |   |-- pose_resnet_101_384x288.pth.tar
             |   |-- pose_resnet_152_256x192.pth.tar
             |   |-- pose_resnet_152_384x288.pth.tar
             |   |-- pose_resnet_50_256x192.pth.tar
             |   `-- pose_resnet_50_384x288.pth.tar
             `-- pose_mpii
                 |-- pose_resnet_101_256x256.pth.tar
                 |-- pose_resnet_101_384x384.pth.tar
                 |-- pose_resnet_152_256x256.pth.tar
                 |-- pose_resnet_152_384x384.pth.tar
                 |-- pose_resnet_50_256x256.pth.tar
                 `-- pose_resnet_50_384x384.pth.tar
    
    
  9. Init output(training model output directory) and log(tensorboard log directory) directory:

    mkdir output 
    mkdir log
    

    Your directory tree should look like this:

    ${POSE_ROOT}
    ├── data
    ├── experiments
    ├── lib
    ├── log
    ├── models
    ├── output
    ├── pose_estimation
    ├── README.md
    └── requirements.txt
    

Data preparation

For MPII data, please download from MPII Human Pose Dataset. The original annotation files are in matlab format. We have converted them into json format, you also need to download them from OneDrive or GoogleDrive. Extract them under {POSE_ROOT}/data, and make them look like this:

${POSE_ROOT}
|-- data
`-- |-- mpii
    `-- |-- annot
        |   |-- gt_valid.mat
        |   |-- test.json
        |   |-- train.json
        |   |-- trainval.json
        |   `-- valid.json
        `-- images
            |-- 000001163.jpg
            |-- 000003072.jpg

For COCO data, please download from COCO download, 2017 Train/Val is needed for COCO keypoints training and validation. We also provide person detection result of COCO val2017 to reproduce our multi-person pose estimation results. Please download from OneDrive or GoogleDrive. Download and extract them under {POSE_ROOT}/data, and make them look like this:

${POSE_ROOT}
|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        |-- person_detection_results
        |   |-- COCO_val2017_detections_AP_H_56_person.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ... 
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ... 

Valid on MPII using pretrained models

python pose_estimation/valid.py \
    --cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml \
    --flip-test \
    --model-file models/pytorch/pose_mpii/pose_resnet_50_256x256.pth.tar

Training on MPII

python pose_estimation/train.py \
    --cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml

Valid on COCO val2017 using pretrained models

python pose_estimation/valid.py \
    --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml \
    --flip-test \
    --model-file models/pytorch/pose_coco/pose_resnet_50_256x192.pth.tar

Training on COCO train2017

python pose_estimation/train.py \
    --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml

Other Implementations

Citation

If you use our code or models in your research, please cite with:

@inproceedings{xiao2018simple,
    author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
    title={Simple Baselines for Human Pose Estimation and Tracking},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year = {2018}
}
Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
Software for Multimodalty 2D+3D Facial Expression Recognition (FER) UI

EmotionUI Software for Multimodalty 2D+3D Facial Expression Recognition (FER) UI. demo screenshot (with RealSense) required packages Python = 3.6 num

Yang Jiao 2 Dec 23, 2021
T-LOAM: Truncated Least Squares Lidar-only Odometry and Mapping in Real-Time

T-LOAM: Truncated Least Squares Lidar-only Odometry and Mapping in Real-Time The first Lidar-only odometry framework with high performance based on tr

Pengwei Zhou 183 Dec 01, 2022
Code, pre-trained models and saliency results for the paper "Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images".

Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB This repository is the official implementation of the paper. Our results comming soon in

Xiaoqiang Wang 8 May 22, 2022
The code repository for "RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection" (ACM MM'21)

RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection (ACM MM'21) By Zhuofan Zong, Qianggang Cao, Biao Leng Introduction F

TempleX 9 Jul 30, 2022
Code basis for the paper "Camera Condition Monitoring and Readjustment by means of Noise and Blur" (2021)

Camera Condition Monitoring and Readjustment by means of Noise and Blur This repository contains the source code of the paper: Wischow, M., Gallego, G

7 Dec 22, 2022
Code for KDD'20 "Generative Pre-Training of Graph Neural Networks"

GPT-GNN: Generative Pre-Training of Graph Neural Networks GPT-GNN is a pre-training framework to initialize GNNs by generative pre-training. It can be

Ziniu Hu 346 Dec 19, 2022
Tensorflow Repo for "DeepGCNs: Can GCNs Go as Deep as CNNs?"

DeepGCNs: Can GCNs Go as Deep as CNNs? In this work, we present new ways to successfully train very deep GCNs. We borrow concepts from CNNs, mainly re

Guohao Li 612 Nov 15, 2022
Fastshap: A fast, approximate shap kernel

fastshap: A fast, approximate shap kernel fastshap was designed to be: Fast Calculating shap values can take an extremely long time. fastshap utilizes

Samuel Wilson 22 Sep 24, 2022
Source code and Dataset creation for the paper "Neural Symbolic Regression That Scales"

NeuralSymbolicRegressionThatScales Pytorch implementation and pretrained models for the paper "Neural Symbolic Regression That Scales", presented at I

35 Nov 25, 2022
Research code for Arxiv paper "Camera Motion Agnostic 3D Human Pose Estimation"

GMR(Camera Motion Agnostic 3D Human Pose Estimation) This repo provides the source code of our arXiv paper: Seong Hyun Kim, Sunwon Jeong, Sungbum Park

Seong Hyun Kim 1 Feb 07, 2022
A Python framework for conversational search

Chatty Goose Multi-stage Conversational Passage Retrieval: An Approach to Fusing Term Importance Estimation and Neural Query Rewriting Installation Ma

Castorini 36 Oct 23, 2022
DeLiGAN - This project is an implementation of the Generative Adversarial Network

This project is an implementation of the Generative Adversarial Network proposed in our CVPR 2017 paper - DeLiGAN : Generative Adversarial Net

Video Analytics Lab -- IISc 110 Sep 13, 2022
GUI for TOAD-GAN, a PCG-ML algorithm for Token-based Super Mario Bros. Levels.

If you are using this code in your own project, please cite our paper: @inproceedings{awiszus2020toadgan, title={TOAD-GAN: Coherent Style Level Gene

Maren A. 13 Dec 14, 2022
Session-based Recommendation, CoHHN, price preferences, interest preferences, Heterogeneous Hypergraph, Co-guided Learning, SIGIR2022

This is our implementation for the paper: Price DOES Matter! Modeling Price and Interest Preferences in Session-based Recommendation Xiaokun Zhang, Bo

Xiaokun Zhang 27 Dec 02, 2022
Reference implementation of code generation projects from Facebook AI Research. General toolkit to apply machine learning to code, from dataset creation to model training and evaluation. Comes with pretrained models.

This repository is a toolkit to do machine learning for programming languages. It implements tokenization, dataset preprocessing, model training and m

Facebook Research 408 Jan 01, 2023
Perfect implement. Model shared. x0.5 (Top1:60.646) and 1.0x (Top1:69.402).

Shufflenet-v2-Pytorch Introduction This is a Pytorch implementation of faceplusplus's ShuffleNet-v2. For details, please read the following papers:

423 Dec 07, 2022
TCTrack: Temporal Contexts for Aerial Tracking (CVPR2022)

TCTrack: Temporal Contexts for Aerial Tracking (CVPR2022) Ziang Cao and Ziyuan Huang and Liang Pan and Shiwei Zhang and Ziwei Liu and Changhong Fu In

Intelligent Vision for Robotics in Complex Environment 100 Dec 19, 2022
High performance distributed framework for training deep learning recommendation models based on PyTorch.

High performance distributed framework for training deep learning recommendation models based on PyTorch.

340 Dec 30, 2022
Prososdy Morph: A python library for manipulating pitch and duration in an algorithmic way, for resynthesizing speech.

ProMo (Prosody Morph) Questions? Comments? Feedback? Chat with us on gitter! A library for manipulating pitch and duration in an algorithmic way, for

Tim 71 Jan 02, 2023
Material related to the Principles of Cloud Computing course.

CloudComputingCourse Material related to the Principles of Cloud Computing course. This repository comprises material that I use to teach my Principle

Aniruddha Gokhale 15 Dec 02, 2022