Simple Baselines for Human Pose Estimation and Tracking

Overview

Simple Baselines for Human Pose Estimation and Tracking

News

Introduction

This is an official pytorch implementation of Simple Baselines for Human Pose Estimation and Tracking. This work provides baseline methods that are surprisingly simple and effective, thus helpful for inspiring and evaluating new ideas for the field. State-of-the-art results are achieved on challenging benchmarks. On COCO keypoints valid dataset, our best single model achieves 74.3 of mAP. You can reproduce our results using this repo. All models are provided for research purpose.

Main Results

Results on MPII val

Arch Head Shoulder Elbow Wrist Hip Knee Ankle Mean [email protected]
256x256_pose_resnet_50_d256d256d256 96.351 95.329 88.989 83.176 88.420 83.960 79.594 88.532 33.911
384x384_pose_resnet_50_d256d256d256 96.658 95.754 89.790 84.614 88.523 84.666 79.287 89.066 38.046
256x256_pose_resnet_101_d256d256d256 96.862 95.873 89.518 84.376 88.437 84.486 80.703 89.131 34.020
384x384_pose_resnet_101_d256d256d256 96.965 95.907 90.268 85.780 89.597 85.935 82.098 90.003 38.860
256x256_pose_resnet_152_d256d256d256 97.033 95.941 90.046 84.976 89.164 85.311 81.271 89.620 35.025
384x384_pose_resnet_152_d256d256d256 96.794 95.618 90.080 86.225 89.700 86.862 82.853 90.200 39.433

Note:

  • Flip test is used.

Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset

Arch AP Ap .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
256x192_pose_resnet_50_d256d256d256 0.704 0.886 0.783 0.671 0.772 0.763 0.929 0.834 0.721 0.824
384x288_pose_resnet_50_d256d256d256 0.722 0.893 0.789 0.681 0.797 0.776 0.932 0.838 0.728 0.846
256x192_pose_resnet_101_d256d256d256 0.714 0.893 0.793 0.681 0.781 0.771 0.934 0.840 0.730 0.832
384x288_pose_resnet_101_d256d256d256 0.736 0.896 0.803 0.699 0.811 0.791 0.936 0.851 0.745 0.858
256x192_pose_resnet_152_d256d256d256 0.720 0.893 0.798 0.687 0.789 0.778 0.934 0.846 0.736 0.839
384x288_pose_resnet_152_d256d256d256 0.743 0.896 0.811 0.705 0.816 0.797 0.937 0.858 0.751 0.863

Results on Caffe-style ResNet

Arch AP Ap .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
256x192_pose_resnet_50_caffe_d256d256d256 0.704 0.914 0.782 0.677 0.744 0.735 0.921 0.805 0.704 0.783
256x192_pose_resnet_101_caffe_d256d256d256 0.720 0.915 0.803 0.693 0.764 0.753 0.928 0.821 0.720 0.802
256x192_pose_resnet_152_caffe_d256d256d256 0.728 0.925 0.804 0.702 0.766 0.760 0.931 0.828 0.729 0.806

Note:

  • Flip test is used.
  • Person detector has person AP of 56.4 on COCO val2017 dataset.
  • Difference between PyTorch-style and Caffe-style ResNet is the position of stride=2 convolution

Environment

The code is developed using python 3.6 on Ubuntu 16.04. NVIDIA GPUs are needed. The code is developed and tested using 4 NVIDIA P100 GPU cards. Other platforms or GPU cards are not fully tested.

Quick start

Installation

  1. Install pytorch >= v0.4.0 following official instruction.

  2. Disable cudnn for batch_norm:

    # PYTORCH=/path/to/pytorch
    # for pytorch v0.4.0
    sed -i "1194s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/functional.py
    # for pytorch v0.4.1
    sed -i "1254s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/functional.py
    

    Note that instructions like # PYTORCH=/path/to/pytorch indicate that you should pick a path where you'd like to have pytorch installed and then set an environment variable (PYTORCH in this case) accordingly.

  3. Clone this repo, and we'll call the directory that you cloned as ${POSE_ROOT}.

  4. Install dependencies:

    pip install -r requirements.txt
    
  5. Make libs:

    cd ${POSE_ROOT}/lib
    make
    
  6. Install COCOAPI:

    # COCOAPI=/path/to/clone/cocoapi
    git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
    cd $COCOAPI/PythonAPI
    # Install into global site-packages
    make install
    # Alternatively, if you do not have permissions or prefer
    # not to install the COCO API into global site-packages
    python3 setup.py install --user
    

    Note that instructions like # COCOAPI=/path/to/install/cocoapi indicate that you should pick a path where you'd like to have the software cloned and then set an environment variable (COCOAPI in this case) accordingly.

  7. Download pytorch imagenet pretrained models from pytorch model zoo and caffe-style pretrained models from GoogleDrive.

  8. Download mpii and coco pretrained models from OneDrive or GoogleDrive. Please download them under ${POSE_ROOT}/models/pytorch, and make them look like this:

    ${POSE_ROOT}
     `-- models
         `-- pytorch
             |-- imagenet
             |   |-- resnet50-19c8e357.pth
             |   |-- resnet50-caffe.pth.tar
             |   |-- resnet101-5d3b4d8f.pth
             |   |-- resnet101-caffe.pth.tar
             |   |-- resnet152-b121ed2d.pth
             |   `-- resnet152-caffe.pth.tar
             |-- pose_coco
             |   |-- pose_resnet_101_256x192.pth.tar
             |   |-- pose_resnet_101_384x288.pth.tar
             |   |-- pose_resnet_152_256x192.pth.tar
             |   |-- pose_resnet_152_384x288.pth.tar
             |   |-- pose_resnet_50_256x192.pth.tar
             |   `-- pose_resnet_50_384x288.pth.tar
             `-- pose_mpii
                 |-- pose_resnet_101_256x256.pth.tar
                 |-- pose_resnet_101_384x384.pth.tar
                 |-- pose_resnet_152_256x256.pth.tar
                 |-- pose_resnet_152_384x384.pth.tar
                 |-- pose_resnet_50_256x256.pth.tar
                 `-- pose_resnet_50_384x384.pth.tar
    
    
  9. Init output(training model output directory) and log(tensorboard log directory) directory:

    mkdir output 
    mkdir log
    

    Your directory tree should look like this:

    ${POSE_ROOT}
    ├── data
    ├── experiments
    ├── lib
    ├── log
    ├── models
    ├── output
    ├── pose_estimation
    ├── README.md
    └── requirements.txt
    

Data preparation

For MPII data, please download from MPII Human Pose Dataset. The original annotation files are in matlab format. We have converted them into json format, you also need to download them from OneDrive or GoogleDrive. Extract them under {POSE_ROOT}/data, and make them look like this:

${POSE_ROOT}
|-- data
`-- |-- mpii
    `-- |-- annot
        |   |-- gt_valid.mat
        |   |-- test.json
        |   |-- train.json
        |   |-- trainval.json
        |   `-- valid.json
        `-- images
            |-- 000001163.jpg
            |-- 000003072.jpg

For COCO data, please download from COCO download, 2017 Train/Val is needed for COCO keypoints training and validation. We also provide person detection result of COCO val2017 to reproduce our multi-person pose estimation results. Please download from OneDrive or GoogleDrive. Download and extract them under {POSE_ROOT}/data, and make them look like this:

${POSE_ROOT}
|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        |-- person_detection_results
        |   |-- COCO_val2017_detections_AP_H_56_person.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ... 
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ... 

Valid on MPII using pretrained models

python pose_estimation/valid.py \
    --cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml \
    --flip-test \
    --model-file models/pytorch/pose_mpii/pose_resnet_50_256x256.pth.tar

Training on MPII

python pose_estimation/train.py \
    --cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml

Valid on COCO val2017 using pretrained models

python pose_estimation/valid.py \
    --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml \
    --flip-test \
    --model-file models/pytorch/pose_coco/pose_resnet_50_256x192.pth.tar

Training on COCO train2017

python pose_estimation/train.py \
    --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml

Other Implementations

Citation

If you use our code or models in your research, please cite with:

@inproceedings{xiao2018simple,
    author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
    title={Simple Baselines for Human Pose Estimation and Tracking},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year = {2018}
}
Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
Distance correlation and related E-statistics in Python

dcor dcor: distance correlation and related E-statistics in Python. E-statistics are functions of distances between statistical observations in metric

Carlos Ramos Carreño 108 Dec 27, 2022
Implementation of Uformer, Attention-based Unet, in Pytorch

Uformer - Pytorch Implementation of Uformer, Attention-based Unet, in Pytorch. It will only offer the concat-cross-skip connection. This repository wi

Phil Wang 72 Dec 19, 2022
Asymmetric Bilateral Motion Estimation for Video Frame Interpolation, ICCV2021

ABME (ICCV2021) Junheum Park, Chul Lee, and Chang-Su Kim Official PyTorch Code for "Asymmetric Bilateral Motion Estimation for Video Frame Interpolati

Junheum Park 86 Dec 28, 2022
This repository contains the official code of the paper Equivariant Subgraph Aggregation Networks (ICLR 2022)

Equivariant Subgraph Aggregation Networks (ESAN) This repository contains the official code of the paper Equivariant Subgraph Aggregation Networks (IC

Beatrice Bevilacqua 59 Dec 13, 2022
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

Super Resolution Examples We run this script under TensorFlow 2.0 and the TensorLayer2.0+. For TensorLayer 1.4 version, please check release. 🚀 🚀 🚀

TensorLayer Community 2.9k Jan 08, 2023
An excellent hash algorithm combining classical sponge structure and RNN.

SHA-RNN Recurrent Neural Network with Chaotic System for Hash Functions Anonymous Authors [摘要] 在这次作业中我们提出了一种新的 Hash Function —— SHA-RNN。其以海绵结构为基础,融合了混

Houde Qian 5 May 15, 2022
3D-Reconstruction 基于深度学习方法的单目多视图三维重建

基于深度学习方法的单目多视图三维重建 Part I 三维重建 代码:Part1 技术文档:[Markdown] [PDF] 原始图像:Original Images 点云结果:Point Cloud Results-1

HMT_Curo 19 Dec 26, 2022
The official implementation of CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing

CSGStumpNet The official implementation of CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing Paper | Project page

Daxuan 39 Dec 26, 2022
Repository for the Bias Benchmark for QA dataset.

BBQ Repository for the Bias Benchmark for QA dataset. Authors: Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Tho

ML² AT CILVR 18 Nov 18, 2022
A simple python module to generate anchor (aka default/prior) boxes for object detection tasks.

PyBx WIP A simple python module to generate anchor (aka default/prior) boxes for object detection tasks. Calculated anchor boxes are returned as ndarr

thatgeeman 4 Dec 15, 2022
MaskTrackRCNN for video instance segmentation based on mmdetection

MaskTrackRCNN for video instance segmentation Introduction This repo serves as the official code release of the MaskTrackRCNN model for video instance

411 Jan 05, 2023
Source code for paper: Knowledge Inheritance for Pre-trained Language Models

Knowledge-Inheritance Source code paper: Knowledge Inheritance for Pre-trained Language Models (preprint). The trained model parameters (in Fairseq fo

THUNLP 31 Nov 19, 2022
The toolkit to generate auto labeled datasets

Ozeu Ozeu is the toolkit to autolabal dataset for instance segmentation. You can generate datasets labaled with segmentation mask and bounding box fro

Xiong Jie 28 Mar 28, 2022
Neural Architecture Search Powered by Swarm Intelligence 🐜

Neural Architecture Search Powered by Swarm Intelligence 🐜 DeepSwarm DeepSwarm is an open-source library which uses Ant Colony Optimization to tackle

288 Oct 28, 2022
Where2Act: From Pixels to Actions for Articulated 3D Objects

Where2Act: From Pixels to Actions for Articulated 3D Objects The Proposed Where2Act Task. Given as input an articulated 3D object, we learn to propose

Kaichun Mo 69 Nov 28, 2022
Medical Insurance Cost Prediction using Machine earning

Medical-Insurance-Cost-Prediction-using-Machine-learning - Here in this project, I will use regression analysis to predict medical insurance cost for people in different regions, and based on several

1 Dec 27, 2021
Point Cloud Registration using Representative Overlapping Points.

Point Cloud Registration using Representative Overlapping Points (ROPNet) Abstract 3D point cloud registration is a fundamental task in robotics and c

ZhuLifa 36 Dec 16, 2022
Code to replicate the key results from Exploring the Limits of Out-of-Distribution Detection

Exploring the Limits of Out-of-Distribution Detection In this repository we're collecting replications for the key experiments in the Exploring the Li

Stanislav Fort 35 Jan 03, 2023
[ICCV'2021] "SSH: A Self-Supervised Framework for Image Harmonization", Yifan Jiang, He Zhang, Jianming Zhang, Yilin Wang, Zhe Lin, Kalyan Sunkavalli, Simon Chen, Sohrab Amirghodsi, Sarah Kong, Zhangyang Wang

SSH: A Self-Supervised Framework for Image Harmonization (ICCV 2021) code for SSH Representative Examples Main Pipeline RealHM DataSet Google Drive Pr

VITA 86 Dec 02, 2022
This is the code repository for the paper "Identification of the Generalized Condorcet Winner in Multi-dueling Bandits" (NeurIPS 2021).

Code Repository for the Paper "Identification of the Generalized Condorcet Winner in Multi-dueling Bandits" (To appear in: Proceedings of NeurIPS20

1 Oct 03, 2022