KAPAO is an efficient multi-person human pose estimation model that detects keypoints and poses as objects and fuses the detections to predict human poses.

Overview

KAPAO (Keypoints and Poses as Objects)

KAPAO is an efficient single-stage multi-person human pose estimation model that models keypoints and poses as objects within a dense anchor-based detection framework. When not using test-time augmentation (TTA), KAPAO is much faster and more accurate than previous single-stage methods like DEKR and HigherHRNet:

[Figure: speed and accuracy comparison of KAPAO with DEKR and HigherHRNet]

This repository contains the official PyTorch implementation for the paper:
Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation.

Our code was forked from ultralytics/yolov5 at commit 5487451.
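
As a rough illustration of the fusion idea described above, the sketch below merges keypoint-object detections into pose-object detections by nearest-neighbour matching. It is not the repository's implementation: the function name `fuse_keypoints`, the data layout, and the `max_dist` threshold are all assumptions made for this example.

```python
import math


def fuse_keypoints(poses, keypoint_objects, max_dist=10.0):
    """Refine pose keypoints with independently detected keypoint objects.

    poses: list of poses; each pose is a list of (x, y, conf) tuples,
        one per keypoint class (hypothetical layout).
    keypoint_objects: list of (class_idx, x, y, conf) detections.
    max_dist: hypothetical matching radius in pixels.
    """
    fused = [list(p) for p in poses]  # copy so the inputs are not modified
    for cls, kx, ky, kconf in keypoint_objects:
        best = None  # (distance, pose index) of the closest pose keypoint
        for i, pose in enumerate(poses):
            px, py, _ = pose[cls]
            d = math.hypot(kx - px, ky - py)
            if d <= max_dist and (best is None or d < best[0]):
                best = (d, i)
        if best is not None:
            i = best[1]
            # Keep whichever estimate is more confident.
            if kconf > fused[i][cls][2]:
                fused[i][cls] = (kx, ky, kconf)
    return fused


poses = [
    [(100.0, 50.0, 0.4), (120.0, 50.0, 0.9)],
    [(300.0, 60.0, 0.5), (320.0, 60.0, 0.6)],
]
kp_objs = [(0, 102.0, 51.0, 0.8), (1, 500.0, 500.0, 0.99)]
fused = fuse_keypoints(poses, kp_objs)
print(fused)
```

In this toy input, the first keypoint object lies close to the first pose and is more confident, so it replaces that pose's keypoint; the second lies far from every pose and is ignored.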

Setup

  1. If you haven't already, install Anaconda or Miniconda.
  2. Create a new conda environment with Python 3.6: $ conda create -n kapao python=3.6
  3. Activate the environment: $ conda activate kapao
  4. Clone this repo: $ git clone https://github.com/wmcnally/kapao.git
  5. Install the dependencies: $ cd kapao && pip install -r requirements.txt
  6. Download the trained models: $ sh data/scripts/download_models.sh
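
After completing the steps above, a quick sanity check like the following can confirm the environment is usable. This is a hedged sketch: it only reports the interpreter version and whether `torch` is importable, and the function name `check_environment` is made up for this example.

```python
import importlib.util
import sys


def check_environment(required=("torch",)):
    """Report the Python version and which required packages are importable."""
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in required:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


report = check_environment()
print(report)
```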

Inference Demos

Note: FPS calculations include all processing (inference, plotting / tracking, image resizing, etc.). See the demo script arguments for inference options.
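
To make the note concrete, the sketch below times a whole per-frame loop rather than the model call alone. `process_frame` is a hypothetical stand-in for everything a demo does per frame (resizing, inference, plotting); it is not a function from this repository.

```python
import time


def process_frame(frame):
    # Hypothetical placeholder for resize + inference + plotting.
    return [v * 2 for v in frame]


def measure_fps(frames):
    """Return end-to-end frames per second over all processing steps."""
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


frames = [[0] * 1000 for _ in range(50)]
fps = measure_fps(frames)
print("End-to-end FPS: {:.1f}".format(fps))
```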

Flash Mob Demo

This demo runs inference on a 720p dance video (native frame-rate of 25 FPS).

[GIF: flash mob demo inference results]

To display the inference results in real-time:
$ python demos/flash_mob.py --weights kapao_s_coco.pt --display --fps

To create the GIF above:
$ python demos/flash_mob.py --weights kapao_s_coco.pt --start 188 --end 196 --gif --fps

Squash Demo

This demo runs inference on a 1080p slow motion squash video (native frame-rate of 25 FPS). It uses a simple player tracking algorithm based on the frame-to-frame pose differences.

[GIF: squash demo inference results]

To display the inference results in real-time:
$ python demos/squash.py --weights kapao_s_coco.pt --display --fps

To create the GIF above:
$ python demos/squash.py --weights kapao_s_coco.pt --start 42 --end 50 --gif --fps
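
A minimal sketch of tracking by frame-to-frame pose differences follows. It greedily assigns each pose in the current frame to the previous-frame player whose pose is closest in mean keypoint distance. This is an illustration only: the function names and the greedy assignment are assumptions, and the actual logic in demos/squash.py may differ.

```python
import math


def pose_distance(pose_a, pose_b):
    """Mean Euclidean distance between corresponding (x, y) keypoints."""
    dists = [
        math.hypot(ax - bx, ay - by)
        for (ax, ay), (bx, by) in zip(pose_a, pose_b)
    ]
    return sum(dists) / len(dists)


def assign_players(prev_poses, curr_poses):
    """Map each previous player index to a current pose index (closest first)."""
    pairs = sorted(
        (pose_distance(p, c), i, j)
        for i, p in enumerate(prev_poses)
        for j, c in enumerate(curr_poses)
    )
    assignment, used = {}, set()
    for _, i, j in pairs:
        if i not in assignment and j not in used:
            assignment[i] = j
            used.add(j)
    return assignment


prev = [[(10, 10), (12, 20)], [(200, 10), (202, 20)]]
# Detections in the current frame arrive in swapped order.
curr = [[(198, 11), (201, 21)], [(11, 10), (13, 21)]]
assignment = assign_players(prev, curr)
print(assignment)
```

Here player 0 from the previous frame is matched to the second detection and player 1 to the first, so identities survive the change in detection order.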

COCO Experiments

Download the COCO dataset: $ sh data/scripts/get_coco_kp.sh

Validation (without TTA)

  • KAPAO-S (63.0 AP): $ python val.py --rect
  • KAPAO-M (68.5 AP): $ python val.py --rect --weights kapao_m_coco.pt
  • KAPAO-L (70.6 AP): $ python val.py --rect --weights kapao_l_coco.pt

Validation (with TTA)

  • KAPAO-S (64.3 AP): $ python val.py --scales 0.8 1 1.2 --flips -1 3 -1
  • KAPAO-M (69.6 AP): $ python val.py --weights kapao_m_coco.pt \
    --scales 0.8 1 1.2 --flips -1 3 -1
  • KAPAO-L (71.6 AP): $ python val.py --weights kapao_l_coco.pt \
    --scales 0.8 1 1.2 --flips -1 3 -1
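
The `--scales` and `--flips` arguments can be read as three augmented passes, one per scale/flip pair. The sketch below shows how keypoints predicted on a flipped and resized image might be mapped back to the original frame before merging. It assumes `-1` means no flip and `3` means a flip along the width axis (the `3` follows the convention in YOLOv5's augmented inference, from which this code was forked); the helper name `restore_keypoints` is hypothetical.

```python
def restore_keypoints(keypoints, scale, flip, img_width):
    """Map (x, y) keypoints from an augmented image back to the original frame.

    scale: resize factor applied to the image (e.g. 0.8, 1, 1.2).
    flip: -1 for no flip, 3 for a horizontal flip (assumed convention).
    img_width: width of the original image in pixels.
    """
    restored = []
    for x, y in keypoints:
        x, y = x / scale, y / scale  # undo the resize
        if flip == 3:
            x = img_width - x  # undo the horizontal flip
        restored.append((x, y))
    # Note: after a horizontal flip, left/right keypoint labels would also
    # need to be swapped; that step is omitted from this sketch.
    return restored


scales, flips = [0.8, 1, 1.2], [-1, 3, -1]
for s, f in zip(scales, flips):
    print(s, f, restore_keypoints([(64.0, 32.0)], s, f, img_width=640))
```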

Testing

  • KAPAO-S (63.8 AP): $ python val.py --scales 0.8 1 1.2 --flips -1 3 -1 --task test
  • KAPAO-M (68.8 AP): $ python val.py --weights kapao_m_coco.pt \
    --scales 0.8 1 1.2 --flips -1 3 -1 --task test
  • KAPAO-L (70.3 AP): $ python val.py --weights kapao_l_coco.pt \
    --scales 0.8 1 1.2 --flips -1 3 -1 --task test

Training

The following commands were used to train the KAPAO models on 4 V100s with 32GB memory each.

KAPAO-S:

python -m torch.distributed.launch --nproc_per_node 4 train.py \
--img 1280 \
--batch 128 \
--epochs 500 \
--data data/coco-kp.yaml \
--hyp data/hyps/hyp.kp-p6.yaml \
--val-scales 1 \
--val-flips -1 \
--weights yolov5s6.pt \
--project runs/s_e500 \
--name train \
--workers 128

KAPAO-M:

python train.py \
--img 1280 \
--batch 72 \
--epochs 500 \
--data data/coco-kp.yaml \
--hyp data/hyps/hyp.kp-p6.yaml \
--val-scales 1 \
--val-flips -1 \
--weights yolov5m6.pt \
--project runs/m_e500 \
--name train \
--workers 128

KAPAO-L:

python train.py \
--img 1280 \
--batch 48 \
--epochs 500 \
--data data/coco-kp.yaml \
--hyp data/hyps/hyp.kp-p6.yaml \
--val-scales 1 \
--val-flips -1 \
--weights yolov5l6.pt \
--project runs/l_e500 \
--name train \
--workers 128

Note: DDP is usually recommended but we found training was less stable for KAPAO-M/L using DDP. We are investigating this issue.

CrowdPose Experiments

  • Install the CrowdPose API to your conda environment:
    $ cd .. && git clone https://github.com/Jeff-sjtu/CrowdPose.git
    $ cd CrowdPose/crowdpose-api/PythonAPI && sh install.sh && cd ../../../kapao
  • Download the CrowdPose dataset: $ sh data/scripts/get_crowdpose.sh

Testing

  • KAPAO-S (63.8 AP): $ python val.py --data crowdpose.yaml \
    --weights kapao_s_crowdpose.pt --scales 0.8 1 1.2 --flips -1 3 -1
  • KAPAO-M (67.1 AP): $ python val.py --data crowdpose.yaml \
    --weights kapao_m_crowdpose.pt --scales 0.8 1 1.2 --flips -1 3 -1
  • KAPAO-L (68.9 AP): $ python val.py --data crowdpose.yaml \
    --weights kapao_l_crowdpose.pt --scales 0.8 1 1.2 --flips -1 3 -1

Training

The following commands were used to train the KAPAO models on 4 V100s with 32GB memory each. Training was performed on the trainval split with no validation. The test results above were generated using the last model checkpoint.

KAPAO-S:

python -m torch.distributed.launch --nproc_per_node 4 train.py \
--img 1280 \
--batch 128 \
--epochs 300 \
--data data/crowdpose.yaml \
--hyp data/hyps/hyp.kp-p6.yaml \
--val-scales 1 \
--val-flips -1 \
--weights yolov5s6.pt \
--project runs/cp_s_e300 \
--name train \
--workers 128 \
--noval

KAPAO-M:

python train.py \
--img 1280 \
--batch 72 \
--epochs 300 \
--data data/crowdpose.yaml \
--hyp data/hyps/hyp.kp-p6.yaml \
--val-scales 1 \
--val-flips -1 \
--weights yolov5m6.pt \
--project runs/cp_m_e300 \
--name train \
--workers 128 \
--noval

KAPAO-L:

python train.py \
--img 1280 \
--batch 48 \
--epochs 300 \
--data data/crowdpose.yaml \
--hyp data/hyps/hyp.kp-p6.yaml \
--val-scales 1 \
--val-flips -1 \
--weights yolov5l6.pt \
--project runs/cp_l_e300 \
--name train \
--workers 128 \
--noval

Acknowledgements

This work was supported in part by Compute Canada, the Canada Research Chairs Program, the Natural Sciences and Engineering Research Council of Canada, a Microsoft Azure Grant, and an NVIDIA Hardware Grant.

If you find this repo helpful in your research, please cite our paper:

@article{mcnally2021kapao,
  title={Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation},
  author={McNally, William and Vats, Kanav and Wong, Alexander and McPhee, John},
  journal={arXiv preprint arXiv:2111.08557},
  year={2021}
}

Please also consider citing our previous works:

@inproceedings{mcnally2021deepdarts,
  title={DeepDarts: Modeling Keypoints as Objects for Automatic Scorekeeping in Darts using a Single Camera},
  author={McNally, William and Walters, Pascale and Vats, Kanav and Wong, Alexander and McPhee, John},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4547--4556},
  year={2021}
}

@article{mcnally2021evopose2d,
  title={EvoPose2D: Pushing the Boundaries of 2D Human Pose Estimation Using Accelerated Neuroevolution With Weight Transfer},
  author={McNally, William and Vats, Kanav and Wong, Alexander and McPhee, John},
  journal={IEEE Access},
  volume={9},
  pages={139403--139414},
  year={2021},
  publisher={IEEE}
}