The official TensorFlow implementation of the paper Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition

Last update: Jan 03, 2023

Related tags

Deep Learning AcT

Overview

Action Transformer
A Self-Attention Model for Short-Time Human Action Recognition

This repository contains the official TensorFlow implementation of the paper "Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition".

Action Transformer (AcT) is a simple, fully self-attentional architecture that consistently outperforms more elaborate networks that mix convolutional, recurrent, and attentive layers. To limit computational and energy requirements, and building on previous human action recognition research, the proposed approach exploits 2D pose representations over small temporal windows, providing a low-latency solution for accurate and effective real-time performance.
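As a rough illustration of what "self-attention over a short window of 2D poses" means, the sketch below applies scaled dot-product self-attention to a toy sequence of pose frames. It is not the code in this repository: a real Transformer layer uses learned query, key, and value projections and several attention heads, whereas this sketch assumes identity projections and a single head to keep the idea visible. The helper names (`softmax`, `self_attention`) and the toy input shape (4 frames, 3 keypoints) are made up for the example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Single-head scaled dot-product self-attention with identity projections.

    frames: list of T feature vectors, one flattened 2D pose per frame.
    Returns (outputs, weights), where weights[i][j] is how strongly
    frame i attends to frame j.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in frames:
        # Similarity of this frame to every frame in the window.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all frames in the window.
        outputs.append(
            [sum(w * value[d] for w, value in zip(row, frames)) for d in range(dim)]
        )
    return outputs, weights


# Hypothetical toy window: 4 frames, each with 3 keypoints stored as (x, y).
rng = random.Random(0)
window = [[rng.uniform(-1.0, 1.0) for _ in range(3 * 2)] for _ in range(4)]
outputs, weights = self_attention(window)
print(len(outputs), len(outputs[0]))  # 4 6
```

Every output frame mixes information from the whole window, which is why no convolutional or recurrent layer is needed to relate distant frames.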

To this end, we open-source MPOSE2021, a new large-scale dataset, as an attempt to build a formal training and evaluation benchmark for real-time, short-time human action recognition (HAR). MPOSE2021 is developed as an evolution of the MPOSE dataset [1-3]. It consists of human pose data detected by OpenPose [4] and PoseNet [5] on popular HAR datasets.

This repository makes it easy to run a benchmark of AcT models on MPOSE2021 and to execute a random hyperparameter search.
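For readers unfamiliar with random hyperparameter search, the general idea can be sketched as follows: sample configurations at random from a search space, score each one, and keep the best. This is a generic illustration and not the search implemented in this repository. The search space, the parameter names, and the `toy_objective` function are all hypothetical stand-ins for training and validating a real model.

```python
import random


def sample_config(space, rng):
    """Draw one configuration by picking a random value for each hyperparameter."""
    return {name: rng.choice(values) for name, values in space.items()}


def random_search(space, evaluate, n_trials, seed=0):
    """Evaluate n_trials random configurations and return the best (config, score)."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(space, rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical search space: names and values are illustrative only.
SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "num_layers": [2, 4, 6],
    "dropout": [0.1, 0.3, 0.5],
}


def toy_objective(config):
    """Stand-in for a validation score after training; higher is better."""
    return -abs(config["num_layers"] - 4) - config["dropout"]


best_config, best_score = random_search(SPACE, toy_objective, n_trials=20)
print(best_config, best_score)
```

In practice the objective would train a model with the sampled configuration and return its validation accuracy, so each trial is far more expensive than in this toy version.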

Usage

First, clone the repository and install the required pip packages (virtual environment recommended!).

pip install -r requirements.txt

To run a random search:

python main.py -s

To run a benchmark:

python main.py -b

That's it!

This code uses the mpose pip package, a friendly tool to download and process MPOSE2021 pose data.

Citations

AcT is intended for scientific research purposes. If you want to use this repository for your research, please cite our work (Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition) as well as [1-5].

@article{mazzia2021action,
  title={Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition},
  author={Mazzia, Vittorio and Angarano, Simone and Salvetti, Francesco and Angelini, Federico and Chiaberge, Marcello},
  journal={Pattern Recognition},
  pages={108487},
  year={2021},
  publisher={Elsevier}
}

References

[1] Angelini, F., Fu, Z., Long, Y., Shao, L., & Naqvi, S. M. (2019). 2D Pose-Based Real-Time Human Action Recognition With Occlusion-Handling. IEEE Transactions on Multimedia, 22(6), 1433-1446.

[2] Angelini, F., Yan, J., & Naqvi, S. M. (2019, May). Privacy-preserving Online Human Behaviour Anomaly Detection Based on Body Movements and Objects Positions. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8444-8448). IEEE.

[3] Angelini, F., & Naqvi, S. M. (2019, July). Joint RGB-Pose Based Human Action Recognition for Anomaly Detection Applications. In 2019 22th International Conference on Information Fusion (FUSION) (pp. 1-7). IEEE.

[4] Cao, Z., Hidalgo, G., Simon, T., Wei, S. E., & Sheikh, Y. (2019). OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(1), 172-186.

[5] Papandreou, G., Zhu, T., Chen, L. C., Gidaris, S., Tompson, J., & Murphy, K. (2018). PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 269-286).

[6] Mazzia, V., Angarano, S., Salvetti, F., Angelini, F., & Chiaberge, M. (2021). Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition. Pattern Recognition, 108487.

Owner

PIC4SeRCentre
