A new codebase for Group Activity Recognition. It contains codes for ICCV 2021 paper: Spatio-Temporal Dynamic Inference Network for Group Activity Recognition and some other methods.

Last update: Dec 12, 2022

Overview

Spatio-Temporal Dynamic Inference Network for Group Activity Recognition

The source codes for ICCV2021 Paper: Spatio-Temporal Dynamic Inference Network for Group Activity Recognition.
[paper] [supplemental material] [arXiv]

If you find our work or the codebase inspiring and useful to your research, please cite

@inproceedings{yuan2021DIN,
  title={Spatio-Temporal Dynamic Inference Network for Group Activity Recognition},
  author={Yuan, Hangjie and Ni, Dong and Wang, Mang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={7476--7485},
  year={2021}
}

Dependencies

Software Environment: Linux (CentOS 7)
Hardware Environment: NVIDIA TITAN RTX
Python 3.6
PyTorch 1.2.0, Torchvision 0.4.0
RoIAlign for Pytorch

Prepare Datasets

Download publicly available datasets from following links: Volleyball dataset and Collective Activity dataset.
Unzip the dataset file into data/volleyball or data/collective.
Download the file tracks_normalized.pkl from cvlab-epfl/social-scene-understanding and put it into data/volleyball/videos

Using Docker

Checkout repository and cd PROJECT_PATH
Build the Docker container

docker build -t din_gar https://github.com/JacobYuan7/DIN_GAR.git#main

Run the Docker container

docker run --shm-size=2G -v data/volleyball:/opt/DIN_GAR/data/volleyball -v result:/opt/DIN_GAR/result --rm -it din_gar

--shm-size=2G: To prevent ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm)., you have to extend the container's shared memory size. Alternatively: --ipc=host
-v data/volleyball:/opt/DIN_GAR/data/volleyball: Makes the host's folder data/volleyball available inside the container at /opt/DIN_GAR/data/volleyball
-v result:/opt/DIN_GAR/result: Makes the host's folder result available inside the container at /opt/DIN_GAR/result
-it & --rm: Starts the container with an interactive session (PROJECT_PATH is /opt/DIN_GAR) and removes the container after closing the session.
din_gar the name/tag of the image
optional: --gpus='"device=7"' restrict the GPU devices the container can access.

Get Started

Train the Base Model: Fine-tune the base model for the dataset.

# Volleyball dataset
cd PROJECT_PATH 
python scripts/train_volleyball_stage1.py

# Collective Activity dataset
cd PROJECT_PATH 
python scripts/train_collective_stage1.py

Train with the reasoning module: Append the reasoning modules onto the base model to get a reasoning model.
1. Volleyball dataset
  - DIN
```
python scripts/train_volleyball_stage2_dynamic.py
```
  - lite DIN
    We can run DIN in lite version by setting cfg.lite_dim = 128 in scripts/train_volleyball_stage2_dynamic.py.
```
python scripts/train_volleyball_stage2_dynamic.py
```
  - ST-factorized DIN
    We can run ST-factorized DIN by setting cfg.ST_kernel_size = [(1,3),(3,1)] and cfg.hierarchical_inference = True.
    
    Note that if you set cfg.hierarchical_inference = False, cfg.ST_kernel_size = [(1,3),(3,1)] and cfg.num_DIN = 2, then multiple interaction fields run in parallel.
```
python scripts/train_volleyball_stage2_dynamic.py
```
  Other model re-implemented by us according to their papers or publicly available codes:
  - AT
```
python scripts/train_volleyball_stage2_at.py
```
  - PCTDM
```
python scripts/train_volleyball_stage2_pctdm.py
```
  - SACRF
```
python scripts/train_volleyball_stage2_sacrf_biute.py
```
  - ARG
```
python scripts/train_volleyball_stage2_arg.py
```
  - HiGCIN
```
python scripts/train_volleyball_stage2_higcin.py
```
2. Collective Activity dataset
  - DIN
```
python scripts/train_collective_stage2_dynamic.py
```
  - DIN lite
    We can run DIN in lite version by setting 'cfg.lite_dim = 128' in 'scripts/train_collective_stage2_dynamic.py'.
```
python scripts/train_collective_stage2_dynamic.py
```

Another work done by us, solving GAR from the perspective of incorporating visual context, is also available.

@inproceedings{yuan2021visualcontext,
  title={Learning Visual Context for Group Activity Recognition},
  author={Yuan, Hangjie and Ni, Dong},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={4},
  pages={3261--3269},
  year={2021}
}

A new codebase for Group Activity Recognition. It contains codes for ICCV 2021 paper: Spatio-Temporal Dynamic Inference Network for Group Activity Recognition and some other methods.

Related tags

Overview

Spatio-Temporal Dynamic Inference Network for Group Activity Recognition

Dependencies

Prepare Datasets

Using Docker

Get Started

Owner

Code for paper Decoupled Dynamic Spatial-Temporal Graph Neural Network for Traffic Forecasting

Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 for Vox1_O when train only in Vox2)

Code for the RA-L (ICRA) 2021 paper "SeqNet: Learning Descriptors for Sequence-Based Hierarchical Place Recognition"

Facial detection, landmark tracking and expression transfer library for Windows, Linux and Mac

Code of TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation

Stacked Recurrent Hourglass Network for Stereo Matching

Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, CVPR 2018

Indoor Panorama Planar 3D Reconstruction via Divide and Conquer

Distance-Ratio-Based Formulation for Metric Learning

Auto grind btdb2 exp for tower

Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement (NeurIPS 2020)

PyTorch code for the ICCV'21 paper: "Always Be Dreaming: A New Approach for Class-Incremental Learning"

Bridging the Gap between Label- and Reference based Synthesis(ICCV 2021)

Remote sensing change detection using PaddlePaddle

Unrolled Generative Adversarial Networks

Code for the paper "Generative design of breakwaters usign deep convolutional neural network as a surrogate model"

Collections for the lasted paper about multi-view clustering methods (papers, codes)

Official Datasets and Implementation from our Paper "Video Class Agnostic Segmentation in Autonomous Driving".

An end-to-end regression problem of predicting the price of properties in Bangalore.

Fast, general, and tested differentiable structured prediction in PyTorch