Spatio-Temporal Dynamic Inference Network for Group Activity Recognition
The source code for the ICCV 2021 paper: Spatio-Temporal Dynamic Inference Network for Group Activity Recognition.
[paper] [supplemental material] [arXiv]
If you find our work or the codebase inspiring and useful to your research, please cite:
@inproceedings{yuan2021DIN,
title={Spatio-Temporal Dynamic Inference Network for Group Activity Recognition},
author={Yuan, Hangjie and Ni, Dong and Wang, Mang},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={7476--7485},
year={2021}
}
Dependencies
- Software Environment: Linux (CentOS 7)
- Hardware Environment: NVIDIA TITAN RTX
- Python 3.6
- PyTorch 1.2.0, Torchvision 0.4.0
- RoIAlign for Pytorch
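To compare a local environment against the versions listed above, a small check like the following may help. It is only a sketch: the names REQUIRED, matches, report, and detect_installed are hypothetical helpers, not part of this repository, and the comparison is a simple prefix match on the version components.

```python
import sys

# Versions listed in the Dependencies section.
REQUIRED = {"python": "3.6", "torch": "1.2.0", "torchvision": "0.4.0"}


def matches(installed, required):
    """True if `installed` starts with the numeric components of `required`."""
    want = required.split(".")
    # Drop local build suffixes such as "+cu92" before comparing.
    have = installed.split("+")[0].split(".")
    return have[:len(want)] == want


def report(installed_versions):
    """Map each required package to True, False, or None (not installed)."""
    result = {}
    for name, required in REQUIRED.items():
        installed = installed_versions.get(name)
        result[name] = None if installed is None else matches(installed, required)
    return result


def detect_installed():
    """Collect the versions found in the current interpreter."""
    versions = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in ("torch", "torchvision"):
        try:
            versions[name] = __import__(name).__version__
        except ImportError:
            pass  # Leave the entry out; report() marks it as not installed.
    return versions


if __name__ == "__main__":
    labels = {True: "matches", False: "differs", None: "not installed"}
    for name, ok in report(detect_installed()).items():
        print(name, labels[ok])
```

Other versions may also work; the check only reports whether the environment matches the one listed here.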
Prepare Datasets
- Download the publicly available datasets from the following links: Volleyball dataset and Collective Activity dataset.
- Unzip the dataset file into data/volleyball or data/collective.
- Download the file tracks_normalized.pkl from cvlab-epfl/social-scene-understanding and put it into data/volleyball/videos.
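Before training, it can be useful to confirm that the files ended up where the steps above place them. The following is a minimal sketch: expected_paths and missing_paths are hypothetical helper names, and it checks only the locations named in this section, not the contents of the datasets.

```python
import os


def expected_paths(project_path, dataset="volleyball"):
    """Return the locations this section asks for, relative to the project root."""
    data_dir = os.path.join(project_path, "data", dataset)
    paths = [data_dir]
    if dataset == "volleyball":
        # The tracks file is only mentioned for the Volleyball dataset.
        paths.append(os.path.join(data_dir, "videos", "tracks_normalized.pkl"))
    return paths


def missing_paths(project_path, dataset="volleyball"):
    """Return the expected locations that do not exist yet."""
    return [p for p in expected_paths(project_path, dataset) if not os.path.exists(p)]


if __name__ == "__main__":
    for dataset in ("volleyball", "collective"):
        for path in missing_paths(".", dataset):
            print("missing:", path)
```

Run it from PROJECT_PATH; no output means every expected location exists.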
Using Docker
- Check out the repository and change into it:
    cd PROJECT_PATH
- Build the Docker container:
    docker build -t din_gar https://github.com/JacobYuan7/DIN_GAR.git#main
- Run the Docker container:
    docker run --shm-size=2G -v data/volleyball:/opt/DIN_GAR/data/volleyball -v result:/opt/DIN_GAR/result --rm -it din_gar
  - --shm-size=2G: Extends the container's shared memory size to prevent "ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm)." Alternatively, use --ipc=host.
  - -v data/volleyball:/opt/DIN_GAR/data/volleyball: Makes the host's folder data/volleyball available inside the container at /opt/DIN_GAR/data/volleyball.
  - -v result:/opt/DIN_GAR/result: Makes the host's folder result available inside the container at /opt/DIN_GAR/result.
  - -it and --rm: Start the container with an interactive session (PROJECT_PATH is /opt/DIN_GAR) and remove the container after the session is closed.
  - din_gar: The name/tag of the image.
  - Optional: --gpus='"device=7"' restricts the GPU devices the container can access.
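The run command can also be assembled programmatically. The sketch below assumes the host folders should be passed as absolute paths, since Docker normally treats a non-absolute -v source as a volume name rather than a host folder. docker_run_args is a hypothetical helper, not part of the repository; the flags and container paths are the ones listed above.

```python
import os
import shlex


def docker_run_args(project_path, image="din_gar", shm_size="2G", gpus=None):
    """Build the `docker run` argument list described in this section."""
    project_path = os.path.abspath(project_path)
    args = ["docker", "run", "--shm-size={}".format(shm_size)]
    # Mount the dataset and result folders at the same relative
    # locations under /opt/DIN_GAR inside the container.
    for sub in ("data/volleyball", "result"):
        host = os.path.join(project_path, sub)
        args += ["-v", "{}:/opt/DIN_GAR/{}".format(host, sub)]
    if gpus is not None:
        # Passed through verbatim, e.g. gpus='"device=7"'.
        args.append("--gpus={}".format(gpus))
    args += ["--rm", "-it", image]
    return args


if __name__ == "__main__":
    # Print a shell-quoted command line instead of executing it.
    print(" ".join(shlex.quote(a) for a in docker_run_args(".")))
```

Printing the command rather than running it keeps the sketch free of side effects; the list can be handed to subprocess.run once it looks right.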
Get Started
- Train the Base Model: Fine-tune the base model for the dataset.
    # Volleyball dataset
    cd PROJECT_PATH
    python scripts/train_volleyball_stage1.py
    # Collective Activity dataset
    cd PROJECT_PATH
    python scripts/train_collective_stage1.py
- Train with the reasoning module: Append the reasoning modules onto the base model to get a reasoning model.
  - Volleyball dataset
    - DIN
        python scripts/train_volleyball_stage2_dynamic.py
    - lite DIN
      To run the lite version of DIN, set cfg.lite_dim = 128 in scripts/train_volleyball_stage2_dynamic.py.
        python scripts/train_volleyball_stage2_dynamic.py
    - ST-factorized DIN
      To run ST-factorized DIN, set cfg.ST_kernel_size = [(1,3),(3,1)] and cfg.hierarchical_inference = True. Note that if you set cfg.hierarchical_inference = False, cfg.ST_kernel_size = [(1,3),(3,1)] and cfg.num_DIN = 2, then multiple interaction fields run in parallel.
        python scripts/train_volleyball_stage2_dynamic.py
    Other models re-implemented by us according to their papers or publicly available code:
    - AT
        python scripts/train_volleyball_stage2_at.py
    - PCTDM
        python scripts/train_volleyball_stage2_pctdm.py
    - SACRF
        python scripts/train_volleyball_stage2_sacrf_biute.py
    - ARG
        python scripts/train_volleyball_stage2_arg.py
    - HiGCIN
        python scripts/train_volleyball_stage2_higcin.py
  - Collective Activity dataset
    - DIN
        python scripts/train_collective_stage2_dynamic.py
    - DIN lite
      To run the lite version of DIN, set cfg.lite_dim = 128 in scripts/train_collective_stage2_dynamic.py.
        python scripts/train_collective_stage2_dynamic.py
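The configuration switches quoted for the DIN variants can be summarized as a small lookup table. This is an illustrative sketch only: VARIANTS, apply_variant, and the SimpleNamespace stand-in for cfg are hypothetical and not part of the repository. The option names and values are the ones given in this section; in practice they are edited directly in the stage 2 training script.

```python
import copy
from types import SimpleNamespace

# Overrides quoted in this section, keyed by a hypothetical variant name.
VARIANTS = {
    "din": {},  # default settings of the stage 2 script
    "lite": {"lite_dim": 128},
    "st_factorized": {
        "ST_kernel_size": [(1, 3), (3, 1)],
        "hierarchical_inference": True,
    },
    "parallel_fields": {
        "ST_kernel_size": [(1, 3), (3, 1)],
        "hierarchical_inference": False,
        "num_DIN": 2,
    },
}


def apply_variant(cfg, name):
    """Set the overrides for variant `name` on `cfg` and return it."""
    for key, value in VARIANTS[name].items():
        # Copy so that later edits to cfg do not change the table.
        setattr(cfg, key, copy.deepcopy(value))
    return cfg


if __name__ == "__main__":
    cfg = SimpleNamespace()  # stand-in for the config object in the script
    print(vars(apply_variant(cfg, "st_factorized")))
```

Keeping the overrides in one table makes it easier to see that the ST-factorized and parallel variants differ only in hierarchical_inference and num_DIN. Only DIN and the lite variant are documented here for the Collective Activity dataset.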
Another work done by us, solving GAR from the perspective of incorporating visual context, is also available.
@inproceedings{yuan2021visualcontext,
title={Learning Visual Context for Group Activity Recognition},
author={Yuan, Hangjie and Ni, Dong},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={35},
number={4},
pages={3261--3269},
year={2021}
}