OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

Overview

Introduction

MMAction2 is an open-source toolbox for video understanding based on PyTorch. It is a part of the OpenMMLab project.

The master branch works with PyTorch 1.3+.


Action Recognition Results on Kinetics-400

Spatio-Temporal Action Detection Results on AVA-2.1

Major Features

  • Modular design

    We decompose the video understanding framework into different components and one can easily construct a customized video understanding framework by combining different modules.

  • Support for various datasets

    The toolbox directly supports multiple datasets, including UCF101, Kinetics-[400/600/700], Something-Something V1&V2, Moments in Time, Multi-Moments in Time, and THUMOS14.

  • Support for multiple video understanding frameworks

    MMAction2 implements popular frameworks for video understanding:

    • For action recognition, various algorithms are implemented, including TSN, TSM, TIN, R(2+1)D, I3D, SlowOnly, SlowFast, CSN, Non-local, etc.

    • For temporal action localization, we implement BSN, BMN, SSN.

    • For spatial temporal detection, we implement SlowOnly, SlowFast.

  • Well tested and documented

    We provide detailed documentation and an API reference, as well as unit tests.
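As a rough illustration of the modular design described above, the sketch below builds a model from a nested config dictionary through a small registry. The names `REGISTRY`, `register`, `build`, `ToyBackbone`, `ToyHead`, and `ToyRecognizer` are hypothetical and are not MMAction2's actual API; the library's own builders may work differently.

```python
# Toy registry with hypothetical names; not MMAction2's real API.
REGISTRY = {}


def register(cls):
    """Record a class under its own name so configs can refer to it."""
    REGISTRY[cls.__name__] = cls
    return cls


def build(cfg):
    """Instantiate the class named by cfg['type'], building nested configs first."""
    kwargs = dict(cfg)  # copy so the caller's config is not mutated
    cls = REGISTRY[kwargs.pop('type')]
    for key, value in kwargs.items():
        if isinstance(value, dict) and 'type' in value:
            kwargs[key] = build(value)
    return cls(**kwargs)


@register
class ToyBackbone:
    def __init__(self, depth):
        self.depth = depth


@register
class ToyHead:
    def __init__(self, num_classes):
        self.num_classes = num_classes


@register
class ToyRecognizer:
    def __init__(self, backbone, head):
        self.backbone = backbone
        self.head = head


# Swapping a component only requires changing its 'type' entry.
config = dict(
    type='ToyRecognizer',
    backbone=dict(type='ToyBackbone', depth=50),
    head=dict(type='ToyHead', num_classes=400),
)
model = build(config)
print(type(model.backbone).__name__, model.head.num_classes)
```

Keeping component choices in the config rather than in code is what makes it cheap to combine different backbones and heads.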

Changelog

v0.13.0 was released on 31/03/2021. Please refer to changelog.md for details and release history.

Benchmark

| Model | Input | IO backend | Batch size x GPUs | MMAction2 (s/iter) | MMAction (s/iter) | Temporal-Shift-Module (s/iter) | PySlowFast (s/iter) |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| TSN | 256p rawframes | Memcached | 32x8 | 0.32 | 0.38 | 0.42 | x |
| TSN | 256p dense-encoded video | Disk | 32x8 | 0.61 | x | x | TODO |
| I3D heavy | 256p videos | Disk | 8x8 | 0.34 | x | x | 0.44 |
| I3D | 256p rawframes | Memcached | 8x8 | 0.43 | 0.56 | x | x |
| TSM | 256p rawframes | Memcached | 8x8 | 0.31 | x | 0.41 | x |
| Slowonly | 256p videos | Disk | 8x8 | 0.32 | TODO | x | 0.34 |
| Slowfast | 256p videos | Disk | 8x8 | 0.69 | x | x | 1.04 |
| R(2+1)D | 256p videos | Disk | 8x8 | 0.45 | x | x | x |

Details can be found in benchmark.
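To make the benchmark figures easier to compare, the following sketch computes how many times longer each other codebase takes per iteration than MMAction2, using only the rows above that have a number for both. The dictionary layout and the name `relative_cost` are illustrative choices, not part of the benchmark tooling.

```python
# Seconds per iteration copied from the benchmark table above.
# Cells listed as "x" or "TODO" are omitted.
timings = {
    'TSN (rawframes)': {'MMAction2': 0.32, 'MMAction': 0.38,
                        'Temporal-Shift-Module': 0.42},
    'I3D heavy': {'MMAction2': 0.34, 'PySlowFast': 0.44},
    'I3D (rawframes)': {'MMAction2': 0.43, 'MMAction': 0.56},
    'TSM': {'MMAction2': 0.31, 'Temporal-Shift-Module': 0.41},
    'Slowonly': {'MMAction2': 0.32, 'PySlowFast': 0.34},
    'Slowfast': {'MMAction2': 0.69, 'PySlowFast': 1.04},
}


def relative_cost(row):
    """Map each other codebase to its s/iter divided by MMAction2's s/iter."""
    ours = row['MMAction2']
    return {name: value / ours
            for name, value in row.items() if name != 'MMAction2'}


ratios = {model: relative_cost(row) for model, row in timings.items()}
for model, others in ratios.items():
    for name, ratio in others.items():
        print(f'{model}: {name} takes {ratio:.2f}x the time of MMAction2')
```

A ratio above 1 means the other codebase is slower per iteration on that row's setup.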

ModelZoo

Supported methods for Action Recognition:

Supported methods for Temporal Action Detection:

  • BSN (ECCV'2018)
  • BMN (ICCV'2019)
  • SSN (ICCV'2017)

Supported methods for Spatial Temporal Action Detection:

Results and models are available in the README.md of each method's config directory. A summary can be found in the model zoo page.

We will keep up with the latest progress of the community, and support more popular algorithms and frameworks. If you have any feature requests, please feel free to leave a comment in Issues.

Dataset

Supported datasets:

Supported datasets for Action Recognition:

Supported datasets for Temporal Action Detection

Supported datasets for Spatial Temporal Action Detection

Datasets marked with 🔲 are not fully supported yet, but related dataset preparation steps are provided.

Installation

Please refer to install.md for installation.

Data Preparation

Please refer to data_preparation.md for a general overview of data preparation. The supported datasets are listed in supported_datasets.md.

Get Started

Please see getting_started.md for the basic usage of MMAction2. There are also tutorials:

A Colab tutorial is also provided. You may preview the notebook here or directly run on Colab.

FAQ

Please refer to FAQ for frequently asked questions.

License

This project is released under the Apache 2.0 license.

Citation

If you find this project useful in your research, please consider citing:

@misc{2020mmaction2,
    title={OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark},
    author={MMAction2 Contributors},
    howpublished = {\url{https://github.com/open-mmlab/mmaction2}},
    year={2020}
}

Contributing

We appreciate all contributions to improve MMAction2. Please refer to CONTRIBUTING.md in MMCV for more details about the contributing guideline.

Acknowledgement

MMAction2 is an open-source project contributed to by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, as well as the users who give valuable feedback. We hope that the toolbox and benchmark can serve the growing research community by providing a flexible toolkit for reimplementing existing methods and developing new models.

Projects in OpenMMLab

  • MMCV: OpenMMLab foundational library for computer vision.
  • MMClassification: OpenMMLab image classification toolbox and benchmark.
  • MMDetection: OpenMMLab detection toolbox and benchmark.
  • MMDetection3D: OpenMMLab's next-generation platform for general 3D object detection.
  • MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark.
  • MMAction2: OpenMMLab's next-generation video understanding toolbox and benchmark.
  • MMTracking: OpenMMLab video perception toolbox and benchmark.
  • MMPose: OpenMMLab pose estimation toolbox and benchmark.
  • MMEditing: OpenMMLab image and video editing toolbox.
  • MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding.
Comments
  • [Improvement] Set RandAugment as Imgaug default transforms.

    Use imgaug to reimplement RandAugment.

    According to VideoMix, RandAugment helps a little.

    Results

    • sthv1 & tsm-r50, 8 V100, 50 epochs

    |configs|top1 acc(efficient/accuracy)|top5 acc(efficient/accuracy)|
    |:-|:-:|:-:|
    |mmaction2 model zoo|45.58 / 47.70|75.02 / 76.12|
    |testing with model zoo ckpt|45.47 / 47.55|74.56 / 75.79|
    |training with default config|45.82 / 47.90|74.38 / 76.02|
    |flip|47.10 / 48.51|75.02 / 76.12|
    |randaugment|47.16 / 48.90|76.07 / 77.92|
    |flip+randaugment|47.85 / 50.31|76.78 / 78.18|

    • Kinetics400, 8 V100, test with 256x256 & three crops

    |Models|top1/5 accuracy|Training loss (epoch 100)|Training time|
    |:-:|:-:|:-:|:-:|
    |TSN-R50-1x1x8-Vanilla|70.74%/89.37%|0.8|2 days 12 hours|
    |TSN-R50-1x1x8-RandAugment|71.07%/89.40%|1.3|2 days 22 hours|
    |I3D-R50-32x2x1-Vanilla|74.48%/91.62%|1.1|3 days 10 hours|
    |I3D-R50-32x2x1-RandAugment|74.23%/91.45%|1.5|4 days 10 hours|
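For readers unfamiliar with the idea, RandAugment draws a fixed number of operations at random from a pool and applies them at a shared magnitude. The sketch below shows that idea in plain Python on tiny greyscale frames; it is not the imgaug-based implementation from this pull request. The operation pool, the magnitude scaling, and the choice to sample the operations once per clip (so every frame gets the same transform) are all assumptions made for illustration.

```python
import random

# Each op maps one frame (rows of 0-255 grey values) to a new frame.
# `magnitude` is an integer in [0, 10]; the scaling is an arbitrary choice.


def hflip(frame, magnitude):
    return [row[::-1] for row in frame]


def brighten(frame, magnitude):
    return [[min(255, px + 10 * magnitude) for px in row] for row in frame]


def invert(frame, magnitude):
    return [[255 - px for px in row] for row in frame]


def posterize(frame, magnitude):
    # Drop more low-order bits as magnitude grows (at most 4 bits).
    bits = min(4, magnitude // 2)
    mask = 0xFF & ~((1 << bits) - 1)
    return [[px & mask for px in row] for row in frame]


OPS = [hflip, brighten, invert, posterize]


def rand_augment_clip(clip, n=2, magnitude=5, rng=None):
    """Apply `n` randomly chosen ops, in order, to every frame of `clip`."""
    rng = rng or random.Random()
    ops = [rng.choice(OPS) for _ in range(n)]  # sampled once per clip
    augmented = []
    for frame in clip:
        for op in ops:
            frame = op(frame, magnitude)
        augmented.append(frame)
    return augmented


clip = [[[0, 100], [200, 255]] for _ in range(3)]  # three identical 2x2 frames
out = rand_augment_clip(clip, n=2, magnitude=5, rng=random.Random(0))
print(out[0])
```

Passing a seeded `random.Random` makes the chosen operations reproducible, which helps when comparing runs.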

    opened by irvingzhang0512 40
  • What does this training log mean? And for training SlowFast on new data for a custom activity, is there a minimum sample size to start with?

    I prepared a small custom dataset in AVA format for two activities, sweeping and walking, then trained SlowFast for 50 epochs with clip_len=16 (due to hardware limitations). The training log (JSON) is shared below. It looks like the model is not learning anything, because mAP is consistently 0 for all epochs. What could be the possible reasons?

    Compiler: 10.2\nMMAction2: 0.12.0+13f42bf", "seed": null, "config_name": "custom_slowfast.py", "work_dir": "slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_clean-data_new_e80", "hook_msgs": {}}

    {"mode": "train", "epoch": 1, "iter": 20, "lr": 0.0562, "memory": 8197, "data_time": 0.18563, "loss_action_cls": 0.16409, "recall@thr=0.5": 0.71278, "prec@thr=0.5": 0.67664, "recall@top3": 0.90636, "prec@top3": 0.30212, "recall@top5": 0.91545, "prec@top5": 0.18309, "loss": 0.16409, "grad_norm": 0.91884, "time": 0.99759}
    {"mode": "val", "epoch": 1, "iter": 22, "lr": 0.0598, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 2, "iter": 20, "lr": 0.0958, "memory": 8197, "data_time": 0.1842, "loss_action_cls": 0.10098, "recall@thr=0.5": 0.75593, "prec@thr=0.5": 0.74255, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.10098, "grad_norm": 0.34649, "time": 0.98014}
    {"mode": "val", "epoch": 2, "iter": 22, "lr": 0.0994, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 3, "iter": 20, "lr": 0.1354, "memory": 8197, "data_time": 0.18966, "loss_action_cls": 0.10026, "recall@thr=0.5": 0.77377, "prec@thr=0.5": 0.77127, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.10026, "grad_norm": 0.30035, "time": 0.99118}
    {"mode": "val", "epoch": 3, "iter": 22, "lr": 0.139, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 4, "iter": 20, "lr": 0.175, "memory": 8197, "data_time": 0.18845, "loss_action_cls": 0.12424, "recall@thr=0.5": 0.79485, "prec@thr=0.5": 0.78929, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.12424, "grad_norm": 0.19094, "time": 0.99367}
    {"mode": "val", "epoch": 4, "iter": 22, "lr": 0.1786, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 5, "iter": 20, "lr": 0.2146, "memory": 8197, "data_time": 0.18817, "loss_action_cls": 0.11159, "recall@thr=0.5": 0.79285, "prec@thr=0.5": 0.77159, "recall@top3": 0.99545, "prec@top3": 0.33182, "recall@top5": 0.99545, "prec@top5": 0.19909, "loss": 0.11159, "grad_norm": 0.16631, "time": 0.99733}
    {"mode": "val", "epoch": 5, "iter": 22, "lr": 0.2182, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 6, "iter": 20, "lr": 0.22, "memory": 8197, "data_time": 0.18938, "loss_action_cls": 0.11952, "recall@thr=0.5": 0.735, "prec@thr=0.5": 0.73273, "recall@top3": 0.98, "prec@top3": 0.32667, "recall@top5": 0.98, "prec@top5": 0.196, "loss": 0.11952, "grad_norm": 0.26395, "time": 0.99816}
    {"mode": "val", "epoch": 6, "iter": 22, "lr": 0.22, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 7, "iter": 20, "lr": 0.22, "memory": 8197, "data_time": 0.19043, "loss_action_cls": 0.11324, "recall@thr=0.5": 0.82705, "prec@thr=0.5": 0.82227, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.11324, "grad_norm": 0.1336, "time": 0.9999}
    {"mode": "val", "epoch": 7, "iter": 22, "lr": 0.22, "mAP@0.5IOU": 0.0}
    {"mode": "train", "epoch": 8, "iter": 20, "lr": 0.22, "memory": 8197, "data_time": 0.18619, "loss_action_cls": 0.08463, "recall@thr=0.5": 0.82482, "prec@thr=0.5": 0.81927, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08463, "grad_norm": 0.11848, "time": 0.99716}
    {"mode": "val", "epoch": 8, "iter": 22, "lr": 0.22, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 9, "iter": 20, "lr": 0.22, "memory": 8197, "data_time": 0.18562, "loss_action_cls": 0.09073, "recall@thr=0.5": 0.77285, "prec@thr=0.5": 0.77035, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09073, "grad_norm": 0.12449, "time": 0.99849}
    {"mode": "val", "epoch": 9, "iter": 22, "lr": 0.22, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 10, "iter": 20, "lr": 0.22, "memory": 8197, "data_time": 0.18366, "loss_action_cls": 0.09193, "recall@thr=0.5": 0.81924, "prec@thr=0.5": 0.81369, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09193, "grad_norm": 0.09078, "time": 0.99763}
    {"mode": "val", "epoch": 10, "iter": 22, "lr": 0.22, "mAP@0.5IOU": 0.0}
    
    {"mode": "train", "epoch": 11, "iter": 20, "lr": 0.022, "memory": 8197, "data_time": 0.18933, "loss_action_cls": 0.09355, "recall@thr=0.5": 0.84336, "prec@thr=0.5": 0.84086, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09355, "grad_norm": 0.08913, "time": 1.00207}
    {"mode": "val", "epoch": 11, "iter": 22, "lr": 0.022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 12, "iter": 20, "lr": 0.022, "memory": 8197, "data_time": 0.18655, "loss_action_cls": 0.09352, "recall@thr=0.5": 0.84199, "prec@thr=0.5": 0.83949, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09352, "grad_norm": 0.09578, "time": 0.99861}
    {"mode": "val", "epoch": 12, "iter": 22, "lr": 0.022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 13, "iter": 20, "lr": 0.022, "memory": 8197, "data_time": 0.18258, "loss_action_cls": 0.09836, "recall@thr=0.5": 0.86856, "prec@thr=0.5": 0.86856, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09836, "grad_norm": 0.07878, "time": 0.99762}
    {"mode": "val", "epoch": 13, "iter": 22, "lr": 0.022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 14, "iter": 20, "lr": 0.022, "memory": 8197, "data_time": 0.18307, "loss_action_cls": 0.08192, "recall@thr=0.5": 0.86619, "prec@thr=0.5": 0.86619, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08192, "grad_norm": 0.07241, "time": 0.99841}
    {"mode": "val", "epoch": 14, "iter": 22, "lr": 0.022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 15, "iter": 20, "lr": 0.022, "memory": 8197, "data_time": 0.18555, "loss_action_cls": 0.07062, "recall@thr=0.5": 0.84995, "prec@thr=0.5": 0.84995, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.07062, "grad_norm": 0.07792, "time": 0.99924}
    {"mode": "val", "epoch": 15, "iter": 22, "lr": 0.022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 16, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18864, "loss_action_cls": 0.08495, "recall@thr=0.5": 0.86629, "prec@thr=0.5": 0.86629, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08495, "grad_norm": 0.08121, "time": 1.00141}
    {"mode": "val", "epoch": 16, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 17, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18965, "loss_action_cls": 0.11092, "recall@thr=0.5": 0.8503, "prec@thr=0.5": 0.8503, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.11092, "grad_norm": 0.06323, "time": 1.00582}
    {"mode": "val", "epoch": 17, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 18, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18077, "loss_action_cls": 0.08457, "recall@thr=0.5": 0.85369, "prec@thr=0.5": 0.85369, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08457, "grad_norm": 0.06237, "time": 0.9956}
    {"mode": "val", "epoch": 18, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 19, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18342, "loss_action_cls": 0.08996, "recall@thr=0.5": 0.84434, "prec@thr=0.5": 0.84226, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08996, "grad_norm": 0.07551, "time": 0.99802}
    {"mode": "val", "epoch": 19, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 20, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18127, "loss_action_cls": 0.08211, "recall@thr=0.5": 0.85747, "prec@thr=0.5": 0.85747, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08211, "grad_norm": 0.06186, "time": 0.99498}
    {"mode": "val", "epoch": 20, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}
    
    {"mode": "train", "epoch": 21, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18135, "loss_action_cls": 0.0857, "recall@thr=0.5": 0.84931, "prec@thr=0.5": 0.84931, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.0857, "grad_norm": 0.07136, "time": 0.995}
    {"mode": "val", "epoch": 21, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 22, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18529, "loss_action_cls": 0.08998, "recall@thr=0.5": 0.86644, "prec@thr=0.5": 0.86208, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08998, "grad_norm": 0.07752, "time": 0.99948}
    {"mode": "val", "epoch": 22, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 23, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18675, "loss_action_cls": 0.07464, "recall@thr=0.5": 0.84141, "prec@thr=0.5": 0.84141, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.07464, "grad_norm": 0.07109, "time": 1.02437}
    {"mode": "val", "epoch": 23, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 24, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.19255, "loss_action_cls": 0.09615, "recall@thr=0.5": 0.87189, "prec@thr=0.5": 0.87189, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09615, "grad_norm": 0.06948, "time": 1.00467}
    {"mode": "val", "epoch": 24, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 25, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18252, "loss_action_cls": 0.0939, "recall@thr=0.5": 0.86088, "prec@thr=0.5": 0.86088, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.0939, "grad_norm": 0.06941, "time": 0.99516}
    {"mode": "val", "epoch": 25, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 26, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18245, "loss_action_cls": 0.09089, "recall@thr=0.5": 0.84902, "prec@thr=0.5": 0.84901, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09089, "grad_norm": 0.05622, "time": 0.99528}
    {"mode": "val", "epoch": 26, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 27, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18309, "loss_action_cls": 0.0874, "recall@thr=0.5": 0.87808, "prec@thr=0.5": 0.87808, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.0874, "grad_norm": 0.06894, "time": 0.99701}
    {"mode": "val", "epoch": 27, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 28, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18577, "loss_action_cls": 0.08544, "recall@thr=0.5": 0.84664, "prec@thr=0.5": 0.84437, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08544, "grad_norm": 0.07643, "time": 0.99881}
    {"mode": "val", "epoch": 28, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 29, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18908, "loss_action_cls": 0.10787, "recall@thr=0.5": 0.87369, "prec@thr=0.5": 0.87141, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.10787, "grad_norm": 0.05707, "time": 1.00178}
    {"mode": "val", "epoch": 29, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 30, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18647, "loss_action_cls": 0.0934, "recall@thr=0.5": 0.8727, "prec@thr=0.5": 0.87042, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.0934, "grad_norm": 0.05735, "time": 0.99853}
    {"mode": "val", "epoch": 30, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}
    
    {"mode": "train", "epoch": 31, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18154, "loss_action_cls": 0.07874, "recall@thr=0.5": 0.85874, "prec@thr=0.5": 0.85874, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.07874, "grad_norm": 0.06633, "time": 0.99413}
    {"mode": "val", "epoch": 31, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 32, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18083, "loss_action_cls": 0.07918, "recall@thr=0.5": 0.86742, "prec@thr=0.5": 0.86492, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.07918, "grad_norm": 0.06247, "time": 0.9932}
    {"mode": "val", "epoch": 32, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 33, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18088, "loss_action_cls": 0.08861, "recall@thr=0.5": 0.86927, "prec@thr=0.5": 0.86735, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08861, "grad_norm": 0.07271, "time": 0.99552}
    {"mode": "val", "epoch": 33, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 34, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.1886, "loss_action_cls": 0.09317, "recall@thr=0.5": 0.86667, "prec@thr=0.5": 0.86667, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09317, "grad_norm": 0.06294, "time": 1.00273}
    {"mode": "val", "epoch": 34, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 35, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18746, "loss_action_cls": 0.089, "recall@thr=0.5": 0.87669, "prec@thr=0.5": 0.87669, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.089, "grad_norm": 0.06243, "time": 0.99921}
    {"mode": "val", "epoch": 35, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 36, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18179, "loss_action_cls": 0.07702, "recall@thr=0.5": 0.86391, "prec@thr=0.5": 0.86391, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.07702, "grad_norm": 0.07411, "time": 0.99609}
    {"mode": "val", "epoch": 36, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 37, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18797, "loss_action_cls": 0.08872, "recall@thr=0.5": 0.86088, "prec@thr=0.5": 0.86088, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08872, "grad_norm": 0.07458, "time": 0.99985}
    {"mode": "val", "epoch": 37, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 38, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18704, "loss_action_cls": 0.08762, "recall@thr=0.5": 0.87121, "prec@thr=0.5": 0.86843, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08762, "grad_norm": 0.06538, "time": 0.99896}
    {"mode": "val", "epoch": 38, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 39, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18852, "loss_action_cls": 0.08822, "recall@thr=0.5": 0.85919, "prec@thr=0.5": 0.85919, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08822, "grad_norm": 0.07977, "time": 1.0016}
    {"mode": "val", "epoch": 39, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 40, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18234, "loss_action_cls": 0.09024, "recall@thr=0.5": 0.85601, "prec@thr=0.5": 0.85601, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09024, "grad_norm": 0.06097, "time": 0.99434}
    {"mode": "val", "epoch": 40, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}
    
    {"mode": "train", "epoch": 41, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18165, "loss_action_cls": 0.09851, "recall@thr=0.5": 0.84987, "prec@thr=0.5": 0.84737, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09851, "grad_norm": 0.06554, "time": 0.99627}
    {"mode": "val", "epoch": 41, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 42, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18597, "loss_action_cls": 0.10595, "recall@thr=0.5": 0.87117, "prec@thr=0.5": 0.87117, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.10595, "grad_norm": 0.05842, "time": 0.99769}
    {"mode": "val", "epoch": 42, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 43, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.1856, "loss_action_cls": 0.08387, "recall@thr=0.5": 0.86939, "prec@thr=0.5": 0.86939, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08387, "grad_norm": 0.06906, "time": 1.00146}
    {"mode": "val", "epoch": 43, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 44, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18118, "loss_action_cls": 0.08536, "recall@thr=0.5": 0.85187, "prec@thr=0.5": 0.85187, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08536, "grad_norm": 0.0665, "time": 0.9931}
    {"mode": "val", "epoch": 44, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 45, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18369, "loss_action_cls": 0.09834, "recall@thr=0.5": 0.84446, "prec@thr=0.5": 0.84169, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09834, "grad_norm": 0.07264, "time": 0.99587}
    {"mode": "val", "epoch": 45, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 46, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18497, "loss_action_cls": 0.07137, "recall@thr=0.5": 0.85472, "prec@thr=0.5": 0.85194, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.07137, "grad_norm": 0.07303, "time": 0.99785}
    {"mode": "val", "epoch": 46, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 47, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18986, "loss_action_cls": 0.07812, "recall@thr=0.5": 0.86687, "prec@thr=0.5": 0.86687, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.07812, "grad_norm": 0.06059, "time": 1.00136}
    {"mode": "val", "epoch": 47, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 48, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.188, "loss_action_cls": 0.09891, "recall@thr=0.5": 0.85929, "prec@thr=0.5": 0.85929, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09891, "grad_norm": 0.05919, "time": 0.99993}
    {"mode": "val", "epoch": 48, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 49, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18616, "loss_action_cls": 0.06949, "recall@thr=0.5": 0.85987, "prec@thr=0.5": 0.85987, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.06949, "grad_norm": 0.07458, "time": 0.99806}
    {"mode": "val", "epoch": 49, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 50, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.1849, "loss_action_cls": 0.07176, "recall@thr=0.5": 0.88101, "prec@thr=0.5": 0.88101, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.07176, "grad_norm": 0.06244, "time": 0.99677}
    {"mode": "val", "epoch": 50, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}
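A log like this is easier to read once the training loss and validation metric are pulled out per epoch. The sketch below does that with the standard `json` module. The sample lines are abbreviated from the first two epochs above (most fields dropped), the validation key is assumed to be `mAP@0.5IOU`, and `summarize_log` is a hypothetical helper rather than an MMAction2 tool. The code does not depend on the exact metric key.

```python
import json

BOOKKEEPING = {'mode', 'epoch', 'iter', 'lr'}


def summarize_log(lines):
    """Return ({epoch: train loss}, {epoch: validation metrics}) from JSON lines."""
    train_loss, val_metrics = {}, {}
    for line in lines:
        line = line.strip()
        if not line.startswith('{'):
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a truncated environment header
        if record.get('mode') == 'train':
            train_loss[record['epoch']] = record['loss']
        elif record.get('mode') == 'val':
            val_metrics[record['epoch']] = {
                key: value for key, value in record.items()
                if key not in BOOKKEEPING
            }
    return train_loss, val_metrics


# Abbreviated records from epochs 1 and 2 of the log above.
sample = [
    '{"mode": "train", "epoch": 1, "iter": 20, "lr": 0.0562, "loss": 0.16409}',
    '{"mode": "val", "epoch": 1, "iter": 22, "lr": 0.0598, "mAP@0.5IOU": 0.0}',
    '{"mode": "train", "epoch": 2, "iter": 20, "lr": 0.0958, "loss": 0.10098}',
    '{"mode": "val", "epoch": 2, "iter": 22, "lr": 0.0994, "mAP@0.5IOU": 0.0}',
]
train_loss, val_metrics = summarize_log(sample)
stalled = all(v == 0.0 for m in val_metrics.values() for v in m.values())
print(train_loss, val_metrics, 'validation stuck at zero:', stalled)
```

A falling training loss next to a validation metric pinned at exactly zero usually points at the evaluation inputs (labels, boxes, or class ids) rather than at optimization, which is worth checking before training longer.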
    
    opened by arvindchandel 34
  • [Feature] Support TSM-MobileNetV2

    TODO list

    • [x] mobilenetv2 backbone & unittest.
    • [x] tsm-mobilenetv2 backbone & unittest.
    • [x] convert checkpoint from original repo.
      • original repo: 30 test crops, 19520 samples, top1/5 accuracy is 69.54%/88.66%
      • mmaction2 conversion: 10 test crops, 18219 samples, top1/5 accuracy is 69.04%/88.23%.
    • [x] Refactor mobilenet with mmcls
    • [x] changelog
    • [x] training with mmaction2 & update model zoo.
      • I don't have enough GPUs to train on kinetics400, maybe next week I can have a try...
      • Tears of poverty

    Training results of mobilenet-tsm with DenseSampleFrames1x1x8 (the original checkpoint gets 69.54%/88.66% top1/5 accuracy).

    |lr|epochs|gpus|weight decay|top1/5 accuracy|
    |:-:|:-:|:-:|:-:|:-:|
    |0.00875|50|7|0.0001|63.75%/85.52%|
    |0.0025|50|4|0.0001|65.11%/85.99%|
    |0.0025|100|4|0.0001|66.xx%/86.xx%|
    |0.004|100|4|0.00004|68.31%/88.00%|
    |0.0075|100|6|0.00004|68.41%/88.07%|

    opened by irvingzhang0512 33
  • [Improvement] Training custom classes of ava dataset

    Target

    Training some of the 80 ava classes to save training time and hopefully get better results for selected classes.
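One way to picture training on a subset of classes is to drop annotations whose label is not selected and renumber the rest contiguously, keeping 0 for background, so the head needs `len(custom_classes) + 1` outputs. The sketch below works under that assumption. `remap_labels` and the example boxes are hypothetical and are not the code from this pull request.

```python
def remap_labels(annotations, custom_classes):
    """Keep only selected class ids and map them to 1..K (0 = background).

    `annotations` is a list of (box, class_id) pairs. Returns the filtered,
    renumbered pairs and the number of classes the head would need.
    """
    new_id = {old: index + 1
              for index, old in enumerate(sorted(custom_classes))}
    kept = [(box, new_id[label])
            for box, label in annotations if label in new_id]
    return kept, len(new_id) + 1


# Class ids taken from one of the groups evaluated in this pull request.
custom_classes = [11, 12, 14, 15, 79, 80]
annotations = [
    ((0.10, 0.20, 0.50, 0.90), 12),  # kept, becomes 2
    ((0.30, 0.10, 0.70, 0.80), 3),   # dropped, class not selected
    ((0.05, 0.05, 0.40, 0.60), 80),  # kept, becomes 6
]
kept, num_classes = remap_labels(annotations, custom_classes)
print(kept, num_classes)
```

With fewer than five classes left after remapping, top-5 metrics stop being meaningful, which is the motivation given in the TODO list for making `topk` configurable.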

    TODO

    • [x] dataset/evaluation codes.
    • [x] unittest
    • [x] docs
    • [x] sample config
    • [x] model zoo, compare results.
    • [x] Add input arg topk for BBoxHeadAVA, because num_classes may be smaller than 5.
    • [x] ~check whether exclude_file_xxx will affect the results.~

    results

    • slowonly_kinetics_pretrained_r50_4*16

    |custom classes|mAP(train 80 classes)|mAP (train custom classes only)|selected classes comment|
    |:-:|-:|-:|-:|
    |range(1, 15)|0.3460|0.3399|all PERSON_MOVEMENT classes|
    |[11, 12, 14, 15, 79, 80]|0.7066|0.7011|AP(80 classes ckpt) > 0.6|
    |[1,4,8,9,13,17,28,49,74]|0.4339|0.4397|AP(80 classes ckpt) in[0.3, 0.6)|
    |[3, 6, 10, 27, 29, 38, 41, 48, 51, 53, 54, 59, 61, 64, 70, 72]|0.1948|0.3311|AP(80 classes ckpt) in[0.1, 0.3)|
    |[11,12,17,74,79,80]|0.6520|0.6438|> 50000 samples|
    |[1,8,14,59]|0.4307|0.5549|[5000, 50000) samples|
    |[3,4,6,9,10,15,27,28,29,38,41,48,49,54,61,64,65,66,67,70,77]|0.2384|0.3269|[1000, 5000) samples|
    |[22,37,47,51,63,68,72,78]|0.0753|0.3209|[500, 1000) samples|
    |[2,5,7,13,20,24,26,30,34,36,42,45,46,52,56,57,58,60,62,69,73,75,76]|0.0348|0.1806|[100, 500) samples|
    |[16,18,19,21,23,25,31,32,33,35,39,40,43,44,50,53,55,71]|0.0169|0.1984|<100 samples|

    insights

    I think the AVA dataset suffers from serious class imbalance. Training on custom classes helps to get better results for classes with fewer samples.
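
To make the imbalance point concrete, the sketch below recomputes the mAP change per sample-count bucket from the six sample-count rows of the results table. The numbers are copied from that table; the names `ROWS` and `map_gains` are only illustrative.

```python
# (sample-count bucket, mAP with all 80 classes, mAP with custom classes only),
# copied from the sample-count rows of the results table.
ROWS = [
    ("> 50000", 0.6520, 0.6438),
    ("[5000, 50000)", 0.4307, 0.5549),
    ("[1000, 5000)", 0.2384, 0.3269),
    ("[500, 1000)", 0.0753, 0.3209),
    ("[100, 500)", 0.0348, 0.1806),
    ("< 100", 0.0169, 0.1984),
]


def map_gains(rows):
    """Return {bucket: custom-only mAP minus 80-class mAP}, rounded to 4 places."""
    return {bucket: round(custom - full, 4) for bucket, full, custom in rows}


if __name__ == "__main__":
    for bucket, gain in map_gains(ROWS).items():
        print(f"{bucket:>14} samples: {gain:+.4f}")
```

Only the most frequent bucket loses a little mAP when trained alone; every bucket with fewer than 50000 samples gains.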

    opened by irvingzhang0512 29
  • Custom Training of SpatioTemporal Model SlowFast giving mAP 0.0

    I tried to train the model on our custom data (200+ videos). After 50 epochs of training, the validation mAP was still 0.0 after every epoch. Can you help me with this?

    Note: For annotations, I'm using normalized x1, y1 (top-left corner) and x2, y2 (bottom-right corner). Is this the correct format, or do I need to change it?
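
For reference, the layout described in the note can be sketched as follows. It assumes the public AVA CSV column order (`video_id, timestamp, x1, y1, x2, y2, action_id, person_id`) with box coordinates divided by the frame width and height. The function name `to_ava_row` and the example values are hypothetical.

```python
def to_ava_row(video_id, timestamp, box_xyxy, width, height, action_id, person_id):
    """Format one annotation as an AVA-style CSV row with normalized corners.

    box_xyxy is (x1, y1, x2, y2) in pixels: top-left and bottom-right corners.
    """
    x1, y1, x2, y2 = box_xyxy
    coords = (x1 / width, y1 / height, x2 / width, y2 / height)
    # Normalized coordinates must stay inside the frame.
    if not all(0.0 <= c <= 1.0 for c in coords):
        raise ValueError(f"box {box_xyxy} lies outside a {width}x{height} frame")
    coord_text = ",".join(f"{c:.3f}" for c in coords)
    return f"{video_id},{timestamp:04d},{coord_text},{action_id},{person_id}"


if __name__ == "__main__":
    print(to_ava_row("vid", 902, (64, 36, 320, 180), 640, 360, 3, 0))
```

Rows whose coordinates are still in pixels would be rejected by the range check, which is one quick way to test an annotation file before training.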

    Below is my custom config file:

    
    custom_classes = [1, 2, 3, 4, 5]
    num_classes = 6
    model = dict(
        type='FastRCNN',
        backbone=dict(
            type='ResNet3dSlowOnly',
            depth=50,
            pretrained=None,
            pretrained2d=False,
            lateral=False,
            num_stages=4,
            conv1_kernel=(1, 7, 7),
            conv1_stride_t=1,
            pool1_stride_t=1,
            spatial_strides=(1, 2, 2, 1)),
        roi_head=dict(
            type='AVARoIHead',
            bbox_roi_extractor=dict(
                type='SingleRoIExtractor3D',
                roi_layer_type='RoIAlign',
                output_size=8,
                with_temporal_pool=True),
            bbox_head=dict(
                type='BBoxHeadAVA',
                in_channels=2048,
                num_classes=6,
                multilabel=True,
                topk=(2, 3),
                dropout_ratio=0.5)),
        train_cfg=dict(
            rcnn=dict(
                assigner=dict(
                    type='MaxIoUAssignerAVA',
                    pos_iou_thr=0.9,
                    neg_iou_thr=0.9,
                    min_pos_iou=0.9),
                sampler=dict(
                    type='RandomSampler',
                    num=32,
                    pos_fraction=1,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=1.0,
                debug=False)),
        test_cfg=dict(rcnn=dict(action_thr=0.002)))
    dataset_type = 'AVADataset'
    data_root = 'tools/data/SAI/rawframes'
    anno_root = 'tools/data/SAI/Annotations'
    ann_file_train = 'tools/data/SAI/Annotations/ava_format_train.csv'
    ann_file_val = 'tools/data/SAI/Annotations/ava_format_test.csv'
    label_file = 'tools/data/SAI/Annotations/action_list.pbtxt'
    proposal_file_train = 'tools/data/SAI/Annotations/proposals_train.pkl'
    proposal_file_val = 'tools/data/SAI/Annotations/proposals_test.pkl'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
    train_pipeline = [
        dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
        dict(type='RawFrameDecode'),
        dict(type='RandomRescale', scale_range=(256, 320)),
        dict(type='RandomCrop', size=256),
        dict(type='Flip', flip_ratio=0.5),
        dict(
            type='Normalize',
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            to_bgr=False),
        dict(type='FormatShape', input_format='NCTHW', collapse=True),
        dict(type='Rename', mapping=dict(imgs='img')),
        dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),
        dict(
            type='ToDataContainer',
            fields=[
                dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False)
            ]),
        dict(
            type='Collect',
            keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'],
            meta_keys=['scores', 'entity_ids'])
    ]
    val_pipeline = [
        dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
        dict(type='RawFrameDecode'),
        dict(type='Resize', scale=(-1, 256)),
        dict(
            type='Normalize',
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            to_bgr=False),
        dict(type='FormatShape', input_format='NCTHW', collapse=True),
        dict(type='Rename', mapping=dict(imgs='img')),
        dict(type='ToTensor', keys=['img', 'proposals']),
        dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]),
        dict(
            type='Collect',
            keys=['img', 'proposals'],
            meta_keys=['scores', 'img_shape'],
            nested=True)
    ]
    data = dict(
        videos_per_gpu=1,
        workers_per_gpu=4,
        val_dataloader=dict(
            videos_per_gpu=1, workers_per_gpu=4, persistent_workers=False),
        train_dataloader=dict(
            videos_per_gpu=1, workers_per_gpu=4, persistent_workers=False),
        test_dataloader=dict(
            videos_per_gpu=1, workers_per_gpu=4, persistent_workers=False),
        train=dict(
            type='AVADataset',
            ann_file='tools/data/SAI/Annotations/ava_format_train.csv',
            pipeline=[
                dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
                dict(type='RawFrameDecode'),
                dict(type='RandomRescale', scale_range=(256, 320)),
                dict(type='RandomCrop', size=256),
                dict(type='Flip', flip_ratio=0.5),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_bgr=False),
                dict(type='FormatShape', input_format='NCTHW', collapse=True),
                dict(type='Rename', mapping=dict(imgs='img')),
                dict(
                    type='ToTensor',
                    keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),
                dict(
                    type='ToDataContainer',
                    fields=[
                        dict(
                            key=['proposals', 'gt_bboxes', 'gt_labels'],
                            stack=False)
                    ]),
                dict(
                    type='Collect',
                    keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'],
                    meta_keys=['scores', 'entity_ids'])
            ],
            label_file='tools/data/SAI/Annotations/action_list.pbtxt',
            proposal_file='tools/data/SAI/Annotations/proposals_train.pkl',
            person_det_score_thr=0.9,
            num_classes=6,
            custom_classes=[1, 2, 3, 4, 5],
            data_prefix='tools/data/SAI/rawframes'),
        val=dict(
            type='AVADataset',
            ann_file='tools/data/SAI/Annotations/ava_format_test.csv',
            pipeline=[
                dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
                dict(type='RawFrameDecode'),
                dict(type='Resize', scale=(-1, 256)),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_bgr=False),
                dict(type='FormatShape', input_format='NCTHW', collapse=True),
                dict(type='Rename', mapping=dict(imgs='img')),
                dict(type='ToTensor', keys=['img', 'proposals']),
                dict(
                    type='ToDataContainer',
                    fields=[dict(key='proposals', stack=False)]),
                dict(
                    type='Collect',
                    keys=['img', 'proposals'],
                    meta_keys=['scores', 'img_shape'],
                    nested=True)
            ],
            label_file='tools/data/SAI/Annotations/action_list.pbtxt',
            proposal_file='tools/data/SAI/Annotations/proposals_test.pkl',
            person_det_score_thr=0.9,
            num_classes=6,
            custom_classes=[1, 2, 3, 4, 5],
            data_prefix='tools/data/SAI/rawframes'),
        test=dict(
            type='AVADataset',
            ann_file='tools/data/SAI/Annotations/ava_format_test.csv',
            pipeline=[
                dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
                dict(type='RawFrameDecode'),
                dict(type='Resize', scale=(-1, 256)),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_bgr=False),
                dict(type='FormatShape', input_format='NCTHW', collapse=True),
                dict(type='Rename', mapping=dict(imgs='img')),
                dict(type='ToTensor', keys=['img', 'proposals']),
                dict(
                    type='ToDataContainer',
                    fields=[dict(key='proposals', stack=False)]),
                dict(
                    type='Collect',
                    keys=['img', 'proposals'],
                    meta_keys=['scores', 'img_shape'],
                    nested=True)
            ],
            label_file='tools/data/SAI/Annotations/action_list.pbtxt',
            proposal_file='tools/data/SAI/Annotations/proposals_test.pkl',
            person_det_score_thr=0.9,
            num_classes=6,
            custom_classes=[1, 2, 3, 4, 5],
            data_prefix='tools/data/SAI/rawframes'))
    optimizer = dict(type='SGD', lr=0.025, momentum=0.9, weight_decay=1e-05)
    optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
    lr_config = dict(
        policy='step',
        step=[10, 15],
        warmup='linear',
        warmup_by_epoch=True,
        warmup_iters=5,
        warmup_ratio=0.1)
    total_epochs = 50
    train_ratio = [1, 1]
    checkpoint_config = dict(interval=1)
    workflow = [('train', 1)]
    evaluation = dict(interval=1, save_best='mAP@0.5IOU')
    log_config = dict(interval=20, hooks=[dict(type='TextLoggerHook')])
    dist_params = dict(backend='nccl')
    log_level = 'INFO'
    work_dir = './SAI/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb'
    load_from = 'https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb/slowfast_r50_4x16x1_256e_kinetics400_rgb_20200704-bcde7ed7.pth'
    resume_from = None
    find_unused_parameters = False
    omnisource = False
    module_hooks = []
    gpu_ids = range(0, 1)
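
To read the `lr_config` in the config above, the sketch below evaluates the step policy per epoch. It assumes a decay factor of 0.1 at each step (not set in the config, so taken here as the default), counts epochs from 0, and ignores the 5-epoch warmup. `step_lr` is an illustrative name, not a library function.

```python
def step_lr(epoch, base_lr=0.025, steps=(10, 15), gamma=0.1):
    """Learning rate at a 0-based epoch under a step policy, ignoring warmup.

    gamma=0.1 is an assumption; the posted config does not set it.
    """
    # Each step boundary that has been reached multiplies the rate by gamma.
    num_decays = sum(1 for s in steps if epoch >= s)
    return base_lr * gamma ** num_decays


if __name__ == "__main__":
    total_epochs = 50
    for epoch in (0, 9, 10, 14, 15, 49):
        print(f"epoch {epoch:2d}: lr = {step_lr(epoch):.6f}")
    # Count the epochs spent at the lowest rate.
    low = sum(1 for e in range(total_epochs) if step_lr(e) < 1e-3)
    print(f"{low} of {total_epochs} epochs run at the lowest rate")
```

Under these assumptions, 35 of the 50 epochs run at 1% of the base rate, because `step=[10, 15]` matches the 20-epoch schedule named in `work_dir` rather than `total_epochs = 50`.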
    
    
    
    opened by memona008 28
  • Whether it is distributed training or not, errors will occur

    Thanks for your contribution! When I try to train a model, errors occur whether or not I use distributed training. I installed everything following your install.md, and there were no errors while preparing the environment.

    There is a similar issue, but I have checked that there were no errors during installation (I have reinstalled the conda env).

    For a single GPU:

    $ python tools/train.py configs/tsn_r50_1x1x3_75e_ucf101_rgb.py                  
    2020-11-07 19:47:14,013 - mmaction - INFO - Environment info:
    ------------------------------------------------------------
    sys.platform: linux
    Python: 3.8.5 (default, Sep  4 2020, 07:30:14) [GCC 7.3.0]
    CUDA available: True
    GPU 0,1,2: GeForce GTX 1080 Ti
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 10.0, V10.0.130
    GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.7.0
    PyTorch compiling details: PyTorch built with:
      - GCC 7.3
      - C++ Version: 201402
      - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
      - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
      - OpenMP 201511 (a.k.a. OpenMP 4.5)
      - NNPACK is enabled
      - CPU capability usage: AVX2
      - CUDA Runtime 10.2
      - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
      - CuDNN 7.6.5
      - Magma 2.5.2
      - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 
    
    TorchVision: 0.8.1
    OpenCV: 4.4.0
    MMCV: 1.1.6
    MMCV Compiler: GCC 7.5
    MMCV CUDA Compiler: 10.0
    MMAction2: 0.8.0+76819e4
    ------------------------------------------------------------
    
    2020-11-07 19:47:14,014 - mmaction - INFO - Distributed training: False
    2020-11-07 19:47:14,014 - mmaction - INFO - Config: /home/liming/code/video/test/mmaction2/configs/tsn_r50_1x1x3_75e_ucf101_rgb.py
    # model settings
    model = dict(
        type='Recognizer2D',
        backbone=dict(
            type='ResNet',
            pretrained='torchvision://resnet50',
            depth=50,
            norm_eval=False),
        cls_head=dict(
            type='TSNHead',
            num_classes=101,
            in_channels=2048,
            spatial_type='avg',
            consensus=dict(type='AvgConsensus', dim=1),
            dropout_ratio=0.4,
            init_std=0.001))
    # model training and testing settings
    train_cfg = None
    test_cfg = dict(average_clips=None)
    # dataset settings
    dataset_type = 'VideoDataset'
    data_root = 'data/ucf101/videos/'
    data_root_val = 'data/ucf101/videos/'
    split = 1  # official train/test splits. valid numbers: 1, 2, 3
    ann_file_train = f'data/ucf101/ucf101_train_split_{split}_videos.txt'
    ann_file_val = f'data/ucf101/ucf101_val_split_{split}_videos.txt'
    ann_file_test = f'data/ucf101/ucf101_val_split_{split}_videos.txt'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
    train_pipeline = [
        dict(type='DecordInit'),
        dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3),
        dict(type='DecordDecode'),
        dict(type='Resize', scale=(-1, 256)),
        dict(type='RandomResizedCrop'),
        dict(type='Resize', scale=(224, 224), keep_ratio=False),
        dict(type='Flip', flip_ratio=0.5),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='FormatShape', input_format='NCHW'),
        dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
        dict(type='ToTensor', keys=['imgs', 'label'])
    ]
    val_pipeline = [
        dict(type='DecordInit'),
        dict(
            type='SampleFrames',
            clip_len=1,
            frame_interval=1,
            num_clips=3,
            test_mode=True),
        dict(type='DecordDecode'),
        dict(type='Resize', scale=(-1, 256)),
        dict(type='CenterCrop', crop_size=256),
        dict(type='Flip', flip_ratio=0),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='FormatShape', input_format='NCHW'),
        dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
        dict(type='ToTensor', keys=['imgs'])
    ]
    test_pipeline = [
        dict(type='DecordInit'),
        dict(
            type='SampleFrames',
            clip_len=1,
            frame_interval=1,
            num_clips=25,
            test_mode=True),
        dict(type='DecordDecode'),
        dict(type='Resize', scale=(-1, 256)),
        dict(type='ThreeCrop', crop_size=256),
        dict(type='Flip', flip_ratio=0),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='FormatShape', input_format='NCHW'),
        dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
        dict(type='ToTensor', keys=['imgs'])
    ]
    data = dict(
        videos_per_gpu=32,
        workers_per_gpu=4,
        train=dict(
            type=dataset_type,
            ann_file=ann_file_train,
            data_prefix=data_root,
            pipeline=train_pipeline),
        val=dict(
            type=dataset_type,
            ann_file=ann_file_val,
            data_prefix=data_root_val,
            pipeline=val_pipeline),
        test=dict(
            type=dataset_type,
            ann_file=ann_file_test,
            data_prefix=data_root_val,
            pipeline=test_pipeline))
    # optimizer
    # lr = 0.00128 for 8 GPUs * 32 video/gpu, 0.00015 for 3 GPUs * 10 videos/gpu, 5e-5 for 1 GPU * 10 videos/gpu
    optimizer = dict(
        type='SGD', lr=0.00048, momentum=0.9,
        weight_decay=0.0005)  # this lr is used for 8 gpus
    optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
    # learning policy
    lr_config = dict(policy='step', step=[])
    total_epochs = 1
    checkpoint_config = dict(interval=5)
    evaluation = dict(
        interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'])
    log_config = dict(
        interval=20,
        hooks=[
            dict(type='TextLoggerHook'),
            # dict(type='TensorboardLoggerHook'),
        ])
    # runtime settings
    dist_params = dict(backend='nccl')
    log_level = 'INFO'
    work_dir = f'./work_dirs/tsn_r50_1x1x3_75e_ucf101_split_{split}_rgb/'
    load_from = None
    resume_from = None
    workflow = [('train', 1)]
    
    2020-11-07 19:47:14,568 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}
    2020-11-07 19:47:16,547 - mmaction - INFO - Start running, host: [email protected], work_dir: /home/liming/code/video/test/mmaction2/work_dirs/tsn_r50_1x1x3_75e_ucf101_split_1_rgb
    2020-11-07 19:47:16,547 - mmaction - INFO - workflow: [('train', 1)], max: 1 epochs
    2020-11-07 19:47:30,330 - mmaction - INFO - Epoch [1][20/299]   lr: 4.800e-04, eta: 0:03:12, time: 0.689, data_time: 0.153, memory: 8244, top1_acc: 0.0141, top5_acc: 0.0703, loss_cls: 4.6118, loss: 4.6118, grad_norm: 5.5581
    2020-11-07 19:47:40,713 - mmaction - INFO - Epoch [1][40/299]   lr: 4.800e-04, eta: 0:02:36, time: 0.519, data_time: 0.000, memory: 8244, top1_acc: 0.0266, top5_acc: 0.0828, loss_cls: 4.5864, loss: 4.5864, grad_norm: 5.5972
    2020-11-07 19:47:51,104 - mmaction - INFO - Epoch [1][60/299]   lr: 4.800e-04, eta: 0:02:17, time: 0.520, data_time: 0.000, memory: 8244, top1_acc: 0.0484, top5_acc: 0.0938, loss_cls: 4.5600, loss: 4.5600, grad_norm: 5.6577
    2020-11-07 19:48:01,512 - mmaction - INFO - Epoch [1][80/299]   lr: 4.800e-04, eta: 0:02:03, time: 0.520, data_time: 0.000, memory: 8244, top1_acc: 0.0484, top5_acc: 0.1437, loss_cls: 4.5178, loss: 4.5178, grad_norm: 5.6118
    2020-11-07 19:48:11,938 - mmaction - INFO - Epoch [1][100/299]  lr: 4.800e-04, eta: 0:01:50, time: 0.521, data_time: 0.000, memory: 8244, top1_acc: 0.0797, top5_acc: 0.1938, loss_cls: 4.4669, loss: 4.4669, grad_norm: 5.7034
    2020-11-07 19:48:22,364 - mmaction - INFO - Epoch [1][120/299]  lr: 4.800e-04, eta: 0:01:38, time: 0.521, data_time: 0.000, memory: 8244, top1_acc: 0.0875, top5_acc: 0.2406, loss_cls: 4.4534, loss: 4.4534, grad_norm: 5.7623
    2020-11-07 19:48:32,792 - mmaction - INFO - Epoch [1][140/299]  lr: 4.800e-04, eta: 0:01:26, time: 0.521, data_time: 0.000, memory: 8244, top1_acc: 0.1156, top5_acc: 0.2781, loss_cls: 4.4031, loss: 4.4031, grad_norm: 5.7466
    2020-11-07 19:48:43,221 - mmaction - INFO - Epoch [1][160/299]  lr: 4.800e-04, eta: 0:01:15, time: 0.521, data_time: 0.000, memory: 8244, top1_acc: 0.1703, top5_acc: 0.3422, loss_cls: 4.3451, loss: 4.3451, grad_norm: 5.7538
    2020-11-07 19:48:53,649 - mmaction - INFO - Epoch [1][180/299]  lr: 4.800e-04, eta: 0:01:04, time: 0.521, data_time: 0.000, memory: 8244, top1_acc: 0.1656, top5_acc: 0.3656, loss_cls: 4.3214, loss: 4.3214, grad_norm: 5.7920
    2020-11-07 19:49:04,084 - mmaction - INFO - Epoch [1][200/299]  lr: 4.800e-04, eta: 0:00:53, time: 0.522, data_time: 0.000, memory: 8244, top1_acc: 0.1938, top5_acc: 0.3844, loss_cls: 4.2619, loss: 4.2619, grad_norm: 5.8725
    2020-11-07 19:49:14,525 - mmaction - INFO - Epoch [1][220/299]  lr: 4.800e-04, eta: 0:00:42, time: 0.522, data_time: 0.000, memory: 8244, top1_acc: 0.2359, top5_acc: 0.3906, loss_cls: 4.1983, loss: 4.1983, grad_norm: 5.8417
    2020-11-07 19:49:24,974 - mmaction - INFO - Epoch [1][240/299]  lr: 4.800e-04, eta: 0:00:31, time: 0.522, data_time: 0.000, memory: 8244, top1_acc: 0.1938, top5_acc: 0.4281, loss_cls: 4.1371, loss: 4.1371, grad_norm: 6.0010
    2020-11-07 19:49:35,435 - mmaction - INFO - Epoch [1][260/299]  lr: 4.800e-04, eta: 0:00:20, time: 0.523, data_time: 0.000, memory: 8244, top1_acc: 0.1922, top5_acc: 0.4359, loss_cls: 4.0732, loss: 4.0732, grad_norm: 5.9770
    2020-11-07 19:49:45,881 - mmaction - INFO - Epoch [1][280/299]  lr: 4.800e-04, eta: 0:00:10, time: 0.522, data_time: 0.000, memory: 8244, top1_acc: 0.2406, top5_acc: 0.4516, loss_cls: 4.0252, loss: 4.0252, grad_norm: 6.1316
    [1]    30756 segmentation fault (core dumped)  python tools/train.py configs/tsn_r50_1x1x3_75e_ucf101_rgb.py
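
The learning-rate comment in the config dumped above follows a linear scaling rule, which the sketch below reproduces. The reference point (lr 0.00128 for 8 GPUs with 32 videos each) comes from that comment; the function name `scaled_lr` is illustrative. This only concerns the optimizer setting and is not claimed to be related to the segmentation fault.

```python
def scaled_lr(num_gpus, videos_per_gpu, ref_lr=0.00128, ref_batch=8 * 32):
    """Scale the learning rate linearly with the total batch size.

    The reference values come from the comment in the logged config.
    """
    total_batch = num_gpus * videos_per_gpu
    return ref_lr * total_batch / ref_batch


if __name__ == "__main__":
    for gpus, per_gpu in ((8, 32), (3, 10), (1, 10), (3, 32), (1, 32)):
        print(f"{gpus} GPU(s) x {per_gpu} videos: lr = {scaled_lr(gpus, per_gpu):.6f}")
```

The configured `lr=0.00048` matches 3 GPUs with 32 videos each; for this single-GPU run with `videos_per_gpu=32`, the same rule would suggest about 0.00016.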
    

    For multiple GPUs:

    *****************************************
    Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
    *****************************************
    2020-11-07 19:44:48,222 - mmaction - INFO - Environment info:
    ------------------------------------------------------------
    sys.platform: linux
    Python: 3.8.5 (default, Sep  4 2020, 07:30:14) [GCC 7.3.0]
    CUDA available: True
    GPU 0,1,2: GeForce GTX 1080 Ti
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 10.0, V10.0.130
    GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.7.0
    PyTorch compiling details: PyTorch built with:
      - GCC 7.3
      - C++ Version: 201402
      - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
      - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
      - OpenMP 201511 (a.k.a. OpenMP 4.5)
      - NNPACK is enabled
      - CPU capability usage: AVX2
      - CUDA Runtime 10.2
      - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
      - CuDNN 7.6.5
      - Magma 2.5.2
      - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 
    
    TorchVision: 0.8.1
    OpenCV: 4.4.0
    MMCV: 1.1.6
    MMCV Compiler: GCC 7.5
    MMCV CUDA Compiler: 10.0
    MMAction2: 0.8.0+76819e4
    ------------------------------------------------------------
    
    2020-11-07 19:44:48,223 - mmaction - INFO - Distributed training: True
    2020-11-07 19:44:48,223 - mmaction - INFO - Config: /home/liming/code/video/test/mmaction2/configs/tsn_r50_1x1x3_75e_ucf101_rgb.py
    # model settings
    model = dict(
        type='Recognizer2D',
        backbone=dict(
            type='ResNet',
            pretrained='torchvision://resnet50',
            depth=50,
            norm_eval=False),
        cls_head=dict(
            type='TSNHead',
            num_classes=101,
            in_channels=2048,
            spatial_type='avg',
            consensus=dict(type='AvgConsensus', dim=1),
            dropout_ratio=0.4,
            init_std=0.001))
    # model training and testing settings
    train_cfg = None
    test_cfg = dict(average_clips=None)
    # dataset settings
    dataset_type = 'VideoDataset'
    data_root = 'data/ucf101/videos/'
    data_root_val = 'data/ucf101/videos/'
    split = 1  # official train/test splits. valid numbers: 1, 2, 3
    ann_file_train = f'data/ucf101/ucf101_train_split_{split}_videos.txt'
    ann_file_val = f'data/ucf101/ucf101_val_split_{split}_videos.txt'
    ann_file_test = f'data/ucf101/ucf101_val_split_{split}_videos.txt'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
    train_pipeline = [
        dict(type='DecordInit'),
        dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3),
        dict(type='DecordDecode'),
        dict(type='Resize', scale=(-1, 256)),
        dict(type='RandomResizedCrop'),
        dict(type='Resize', scale=(224, 224), keep_ratio=False),
        dict(type='Flip', flip_ratio=0.5),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='FormatShape', input_format='NCHW'),
        dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
        dict(type='ToTensor', keys=['imgs', 'label'])
    ]
    val_pipeline = [
        dict(type='DecordInit'),
        dict(
            type='SampleFrames',
            clip_len=1,
            frame_interval=1,
            num_clips=3,
            test_mode=True),
        dict(type='DecordDecode'),
        dict(type='Resize', scale=(-1, 256)),
        dict(type='CenterCrop', crop_size=256),
        dict(type='Flip', flip_ratio=0),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='FormatShape', input_format='NCHW'),
        dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
        dict(type='ToTensor', keys=['imgs'])
    ]
    test_pipeline = [
        dict(type='DecordInit'),
        dict(
            type='SampleFrames',
            clip_len=1,
            frame_interval=1,
            num_clips=25,
            test_mode=True),
        dict(type='DecordDecode'),
        dict(type='Resize', scale=(-1, 256)),
        dict(type='ThreeCrop', crop_size=256),
        dict(type='Flip', flip_ratio=0),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='FormatShape', input_format='NCHW'),
        dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
        dict(type='ToTensor', keys=['imgs'])
    ]
    data = dict(
        videos_per_gpu=32,
        workers_per_gpu=4,
        train=dict(
            type=dataset_type,
            ann_file=ann_file_train,
            data_prefix=data_root,
            pipeline=train_pipeline),
        val=dict(
            type=dataset_type,
            ann_file=ann_file_val,
            data_prefix=data_root_val,
            pipeline=val_pipeline),
        test=dict(
            type=dataset_type,
            ann_file=ann_file_test,
            data_prefix=data_root_val,
            pipeline=test_pipeline))
    # optimizer
    # lr = 0.00128 for 8 GPUs * 32 video/gpu, 0.00015 for 3 GPUs * 10 videos/gpu, 5e-5 for 1 GPU * 10 videos/gpu
    optimizer = dict(
        type='SGD', lr=0.00048, momentum=0.9,
        weight_decay=0.0005)  # this lr is used for 8 gpus
    optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
    # learning policy
    lr_config = dict(policy='step', step=[])
    total_epochs = 1
    checkpoint_config = dict(interval=5)
    evaluation = dict(
        interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'])
    log_config = dict(
        interval=20,
        hooks=[
            dict(type='TextLoggerHook'),
            # dict(type='TensorboardLoggerHook'),
        ])
    # runtime settings
    dist_params = dict(backend='nccl')
    log_level = 'INFO'
    work_dir = f'./work_dirs/tsn_r50_1x1x3_75e_ucf101_split_{split}_rgb/'
    load_from = None
    resume_from = None
    workflow = [('train', 1)]
    
    2020-11-07 19:44:48,776 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}
    2020-11-07 19:44:49,087 - mmaction - INFO - Start running, host: [email protected], work_dir: /home/liming/code/video/test/mmaction2/work_dirs/tsn_r50_1x1x3_75e_ucf101_split_1_rgb
    2020-11-07 19:44:49,087 - mmaction - INFO - workflow: [('train', 1)], max: 1 epochs
    2020-11-07 19:45:07,472 - mmaction - INFO - Epoch [1][20/100]   lr: 4.800e-04, eta: 0:01:13, time: 0.918, data_time: 0.346, memory: 8333, top1_acc: 0.0281, top5_acc: 0.1016, loss_cls: 4.6039, loss: 4.6039, grad_norm: 3.2696
    2020-11-07 19:45:18,411 - mmaction - INFO - Epoch [1][40/100]   lr: 4.800e-04, eta: 0:00:43, time: 0.547, data_time: 0.001, memory: 8333, top1_acc: 0.0437, top5_acc: 0.1385, loss_cls: 4.5756, loss: 4.5756, grad_norm: 3.2719
    2020-11-07 19:45:29,362 - mmaction - INFO - Epoch [1][60/100]   lr: 4.800e-04, eta: 0:00:26, time: 0.548, data_time: 0.001, memory: 8333, top1_acc: 0.0818, top5_acc: 0.1781, loss_cls: 4.5440, loss: 4.5440, grad_norm: 3.2529
    2020-11-07 19:45:40,321 - mmaction - INFO - Epoch [1][80/100]   lr: 4.800e-04, eta: 0:00:12, time: 0.548, data_time: 0.000, memory: 8333, top1_acc: 0.0688, top5_acc: 0.2083, loss_cls: 4.5135, loss: 4.5135, grad_norm: 3.2930
    2020-11-07 19:45:50,957 - mmaction - INFO - Epoch [1][100/100]  lr: 4.800e-04, eta: 0:00:00, time: 0.532, data_time: 0.000, memory: 8333, top1_acc: 0.1174, top5_acc: 0.2649, loss_cls: 4.4753, loss: 4.4753, grad_norm: 3.3248
    Traceback (most recent call last):
      File "/home/liming/anaconda3/envs/test/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/liming/anaconda3/envs/test/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/liming/anaconda3/envs/test/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
        main()
      File "/home/liming/anaconda3/envs/test/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
        raise subprocess.CalledProcessError(returncode=process.returncode,
    subprocess.CalledProcessError: Command '['/home/liming/anaconda3/envs/test/bin/python', '-u', './tools/train.py', '--local_rank=2', 'configs/tsn_r50_1x1x3_75e_ucf101_rgb.py', '--launcher', 'pytorch']' died with <Signals.SIGSEGV: 11>.
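
Both logs stop near the end of epoch 1: the single-GPU run last reports iteration 280 of 299, and the three-GPU run fails right after 100/100. The iteration counts themselves can be checked with the sketch below, which assumes 9537 training videos in UCF101 split 1 (a figure not stated in the logs) and that the last partial batch is kept. `iters_per_epoch` is an illustrative name.

```python
import math


def iters_per_epoch(num_videos, videos_per_gpu, num_gpus):
    """Iterations per epoch when the last partial batch is kept."""
    return math.ceil(num_videos / (videos_per_gpu * num_gpus))


if __name__ == "__main__":
    # 9537 is the assumed size of the UCF101 split-1 training list.
    print(iters_per_epoch(9537, 32, 1))
    print(iters_per_epoch(9537, 32, 3))
```

Under that assumption the counts match the `[…/299]` and `[…/100]` progress shown in the two logs, so both runs appear to crash at or just before the end of the first epoch rather than at a random point in training.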
    

    Here is my conda env list:

    _libgcc_mutex             0.1                        main    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    addict                    2.3.0                    pypi_0    pypi
    blas                      1.0                         mkl    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    ca-certificates           2020.10.14                    0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    certifi                   2020.6.20                pypi_0    pypi
    cudatoolkit               10.2.89              hfd86e86_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    cycler                    0.10.0                   pypi_0    pypi
    dataclasses               0.6                      pypi_0    pypi
    freetype                  2.10.4               h5ab3b9f_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    future                    0.18.2                   pypi_0    pypi
    intel-openmp              2020.2                      254    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    jpeg                      9b                   h024ee3a_2    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    kiwisolver                1.3.1                    pypi_0    pypi
    lcms2                     2.11                 h396b838_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    ld_impl_linux-64          2.33.1               h53a641e_7    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libedit                   3.1.20191231         h14c3975_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libffi                    3.3                  he6710b0_2    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libgcc-ng                 9.1.0                hdf63c60_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libpng                    1.6.37               hbc83047_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libstdcxx-ng              9.1.0                hdf63c60_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libtiff                   4.1.0                h2733197_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libuv                     1.40.0               h7b6447c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    lz4-c                     1.9.2                heb0550a_3    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    matplotlib                3.3.2                    pypi_0    pypi
    mkl                       2020.2                      256    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    mkl-service               2.3.0            py38he904b0f_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    mkl_fft                   1.2.0            py38h23d657b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    mkl_random                1.1.1            py38h0573a6f_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    mmaction2                 0.8.0                     dev_0    <develop>
    mmcv-full                 1.1.6                    pypi_0    pypi
    ncurses                   6.2                  he6710b0_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    ninja                     1.10.1           py38hfd86e86_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    numpy                     1.19.2           py38h54aff64_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    numpy-base                1.19.2           py38hfa32c7d_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    olefile                   0.46                       py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    opencv-contrib-python     4.4.0.46                 pypi_0    pypi
    openssl                   1.1.1h               h7b6447c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    pillow                    8.0.1            py38he98fc37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    pip                       20.2.4           py38h06a4308_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    pyparsing                 3.0.0b1                  pypi_0    pypi
    python                    3.8.5                h7579374_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    python-dateutil           2.8.1                    pypi_0    pypi
    pytorch                   1.7.0           py3.8_cuda10.2.89_cudnn7.6.5_0    pytorch
    pyyaml                    5.3.1                    pypi_0    pypi
    readline                  8.0                  h7b6447c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    setuptools                50.3.0           py38h06a4308_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    six                       1.15.0                     py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    sqlite                    3.33.0               h62c20be_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    tk                        8.6.10               hbc83047_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    torchaudio                0.7.0                      py38    pytorch
    torchvision               0.8.1                py38_cu102    pytorch
    typing_extensions         3.7.4.3                    py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    wheel                     0.35.1                     py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    xz                        5.2.5                h7b6447c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    yapf                      0.30.0                   pypi_0    pypi
    zlib                      1.2.11               h7b6447c_3    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    zstd                      1.4.5                h9ceee32_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    

    The whole install steps are:

    conda create -n test python=3.8 -y
    conda activate test
    
    conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
    
    pip install mmcv
    
    git clone https://github.com/open-mmlab/mmaction2.git
    cd mmaction2
    pip install -r requirements/build.txt
    python setup.py develop
    
    mkdir data
    ln -s PATH_TO_DATA data
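A note for anyone reproducing this report: the install steps above run `pip install mmcv`, while the package list shows `mmcv-full`. A minimal sketch for collecting the installed versions programmatically, using only the standard library, could look like this. The package names queried are illustrative assumptions and can be swapped freely.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Names below are assumptions chosen for illustration.
report = installed_versions(
    ["torch", "torchvision", "mmcv", "mmcv-full", "mmaction2"])
for name, version in report.items():
    print(f"{name:12s} {version or 'not installed'}")
```

Pasting this output next to the traceback makes it easier to spot a mismatch between what was installed and what is actually imported.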
    
    opened by limingcv 28
  • [Feature] Support Webcam Demo for Spatio-temporal Action Detection Models

    Description

    This implementation is based on SlowFast Spatio-temporal Action Detection Webcam Demo.

    TODO

    • [x] Multi-threads for read/display/inference.
    • Human detector
      • [x] easy to use abstract class
      • [x] mmdet
      • ~[ ] yolov4 human detector~: it seems human detector is not the bottleneck for this demo.
    • [x] MMAction2 stdet models.
    • Output result
      • [x] cv2.imshow
      • [x] write to local video file.
    • [x] decouple display frame shape and model frame shape.
    • [x] logging
    • [x] remove global variables
    • [x] BUG: Unexpected exit when read thread is dead and display thread is alive.
    • [x] BUG: Ignore sampling strategy
    • [x] fix known issue.
    • [x] Improvement: In SlowFast Webcam Demo, predict_stepsize must be in range [clip_len * frame_interval // 2, clip_len * frame_interval]. Find a way to support predict_stepsize in range [0, clip_len * frame_interval]
    • Docs
      • [x] Annotations in script
      • [x] demo/README.md
      • [x] docs_zh_CN/demo.md

    Known issue

    • config model -> test_cfg -> rcnn -> action_thr should be .0 instead of the current default value 0.002. With the current default, the number of bboxes may differ between actions.
    result = stdet_model(...)[0]
    
    previous_shape = None
    for class_id in range(len(result)):
        if previous_shape is None:
            previous_shape = result[class_id].shape
        else:
            assert previous_shape == result[class_id].shape, 'This assertion error may be raised.'
    
    • This may cause an index out of range error

    https://github.com/open-mmlab/mmaction2/blob/905f07a7128c4d996af13d47d25546ad248ee187/demo/demo_spatiotemporal_det.py#L345-L364

    j of result[i][j, 4] may be out of range. The for j in range(proposal.shape[0]) loop assumes that every result[i] has the same shape, i.e. the same number of bboxes for every action.
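One way to avoid the out-of-range index described above is to iterate over each class's own rows instead of a shared proposal count. The sketch below is an illustration under assumptions, not the demo's actual code: `collect_actions` and `score_thr` are hypothetical names, and each row is assumed to be `[x1, y1, x2, y2, score]`.

```python
def collect_actions(result, score_thr=0.5):
    """Gather (class_id, bbox, score) for detections above score_thr.

    `result` holds one sequence of [x1, y1, x2, y2, score] rows per class;
    the per-class row counts are allowed to differ.
    """
    hits = []
    for class_id, detections in enumerate(result):
        for row in detections:  # bounded by this class's own row count
            *bbox, score = row
            if score >= score_thr:
                hits.append((class_id, tuple(bbox), score))
    return hits


# Three classes with 2, 1 and 0 detections respectively.
result = [
    [[0, 0, 10, 10, 0.9], [5, 5, 20, 20, 0.3]],
    [[0, 0, 10, 10, 0.7]],
    [],
]
hits = collect_actions(result)
print(hits)
```

Because the inner loop never indexes past a class's own detections, differing bbox counts per action no longer matter.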

    Usage

    • Modify --output-fps according to printed log DEBUG:__main__:Read Thread: {duration} ms, {fps} fps.
    • Modify --predict-stepsize so that the durations for read and inference, which are both printed by logger, are almost the same.
    python demo/webcam_demo_spatiotemporal_det.py --show \
      --output-fps 15 \
      --predict-stepsize 8
    
    opened by irvingzhang0512 26
  • For a single GPU,the code training hangs...

    @innerlee Sorry to disturb you. When I run this command on a single GPU, training hangs:

    $ python tools/train.py configs/tsn_r50_1x1x3_75e_ucf101_rgb.py
    2020-12-06 03:47:12,059 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}
    2020-12-06 03:47:14,590 - mmaction - INFO - Start running, host: [email protected], work_dir: /data6/sky/acd/mmaction2/tools/work_dirs/tsn_r50_1x1x3_75e_ucf101_split_1_rgb
    2020-12-06 03:47:14,590 - mmaction - INFO - workflow: [('train', 1)], max: 15 epochs
    

    Environment: PyTorch 1.4.0, mmcv-full 1.2.1, CUDA 10.1. I also tried the --validate option, but it made no difference.

    awaiting response 
    opened by skyqwe123 24
  • Still some bugs during AVA training

    I reported some bugs during AVA training last time. After renaming the image files manually, I can finally run the command "./tools/dist_train.sh configs/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb.py 4 --validate" on my PC. But at Epoch [1][120/11524] it raises "FileNotFoundError: [Errno 2] No such file or directory: '/home/avadata/ava/rawframes/7g37N3eoQ9s/img_26368.jpg'". It seems there are still some bugs in the file name correspondence.

    BTW, I find that lines 89 and 90 of the config file I'm using are as follows:

    # Rename is needed to use mmdet detectors
    dict(type='Rename', mapping=dict(imgs='img'))

    I guess this code is meant to change the file names from ${video_name}_00001.jpg to img_00001.jpg, but it does not work for some reason, and I cannot find a module named "Rename" in mmdet. I hope you can look into these questions, thanks a lot.
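Judging from its `mapping=dict(imgs='img')` argument, the `Rename` step quoted above appears to rename a key of the results dict rather than files on disk, so the frame files still need the `img_XXXXX.jpg` naming. A minimal sketch of the manual rename, assuming frames are stored as `<video_name>_<index>.jpg` inside one directory per video; `rename_frames` is a hypothetical helper, not part of MMAction2.

```python
import re
import tempfile
from pathlib import Path

# '<video_name>_<digits>.jpg'; the video name may itself contain underscores.
FRAME_RE = re.compile(r"^(?P<video>.+)_(?P<index>\d+)\.jpg$")


def rename_frames(frame_dir, prefix="img"):
    """Rename '<video>_<index>.jpg' to '<prefix>_<index>.jpg' in frame_dir."""
    renamed = []
    for path in sorted(Path(frame_dir).iterdir()):
        match = FRAME_RE.match(path.name)
        if match is None or match.group("video") == prefix:
            continue  # unrelated file, or already in the target naming
        target = path.with_name(f"{prefix}_{match.group('index')}.jpg")
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        path.rename(target)
        renamed.append((path.name, target.name))
    return renamed


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("7g37N3eoQ9s_00001.jpg", "7g37N3eoQ9s_00002.jpg", "img_00003.jpg"):
        (Path(tmp) / name).touch()
    renamed = rename_frames(tmp)
    final_names = sorted(p.name for p in Path(tmp).iterdir())
    print(final_names)
```

A rename alone does not create frames that were never extracted, so a missing `img_26368.jpg` may also point to an incomplete extraction.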

    opened by SKBL5694 22
  • Posec3D Inference on video.

    I trained PoseC3D on a very small custom dataset and tried to run inference on a video. It throws this error:

    load checkpoint from local path: D:/pycharmprojects/skeleton_action_recognition/Runs/slowonly_r50_u48_240e_ntu120_xsub_keypoint/epoch_10.pth
    Traceback (most recent call last):
      File "D:/pycharmprojects/skeleton_action_recognition/inference.py", line 19, in <module>
        results = inference_recognizer(model, video, labels)
      File "D:\pycharmprojects\skeleton_action_recognition\mmaction\apis\inference.py", line 171, in inference_recognizer
        data = test_pipeline(data)
      File "D:\pycharmprojects\skeleton_action_recognition\mmaction\datasets\pipelines\compose.py", line 50, in __call__
        data = t(data)
      File "D:\pycharmprojects\skeleton_action_recognition\mmaction\datasets\pipelines\augmentations.py", line 213, in __call__
        kp = results['keypoint']
    KeyError: 'keypoint'
    

    When I debugged, the results dict did not have that key.

    Any suggestions?

    The Inference file:

    import torch
    
    from mmaction.apis import init_recognizer, inference_recognizer
    
    config_file = 'configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint_test.py'
    # download the checkpoint from model zoo and put it in `checkpoints/`
    checkpoint_file = 'Runs/slowonly_r50_u48_240e_ntu120_xsub_keypoint/epoch_10.pth'
    
    # assign the desired device.
    device = 'cuda:0' # or 'cpu'
    device = torch.device(device)
    
    # build the model from a config file and a checkpoint file
    model = init_recognizer(config_file, checkpoint_file, device=device)
    
    # test a single video and show the result:
    video = 'demo/Correct_vid/S001C001P001R001A001_rgb3.avi'
    labels = 'custom_pose_data/labels.txt'
    results = inference_recognizer(model, video, labels)
    
    # show the results
    # labels = open('tools/data/kinetics/label_map_k400.txt').readlines()
    # labels = [x.strip() for x in labels]
    labels = [0]
    results = [(labels[k[0]], k[1]) for k in results]
    
    # print(f'The top-5 labels with corresponding scores are:')
    # for result in results:
    #     print(f'{result[0]}: ', result[1])
    

    The Config file:

    model = dict(
        type='Recognizer3D',
        backbone=dict(
            type='ResNet3dSlowOnly',
            depth=50,
            pretrained=None,
            in_channels=17,
            base_channels=32,
            num_stages=3,
            out_indices=(2, ),
            stage_blocks=(4, 6, 3),
            conv1_stride_s=1,
            pool1_stride_s=1,
            inflate=(0, 1, 1),
            spatial_strides=(2, 2, 2),
            temporal_strides=(1, 1, 2),
            dilations=(1, 1, 1)),
        cls_head=dict(
            type='I3DHead',
            in_channels=512,
            num_classes=1, # change the class here
            spatial_type='avg',
            dropout_ratio=0.5),
        train_cfg=dict(),
        test_cfg=dict(average_clips='prob'))
    
    dataset_type = 'VideoDataset' # PoseDataset'
    ann_file_train = 'custom_pose_data/merge2.pkl'
    ann_file_val = 'custom_pose_data/merge2.pkl'
    left_kp = [1, 3, 5, 7, 9, 11, 13, 15]
    right_kp = [2, 4, 6, 8, 10, 12, 14, 16]
    
    test_pipeline = [
        dict(
            type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True), #num_clips=10
        dict(type='PoseDecode'),
        dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
        dict(type='Resize', scale=(-1, 64)),
        dict(type='CenterCrop', crop_size=64),
        dict(
            type='GeneratePoseTarget',
            sigma=0.6,
            use_score=True,
            with_kp=True,
            with_limb=False,
            double=True,
            left_kp=left_kp,
            right_kp=right_kp),
        dict(type='FormatShape', input_format='NCTHW'),
        dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
        dict(type='ToTensor', keys=['imgs'])
    ]
    data = dict(
        videos_per_gpu=1,
        workers_per_gpu=0,
        test_dataloader=dict(videos_per_gpu=1),
        test=dict(
            type=dataset_type,
            ann_file=None,
            data_prefix=None,
            pipeline=test_pipeline))
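The traceback above shows a pose pipeline step reading `results['keypoint']`, while the script passes a video path. A small pre-flight check can make that mismatch visible before the pipeline runs. This is a sketch under assumptions: `missing_keys` is a hypothetical helper, only `keypoint` is confirmed by the traceback, and `total_frames` and both sample dicts are illustrative guesses about what the inputs look like.

```python
# 'keypoint' comes from the traceback; 'total_frames' is an assumed extra.
REQUIRED_POSE_KEYS = ("keypoint", "total_frames")


def missing_keys(sample, required=REQUIRED_POSE_KEYS):
    """Return the required keys that the sample dict does not provide."""
    return [key for key in required if key not in sample]


# Roughly what a video-path input might carry (assumed for illustration).
video_sample = {
    "filename": "demo/Correct_vid/S001C001P001R001A001_rgb3.avi",
    "label": -1,
}

# A pose-style sample: 1 person x 48 frames x 17 joints x (x, y), all zeros.
pose_sample = {
    "keypoint": [[[(0.0, 0.0)] * 17 for _ in range(48)]],
    "total_frames": 48,
    "label": 0,
}

print(missing_keys(video_sample))
print(missing_keys(pose_sample))
```

If the first call reports `keypoint` as missing, the input needs to be pose annotations rather than a raw video, which is consistent with the `KeyError` reported here.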
    

    Thank You

    opened by BakingBrains 20
  • [Fix] typo of tsm-r50 & sthv2

    TYPO

    In tsm, num_clips should be the same for train/val/test pipelines.

    CANNOT reproduce tsm-r50/sthv2 results

    I did some tests with tsm-r50/sthv2 these days and CANNOT reproduce the results from the model zoo.

    Generally speaking

    1. The ckpt from the model zoo leads to WORSE results than the model zoo reports.
    2. The ckpt trained by myself leads to BETTER results than the model zoo reports.

    Can anyone verify this?

    | model | top1/5 accuracy for efficient mode | top1/5 accuracy for accurate mode |
    |:-:|:-:|:-:|
    | copy from model zoo | 57.86/84.67 | 61.12/86.26 |
    | test with the model zoo ckpt by myself | 55.56/82.94 | 56.92/83.92 |
    | trained by myself with the default config, trained ckpt here | 58.91/85.10 | 61.68/86.71 |

    PS: I'm using videos instead of rawframes.

    PPS: The efficient/accurate test pipelines are listed as follows:

    # efficient
    test_pipeline = [
        dict(type='DecordInit'),
        dict(
            type='SampleFrames',
            clip_len=1,
            frame_interval=1,
            num_clips=8,
            test_mode=True),
        dict(type='DecordDecode'),
        dict(type='Resize', scale=(-1, 256)),
        dict(type='CenterCrop', crop_size=224),
        dict(type='FormatShape', input_format='NCHW'),
        dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
        dict(type='ToTensor', keys=['imgs'])
    ]
    
    # accurate
    test_pipeline = [
        dict(type='DecordInit'),
        dict(
            type='SampleFrames',
            clip_len=1,
            frame_interval=1,
            num_clips=8,
            twice_sample=True,
            test_mode=True),
        dict(type='DecordDecode'),
        dict(type='Resize', scale=(-1, 256)),
        dict(type='ThreeCrop', crop_size=256),
        dict(type='FormatShape', input_format='NCHW'),
        dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
        dict(type='ToTensor', keys=['imgs'])
    ]
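To make the cost difference between the two pipelines above concrete, the number of frames fed to the model per video can be estimated from the config values. This is a rough sketch: it assumes `twice_sample=True` doubles the sampled frames and that `ThreeCrop` yields three crops per frame versus one for `CenterCrop`. `frames_per_video` is a hypothetical helper.

```python
def frames_per_video(num_clips, clip_len=1, num_crops=1, twice_sample=False):
    """Estimate how many cropped frames one video contributes at test time."""
    sample_factor = 2 if twice_sample else 1  # assumed effect of twice_sample
    return num_clips * clip_len * sample_factor * num_crops


# Efficient mode: 8 clips of 1 frame, a single centre crop.
efficient = frames_per_video(num_clips=8, clip_len=1, num_crops=1)

# Accurate mode: same sampling done twice, three crops per frame.
accurate = frames_per_video(
    num_clips=8, clip_len=1, num_crops=3, twice_sample=True)

print(efficient, accurate)
```

Under these assumptions the accurate mode processes six times as many frames per video as the efficient mode.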
    
    opened by irvingzhang0512 20
  • Change the mmaction2 model to onnx failed.

    I have trained a model and saved it as a .pth file. I want to deploy it, so I followed this tutorial: https://github.com/open-mmlab/mmaction2/blob/master/docs/en/tutorials/6_export_model.md

    This is the command I ran:

    python tools/deployment/pytorch2onnx.py configs/recognition/tsm/tsm_r50_video_1x1x8_50e_kinetics400_rgb.py my_code/checkpoints/best_top1_acc_epoch_18.pth --shape 1 1 3 224 224 --verify

    But I got this error:

    Traceback (most recent call last):
      File "tools/deployment/pytorch2onnx.py", line 165, in <module>
        pytorch2onnx(
      File "tools/deployment/pytorch2onnx.py", line 69, in pytorch2onnx
        torch.onnx.export(
      File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\onnx\utils.py", line 504, in export
        _export(
      File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\onnx\utils.py", line 1529, in _export
        graph, params_dict, torch_out = _model_to_graph(
      File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\onnx\utils.py", line 1111, in _model_to_graph
        graph, params, torch_out, module = _create_jit_graph(model, args)
      File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\onnx\utils.py", line 987, in _create_jit_graph
        graph, torch_out = _trace_and_get_graph_from_model(model, args)
      File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\onnx\utils.py", line 891, in _trace_and_get_graph_from_model
        trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
      File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\jit_trace.py", line 1184, in _get_trace_graph
        outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
      File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
        return forward_call(*input, **kwargs)
      File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\jit_trace.py", line 127, in forward
        graph, out = torch._C._create_graph_by_tracing(
      File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\jit_trace.py", line 118, in wrapper
        outs.append(self.inner(*trace_inputs))
      File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
        return forward_call(*input, **kwargs)
      File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\nn\modules\module.py", line 1178, in _slow_forward
        result = self.forward(*input, **kwargs)
    TypeError: forward_dummy() got multiple values for argument 'softmax'

    I don't know why.
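The final `TypeError` is a plain Python calling error, and it can be reproduced without PyTorch. The sketch below shows one generic way it can arise: an argument is bound by keyword, and the caller then also supplies it positionally. Whether this is what happens inside the export script is an assumption that the traceback alone does not confirm; the `forward_dummy` here is a hypothetical stand-in, not the MMAction2 method.

```python
from functools import partial


def forward_dummy(imgs, softmax=False):
    """Hypothetical stand-in for a model method with a keyword option."""
    return ("probabilities" if softmax else "scores", imgs)


# `softmax` is fixed by keyword ahead of time.
bound = partial(forward_dummy, softmax=True)
ok = bound("clip")  # one positional argument: works
print(ok)

try:
    # A second positional argument also lands in the `softmax` slot.
    bound("clip", {})
except TypeError as err:
    message = str(err)
    print(message)
```

If the cause is of this kind, checking how many positional inputs reach the traced function would be a reasonable first debugging step.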

    onnx 
    opened by yinghaodang 0
  • [Feature] TCANet for localization

    Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry: just make the pull request and seek help from the maintainers.

    Motivation

    Please describe the motivation of this PR and the goal you want to achieve through this PR.

    Modification

    Please briefly describe what modification is made in this PR.

    BC-breaking (Optional)

    Does the modification introduce changes that break the backward compatibility of this repo? If so, please describe how it breaks compatibility and how users should modify their code to stay compatible with this PR.

    Use cases (Optional)

    If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

    Checklist

    1. Pre-commit or other linting tools should be used to fix the potential lint issues.
    2. The modification should be covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
    3. If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMCls.
    4. The documentation should be modified accordingly, like docstring or example tutorials.
    opened by hukkai 1
  • How to train mmaction2 with your own data.

    Hi, I've been researching and gathering information on how to perform this task for a couple of days now and I'm not sure if anyone can help me.

    To explain what I want to do: I have a set of videos showing actions that are not included in any existing dataset, and I want to use them as training data so that a model can recognise these actions.

    I have created a text file with the path and the label of each video.

    For example:

    video1.mp4 1
    video2.mp4 2
    video1.mp4 3

    Where the labels are custom actions.

    Feeding this to mmaction2 by following the tutorial produced an error:

    RuntimeError: CUDA error: device-side assert triggered
    CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

    Then I looked for a better way to input the data, but I didn't quite understand how to do it.

    Could you explain to me what steps I should follow to train mmaction2 to detect custom actions?
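A device-side assert during classification training is often caused by a label outside `[0, num_classes)`, for example labels that start at 1 instead of 0. Whether that is the cause here is an assumption. A minimal sketch for checking an annotation list of `<path> <label>` lines before training, where `check_annotations` is a hypothetical helper:

```python
def check_annotations(lines, num_classes):
    """Return (line_number, problem) pairs for malformed or out-of-range labels."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        path, _, label_text = line.rpartition(" ")
        try:
            label = int(label_text)
        except ValueError:
            label = None
        if not path or label is None:
            problems.append((lineno, "expected '<path> <integer label>'"))
        elif not 0 <= label < num_classes:
            problems.append((lineno, f"label {label} outside [0, {num_classes})"))
    return problems


# The three example lines from the report, checked against 3 classes.
example = ["video1.mp4 1", "video2.mp4 2", "video1.mp4 3"]
problems = check_annotations(example, num_classes=3)
print(problems)
```

With three classes the valid labels would be 0, 1 and 2, so the check flags the line labelled 3. The `num_classes` value in the model config would also need to match the number of custom actions.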

    opened by pgutierrezce 10
  • Skeleton-based Action Recognition Demo

    Hello, I'm testing the demo script that predicts the skeleton-based action recognition result for a single video. I have two questions about this demo.

    python demo/demo_skeleton.py demo/ntu_sample.avi demo/skeleton_demo.mp4 \
    --config configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py \
    --checkpoint https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint-6736b03f.pth \
    --det-config demo/faster_rcnn_r50_fpn_2x_coco.py \
    --det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
    --det-score-thr 0.9 \
    --pose-config demo/hrnet_w32_coco_256x192.py \
    --pose-checkpoint https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth \
    --label-map tools/data/skeleton/label_map_ntu120.txt

    First, the recognized action is wrong for every one of my own videos. I have tried changing the config file and the checkpoint, but nothing helped, and I don't know what to try next.

    Second, I need to extract information from the resulting video, that is, from the skeleton. I need metrics for how long a person is standing or not moving. Is there a method that does this?
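As a starting point for the second question, stillness can be estimated directly from the per-frame keypoints produced by a pose estimator. The sketch below is an illustration under assumptions, not an MMAction2 feature: `stationary_seconds` and the `max_shift` pixel threshold are hypothetical, and the input is assumed to be one person's `(x, y)` joints per frame.

```python
import math


def stationary_seconds(keypoints, fps, max_shift=2.0):
    """Estimate how long one person stays still.

    keypoints: list of frames, each a list of (x, y) joint positions.
    A frame transition counts as "still" when the mean joint displacement
    is at most `max_shift` pixels.
    """
    still_transitions = 0
    for prev, curr in zip(keypoints, keypoints[1:]):
        shifts = [math.dist(p, c) for p, c in zip(prev, curr)]
        if sum(shifts) / len(shifts) <= max_shift:
            still_transitions += 1
    return still_transitions / fps


# Toy track: still for three frames, then every joint jumps by 10 px.
pose = [(100.0, 50.0), (100.0, 120.0)]
moved = [(110.0, 50.0), (110.0, 120.0)]
track = [pose, pose, pose, moved]
still_time = stationary_seconds(track, fps=2)
print(still_time)
```

Real keypoints jitter from frame to frame, so the threshold would need tuning, and smoothing the track first may help.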

    Thanks.

    opened by antonio2600 12
  • Skeleton-based model (PoseC3D) for Real-Time Webcam Inference

    Hi! I tried to do real-time webcam inference with the PoseC3D model. However, when using the high-level APIs of mmdetection and mmpose, their inference time became the bottleneck. Each inference takes too long, even with multiple threads and without the PoseC3D model.

    After I saw the Webcam API from mmpose, I realized it could achieve the speed I desire. I wonder if I can implement or register the PoseC3D model as a node in that API so it could be run as an independent thread.

    Is there any way to:

    1. Speed up the detection and pose inferences for every frame (referring to this skeleton demo code) so that I can visualize the pose estimation while stacking the pose inference result for further action recognition (using PoseC3D)
    2. Register PoseC3D inference to Webcam API in mmpose as a node so that it could be run there (I tried to manually add the node but it is too complicated and I failed to do so)
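The idea of running inference in its own thread can be sketched with the standard library alone. This is a generic producer/consumer illustration, not the mmpose Webcam API: `run_pipeline` and the `infer` callable are hypothetical, and in a real setup `infer` would wrap detection, pose estimation or PoseC3D.

```python
import queue
import threading


def run_pipeline(frames, infer, maxsize=4):
    """Read frames in one thread and run `infer` on them in another."""
    buffer = queue.Queue(maxsize=maxsize)  # bounded: reader waits when full
    results = []

    def reader():
        for frame in frames:
            buffer.put(frame)
        buffer.put(None)  # sentinel: no more frames

    def worker():
        while True:
            frame = buffer.get()
            if frame is None:
                break
            results.append(infer(frame))

    threads = [threading.Thread(target=reader), threading.Thread(target=worker)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Integers stand in for frames, and squaring stands in for model inference.
outputs = run_pipeline(range(5), infer=lambda frame: frame * frame)
print(outputs)
```

The bounded queue keeps the reader from running far ahead of a slow model. For a live webcam, dropping stale frames instead of blocking may be preferable.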

    Thank you!

    opened by juliussin 1
Releases(v1.0.0rc1)
  • v1.0.0rc1(Oct 14, 2022)

    Highlights

    • Support Video Swin Transformer

    New Features

    • Support Video Swin Transformer (#1939)

    Improvements

    • Add colab tutorial for 1.x (#1956)
    • Support skeleton-based action recognition demo (#1920)

    Bug Fixes

    Source code(tar.gz)
    Source code(zip)
  • v1.0.0rc0(Sep 1, 2022)

    We are excited to announce the release of MMAction2 v1.0.0rc0. MMAction2 1.0.0beta is the first version of MMAction2 1.x, a part of the OpenMMLab 2.0 projects, built upon the new training engine.

    Highlights

    • New engines. MMAction2 1.x is based on MMEngine (https://github.com/open-mmlab/mmengine), which provides a general and powerful runner that allows more flexible customizations and significantly simplifies the entrypoints of high-level interfaces.

    • Unified interfaces. As a part of the OpenMMLab 2.0 projects, MMAction2 1.x unifies and refactors the interfaces and internal logic of training, testing, datasets, models, evaluation, and visualization. All the OpenMMLab 2.0 projects share the same design in those interfaces and logic to allow the emergence of multi-task/modality algorithms.

    • More documentation and tutorials. We add a bunch of documentation and tutorials to help users get started more smoothly.

    Breaking Changes

    In this release, we made lots of major refactoring and modifications. Please refer to the migration guide for details and migration instructions.

    Source code(tar.gz)
    Source code(zip)
  • v0.24.1(Jul 29, 2022)

  • v0.24.0(May 5, 2022)

    Highlights

    • Support different seeds

    New Features

    • Add lateral norm in multigrid config (#1567)
    • Add openpose 25 joints in graph config (#1578)
    • Support MLU Backend (#1608)

    Bug and Typo Fixes

    • Fix local_rank (#1558)
    • Fix install typo (#1571)
    • Fix the inference API doc (#1580)
    • Fix zh-CN demo.md and getting_started.md (#1587)
    • Remove Recommonmark (#1595)
    • Fix inference with ndarray (#1603)
    • Fix the log error when IterBasedRunner is used (#1606)
    Source code(tar.gz)
    Source code(zip)
  • v0.23.0(Apr 2, 2022)

    Highlights

    • Support different seeds
    • Provide multi-node training & testing script
    • Update error log

    New Features

    • Support different seeds(#1502)
    • Provide multi-node training & testing script(#1521)
    • Update error log(#1546)

    Documentations

    • Update gpus in Slowfast readme(#1497)
    • Fix work_dir in multigrid config(#1498)
    • Add sub bn docs(#1503)
    • Add shortcycle sampler docs(#1513)
    • Update Windows Declaration(#1520)
    • Update the link for ST-GCN(#1544)
    • Update install commands(#1549)

    Bug and Typo Fixes

    • Update colab tutorial install cmds(#1522)
    • Fix num_iters_per_epoch in analyze_logs.py(#1530)
    • Fix distributed_sampler(#1532)
    • Fix cd dir error(#1545)
    • Update arg names(#1548)
    Source code(tar.gz)
    Source code(zip)
  • v0.22.0(Mar 7, 2022)

    0.22.0 (03/05/2022)

    Highlights

    • Support Multigrid training strategy
    • Support CPU training
    • Support audio demo
    • Support topk customizing in models/heads/base.py

    New Features

    • Support Multigrid training strategy(#1378)
    • Support STGCN in demo_skeleton.py(#1391)
    • Support CPU training(#1407)
    • Support audio demo(#1425)
    • Support topk customizing in models/heads/base.py(#1452)

    Documentations

    • Add OpenMMLab platform(#1393)
    • Update links(#1394)
    • Update readme in configs(#1404)
    • Update instructions to install mmcv-full(#1426)
    • Add shortcut(#1433)
    • Update modelzoo(#1439)
    • Add video_structuralize in readme(#1455)
    • Update OpenMMLab repo information(#1482)

    Bug and Typo Fixes

    • Update train.py(#1375)
    • Fix printout bug(#1382)
    • Update multi processing setting(#1395)
    • Setup multi processing both in train and test(#1405)
    • Fix bug in nondistributed multi-gpu training(#1406)
    • Add variable fps in ava_dataset.py(#1409)
    • Only support distributed training(#1414)
    • Set test_mode for AVA configs(#1432)
    • Support single label(#1434)
    • Add check copyright(#1447)
    • Support Windows CI(#1448)
    • Fix wrong device of class_weight in models/losses/cross_entropy_loss.py(#1457)
    • Fix bug caused by distributed(#1459)
    • Update readme(#1460)
    • Fix lint caused by colab automatic upload(#1461)
    • Refine CI(#1471)
    • Update pre-commit(#1474)
    • Add deprecation message for deploy tool(#1483)

    ModelZoo

    • Support slowfast_steplr(#1421)
    Source code(tar.gz)
    Source code(zip)
  • v0.21.0(Dec 31, 2021)

    Highlights

    • Support 2s-AGCN
    • Support publish models in Windows
    • Improve some sthv1 related models
    • Support BABEL

    New Features

    • Support 2s-AGCN(#1248)
    • Support skip postproc in ntu_pose_extraction(#1295)
    • Support publish models in Windows(#1325)
    • Add copyright checkhook in pre-commit-config(#1344)

    Documentations

    • Add MMFlow (#1273)
    • Revise README.md and add projects.md (#1286)
    • Add 2s-AGCN in Updates(#1289)
    • Add MMFewShot(#1300)
    • Add MMHuman3d(#1304)
    • Update pre-commit(#1313)
    • Use share menu from the theme instead(#1328)
    • Update installation command(#1340)

    Bug and Typo Fixes

    • Update the inference part in notebooks(#1256)
    • Update the map_location(#1262)
    • Fix bug that start_index is not used in RawFrameDecode(#1278)
    • Fix bug in init_random_seed(#1282)
    • Fix bug in setup.py(#1303)
    • Fix interrogate error in workflows(#1305)
    • Fix typo in slowfast config(#1309)
    • Cancel previous runs that are not completed(#1327)
    • Fix missing skip_postproc parameter(#1347)
    • Update ssn.py(#1355)
    • Use latest youtube-dl(#1357)
    • Fix test-best(#1362)

    ModelZoo

    • Improve some sthv1 related models(#1306)
    • Support BABEL(#1332)
    Source code(tar.gz)
    Source code(zip)
  • v0.20.0(Oct 30, 2021)

    Highlights

    • Support TorchServe
    • Add video structuralize demo
    • Support using 3D skeletons for skeleton-based action recognition
    • Benchmark PoseC3D on UCF and HMDB

    New Features

    • Support TorchServe (#1212)
    • Support 3D skeletons pre-processing (#1218)
    • Support video structuralize demo (#1197)

    Documentations

    • Revise README.md and add projects.md (#1214)
    • Add CN docs for Skeleton dataset, PoseC3D and ST-GCN (#1228, #1237, #1236)
    • Add tutorial for custom dataset training for skeleton-based action recognition (#1234)

    Bug and Typo Fixes

    ModelZoo

    • Benchmark PoseC3D on UCF and HMDB (#1223)
    • Add ST-GCN + 3D skeleton model for NTU60-XSub (#1236)

    New Contributors

    • @bit-scientist made their first contribution in https://github.com/open-mmlab/mmaction2/pull/1234

    Full Changelog: https://github.com/open-mmlab/mmaction2/compare/v0.19.0...v0.20.0

    Source code(tar.gz)
    Source code(zip)
  • v0.19.0(Oct 7, 2021)

    Highlights

    • Support ST-GCN
    • Refactor the inference API
    • Add code spell check hook

    New Features

    Improvement

    • Add label maps for every dataset (#1127)
    • Remove useless code MultiGroupCrop (#1180)
    • Refactor Inference API (#1191)
    • Add code spell check hook (#1208)
    • Use docker in CI (#1159)

    Documentation

    • Update metafiles to new OpenMMLab protocols (#1134)
    • Switch to new doc style (#1160)
    • Improve the ERROR message (#1203)
    • Fix invalid URL in getting_started (#1169)

    Bug and Typo Fixes

    • Compatible with new MMClassification (#1139)
    • Add missing runtime dependencies (#1144)
    • Fix THUMOS tag proposals path (#1156)
    • Fix LoadHVULabel (#1194)
    • Switch the default value of persistent_workers to False (#1202)
    • Fix _freeze_stages for MobileNetV2 (#1193)
    • Fix resume when building rawframes (#1150)
    • Fix device bug for class weight (#1188)
    • Correct Arg names in extract_audio.py (#1148)

    ModelZoo

    • Add TSM-MobileNetV2 ported from TSM (#1163)
    • Add ST-GCN for NTURGB+D-XSub-60 (#1123)
  • v0.18.0(Sep 2, 2021)

    Improvement

    • Add CopyRight (#1099)
    • Support NTU Pose Extraction (#1076)
    • Support Caching in RawFrameDecode (#1078)
    • Add citations & Support python3.9 CI & Use fixed-version sphinx (#1125)

    Documentation

    • Add Descriptions of PoseC3D dataset (#1053)

    Bug and Typo Fixes

    • Fix SSV2 checkpoints (#1101)
    • Fix CSN normalization (#1116)
    • Fix typo (#1121)
    • Fix new_crop_quadruple bug (#1108)
  • v0.17.0(Aug 3, 2021)

    Highlights

    • Support PyTorch 1.9
    • Support Pytorchvideo Transforms
    • Support PreciseBN

    New Features

    • Support Pytorchvideo Transforms (#1008)
    • Support PreciseBN (#1038)

    Improvements

    • Remove redundant augmentations in config files (#996)
    • Make resource directory to hold common resource pictures (#1011)
    • Remove deprecated FrameSelector (#1010)
    • Support Concat Dataset (#1000)
    • Add to-mp4 option to resize_videos.py (#1021)
    • Add option to keep tail frames (#1050)
    • Update MIM support (#1061)
    • Calculate Top-K accurate and inaccurate classes (#1047)

    Bug and Typo Fixes

    • Fix bug in PoseC3D demo (#1009)
    • Fix some problems in resize_videos.py (#1012)
    • Support torch1.9 (#1015)
    • Remove redundant code in CI (#1046)
    • Fix bug about persistent_workers (#1044)
    • Support TimeSformer feature extraction (#1035)
    • Fix ColorJitter (#1025)

    ModelZoo

    • Add TSM-R50 sthv1 models trained by PytorchVideo RandAugment and AugMix (#1008)
    • Update SlowOnly SthV1 checkpoints (#1034)
    • Add SlowOnly Kinetics400 checkpoints trained with Precise-BN (#1038)
    • Add CSN-R50 from scratch checkpoints (#1045)
    • TPN Kinetics-400 Checkpoints trained with the new ColorJitter (#1025)

    Documentation

    • Add Chinese translation of feature_extraction.md (#1020)
    • Fix the code snippet in getting_started.md (#1023)
    • Fix TANet config table (#1028)
    • Add description to PoseC3D dataset (#1053)
  • v0.16.0(Jul 1, 2021)

    Highlights

    • Support using backbone from pytorch-image-models(timm)
    • Support PIMS Decoder
    • Demo for skeleton-based action recognition
    • Support Timesformer

    New Features

    • Support using backbones from pytorch-image-models(timm) for TSN (#880)
    • Support torchvision transformations in preprocessing pipelines (#972)
    • Demo for skeleton-based action recognition (#972)
    • Support Timesformer (#839)

    Improvements

    • Add a tool to find invalid videos (#907, #950)
    • Add an option to specify spectrogram_type (#909)
    • Add json output to video demo (#906)
    • Add MIM related docs (#918)
    • Rename lr to scheduler (#916)
    • Support --cfg-options for demos (#911)
    • Support number counting for flow-wise filename template (#922)
    • Add Chinese tutorial (#941)
    • Change ResNet3D default values (#939)
    • Adjust script structure (#935)
    • Add font color to args in long_video_demo (#947)
    • Polish code style with Pylint (#908)
    • Support PIMS Decoder (#946)
    • Improve Metafiles (#956, #979, #966)
    • Add links to download Kinetics400 validation (#920)
    • Audit the usage of shutil.rmtree (#943)
    • Polish localizer-related code (#913)

    Bug and Typo Fixes

    • Fix spatiotemporal detection demo (#899)
    • Fix docstring for 3D inflate (#925)
    • Fix bug of writing text to video with TextClip (#952)
    • Fix mmcv install in CI (#977)

    ModelZoo

    • Add TSN with Swin Transformer backbone as an example for using pytorch-image-models(timm) backbones (#880)
    • Port CSN checkpoints from VMZ (#945)
    • Release various checkpoints for UCF101, HMDB51 and Sthv1 (#938)
    • Support Timesformer (#839)
    • Update TSM modelzoo (#981)
  • v0.15.0(May 31, 2021)

    Highlights

    • Support PoseC3D
    • Support ACRN
    • Support MIM

    New Features

    • Support PoseC3D (#786, #890)
    • Support MIM (#870)
    • Support ACRN and Focal Loss (#891)
    • Support Jester dataset (#864)

    Improvements

    • Add metric_options for evaluation to docs (#873)
    • Support creating a new label map based on custom classes for spatio-temporal demos (#879)
    • Improve document about AVA dataset preparation (#878)
    • Provide a script to extract clip-level feature (#856)

    Bug and Typo Fixes

    • Fix issues about resume (#877, #878)
    • Correct the key name of eval_results dictionary for metric 'mmit_mean_average_precision' (#885)

    ModelZoo

    • Support Jester dataset (#864)
    • Support ACRN and Focal Loss (#891)
  • v0.14.0(May 3, 2021)

    Highlights

    • Support TRN
    • Support Diving48

    New Features

    • Support TRN (#755)
    • Support Diving48 (#835)
    • Support Webcam Demo for Spatio-temporal Action Detection Models (#795)

    Improvements

    • Add softmax option for pytorch2onnx tool (#781)
    • Support TRN (#755)
    • Test with onnx models and TensorRT engines (#758)
    • Speed up AVA Testing (#784)
    • Add self.with_neck attribute (#796)
    • Update installation document (#798)
    • Use a random master port (#809)
    • Update AVA processing data document (#801)
    • Refactor spatio-temporal augmentation (#782)
    • Add QR code in CN README (#812)
    • Add Alternative way to download Kinetics (#817, #822)
    • Refactor Sampler (#790)
    • Use EvalHook in MMCV with backward compatibility (#793)
    • Use MMCV Model Registry (#843)

    Bug and Typo Fixes

    • Fix a bug in pytorch2onnx.py when num_classes <= 4 (#800, #824)
    • Fix demo_spatiotemporal_det.py error (#803, #805)
    • Fix loading config bugs when resume (#820)
    • Make HMDB51 annotation generation more robust (#811)

    ModelZoo

    • Update checkpoint for 256 height in something-V2 (#789)
    • Support Diving48 (#835)
  • v0.13.0(Apr 1, 2021)

    Highlights

    • Support LFB
    • Support using backbone from MMCls/TorchVision
    • Add Chinese documentation

    New Features

    Improvements

    • Add slowfast config/json/log/ckpt for training custom classes of AVA (#678)
    • Set RandAugment as Imgaug default transforms (#585)
    • Add --test-last & --test-best for tools/train.py to test checkpoints after training (#608)
    • Add fcn_testing in TPN (#684)
    • Remove redundant recall functions (#741)
    • Recursively remove pretrained step for testing (#695)
    • Improve demo by limiting inference fps (#668)

    Bug and Typo Fixes

    • Fix a bug about multi-class in VideoDataset (#723)
    • Reverse key-value in anet filelist generation (#686)
    • Fix flow norm cfg typo (#693)

    ModelZoo

    • Add LFB for AVA2.1 (#553)
    • Add TSN with ResNeXt-101-32x4d backbone as an example for using MMCls backbones (#679)
    • Add TSN with Densenet161 backbone as an example for using TorchVision backbones (#720)
    • Add slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb (#690)
    • Add slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb (#704)
    • Add slowonly_nl_kinetics_pretrained_r50_4x16x1(8x8x1)_20e_ava_rgb (#730)
  • v0.12.0(Mar 1, 2021)

    Highlights

    • Support TSM-MobileNetV2
    • Support TANet
    • Support GPU Normalize

    New Features

    • Support TSM-MobileNetV2 (#415)
    • Support flip with label mapping (#591)
    • Add seed option for sampler (#642)
    • Support GPU Normalize (#586)
    • Support TANet (#595)

    Improvements

    • Training custom classes of ava dataset (#555)
    • Add CN README in homepage (#592, #594)
    • Refactor config: Specify train_cfg and test_cfg in model (#629)
    • Provide an alternative way to download older kinetics annotations (#597)
    • Update FAQ for
      • 1). data pipeline about video and frames (#598)
      • 2). how to show results (#598)
      • 3). batch size setting for batchnorm (#657)
      • 4). how to fix stages of backbone when finetuning models (#658)
    • Modify default value of save_best (#600)
    • Use BibTex rather than latex in markdown (#607)
    • Add warnings of uninstalling mmdet and supplementary documents (#624)
    • Support soft label for CrossEntropyLoss (#625)
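
A soft label replaces the usual one-hot target with a probability distribution over classes, so the loss becomes the cross entropy between that distribution and the softmax of the logits. The following is a minimal pure-Python sketch of that idea, not MMAction2's implementation; the function names `log_softmax` and `soft_label_cross_entropy` are chosen here for illustration only.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def soft_label_cross_entropy(logits, soft_target):
    """Cross entropy between a target distribution and softmax(logits).

    With a one-hot target this reduces to the ordinary hard-label loss.
    """
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(soft_target, log_probs))


logits = [2.0, 0.5, -1.0]
hard = soft_label_cross_entropy(logits, [1.0, 0.0, 0.0])  # one-hot target
soft = soft_label_cross_entropy(logits, [0.8, 0.2, 0.0])  # soft target
print(f"hard-label loss: {hard:.4f}, soft-label loss: {soft:.4f}")
```

Because the soft target moves some probability mass onto a class with a lower logit, the soft-label loss in this example is larger than the hard-label loss.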

    Bug and Typo Fixes

    • Fix value of pem_low_temporal_iou_threshold in BSN (#556)
    • Fix ActivityNet download script (#601)

    ModelZoo

    • Add TSM-MobileNetV2 for Kinetics400 (#415)
    • Add deeper SlowFast models (#605)
  • v0.11.0(Feb 1, 2021)

    Highlights

    • Support imgaug
    • Support spatial temporal demo
    • Refactor EvalHook, config structure, unittest structure

    New Features

    • Support imgaug for augmentations in the data pipeline (#492)
    • Support setting max_testing_views for extremely large models to reduce GPU memory usage (#511)
    • Add spatial temporal demo (#547, #566)

    Improvements

    • Refactor EvalHook (#395)
    • Refactor AVA hook (#567)
    • Add repo citation (#545)
    • Add dataset size of Kinetics400 (#503)
    • Add lazy operation docs (#504)
    • Add class_weight for CrossEntropyLoss and BCELossWithLogits (#509)
    • Add some explanation about the resampling in slowfast (#502)
    • Modify paper title in README.md (#512)
    • Add alternative ways to download Kinetics (#521)
    • Add OpenMMLab projects link in README (#530)
    • Change default preprocessing to shortedge to 256 (#538)
    • Add config tag in dataset README (#540)
    • Add solution for markdownlint installation issue (#497)
    • Add dataset overview in readthedocs (#548)
    • Modify the trigger mode of the warnings of missing mmdet (#583)
    • Refactor config structure (#488, #572)
    • Refactor unittest structure (#433)

    Bug and Typo Fixes

    • Fix a bug about ava dataset validation (#527)
    • Fix a bug about ResNet pretrain weight initialization (#582)
    • Fix a bug in CI due to MMCV index (#495)
    • Remove invalid links of MiT and MMiT (#516)
    • Fix frame rate bug for AVA preparation (#576)
  • v0.10.0(Jan 5, 2021)

    Highlights

    • Support Spatio-Temporal Action Detection (AVA)
    • Support precise BN

    New Features

    • Support precise BN (#501)
    • Support Spatio-Temporal Action Detection (AVA) (#351)
    • Support returning feature maps in inference_recognizer (#458)

    Improvements

    • Add arg stride to long_video_demo.py, to make inference faster (#468)
    • Support training and testing for Spatio-Temporal Action Detection (#351)
    • Fix CI due to pip upgrade (#454)
    • Add markdown lint in pre-commit hook (#255)
    • Speed up confusion matrix calculation (#465)
    • Use title case in modelzoo statistics (#456)
    • Add FAQ documents for easy troubleshooting. (#413, #420, #439)
    • Support Spatio-Temporal Action Detection with context (#471)
    • Add class weight for CrossEntropyLoss and BCELossWithLogits (#509)
    • Add Lazy OPs docs (#504)

    Bug and Typo Fixes

    • Fix typo in default argument of BaseHead (#446)
    • Fix potential bug about output_config overwrite (#463)

    ModelZoo

    • Add SlowOnly, SlowFast for AVA2.1 (#351)
  • v0.9.0(Dec 1, 2020)

    Highlights

    • Support GradCAM utils for recognizers
    • Support ResNet Audio model

    New Features

    • Automatically add modelzoo statistics to readthedocs (#327)
    • Support GYM99 data preparation (#331)
    • Add AudioOnly Pathway from AVSlowFast. (#355)
    • Add GradCAM utils for recognizer (#324)
    • Add print config script (#345)
    • Add online motion vector decoder (#291)

    Improvements

    • Support PyTorch 1.7 in CI (#312)
    • Support predicting different labels in a long video (#274)
    • Update docs about test crops (#359)
    • Polish code format using pylint manually (#338)
    • Update unittest coverage (#358, #322, #325)
    • Add random seed for building filelists (#323)
    • Update colab tutorial (#367)
    • Set default batch_size of evaluation and testing to 1 (#250)
    • Rename the preparation docs to README.md (#388)
    • Move docs about demo to demo/README.md (#329)
    • Remove redundant code in tools/test.py (#310)
    • Automatically calculate number of test clips for Recognizer2D (#359)

    Bug and Typo Fixes

    • Fix rename Kinetics classnames bug (#384)
    • Fix a bug in BaseDataset when data_prefix is None (#314)
    • Fix a bug about tmp_folder in OpenCVInit (#357)
    • Fix get_thread_id when not using disk as backend (#354, #357)
    • Fix the bug of HVU object num_classes from 1679 to 1678 (#307)
    • Fix typo in export_model.md (#399)
    • Fix OmniSource training configs (#321)
    • Fix Issue #306: Bug of SampleAVAFrames (#317)

    ModelZoo

    • Add SlowOnly model for GYM99, both RGB and Flow (#336)
    • Add auto modelzoo statistics in readthedocs (#327)
    • Add TSN for HMDB51 pretrained on Kinetics400, Moments in Time and ImageNet. (#372)
  • v0.8.0(Oct 31, 2020)

    Highlights

    • Support OmniSource
    • Support C3D
    • Support video recognition with audio modality
    • Support HVU
    • Support X3D

    New Features

    • Support AVA dataset preparation (#266)
    • Support the training of video recognition dataset with multiple tag categories (#235)
    • Support joint training with multiple training datasets of multiple formats, including images, untrimmed videos, etc. (#242)
    • Support specifying a start epoch for evaluation (#216)
    • Implement X3D models, support testing with model weights converted from SlowFast (#288)

    Improvements

    • Set default values of 'average_clips' in each config file so that there is no need to set it explicitly during testing in most cases (#232)
    • Extend HVU datatools to generate individual file list for each tag category (#258)
    • Support data preparation for Kinetics-600 and Kinetics-700 (#254)
    • Add cfg-options in arguments to override some settings in the used config for convenience (#212)
    • Rename the old evaluating protocol mean_average_precision as mmit_mean_average_precision since it is only used on MMIT and is not the mAP we usually talk about. Add mean_average_precision, which is the real mAP (#235)
    • Add accurate setting (Three crop * 2 clip) and report corresponding performance for TSM model (#241)
    • Add citations in each preparing_dataset.md in tools/data/dataset (#289)
    • Update the performance of audio-visual fusion on Kinetics-400 (#281)
    • Support data preparation of OmniSource web datasets, including GoogleImage, InsImage, InsVideo and KineticsRawVideo (#294)
    • Use metric_options dict to provide metric args in evaluate (#286)
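
The cfg-options item above refers to overriding individual config settings from the command line. A minimal sketch of how such `key=value` overrides with dotted keys can be merged into a nested config dict is shown here; it is an illustration under assumptions, not the parser used by MMAction2 or MMCV, and `parse_value`, `apply_overrides`, and the sample config keys are hypothetical.

```python
import ast


def parse_value(text):
    """Interpret numbers, booleans, lists, etc.; fall back to the raw string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(cfg, options):
    """Apply 'a.b.c=value' overrides to a nested dict in place."""
    for option in options:
        dotted_key, _, raw = option.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for name in parents:
            node = node.setdefault(name, {})  # create missing levels on demand
        node[leaf] = parse_value(raw)
    return cfg


# Hypothetical config and overrides, for illustration only.
cfg = {"data": {"videos_per_gpu": 8}, "optimizer": {"lr": 0.01}}
apply_overrides(
    cfg, ["data.videos_per_gpu=4", "optimizer.lr=0.005", "work_dir=./out"]
)
print(cfg)
```

Dotted keys keep the command line short while still reaching deeply nested settings, which is why this form is convenient for one-off experiments.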

    Bug Fixes

    • Register FrameSelector in PIPELINES (#268)
    • Fix the potential bug for default value in dataset_setting (#245)
    • Fix the data preparation bug for something-something dataset (#278)
    • Fix the invalid config url in slowonly README data benchmark (#249)
    • Validate that models trained with videos show no significant performance difference compared to models trained with rawframes (#256)
    • Correct the img_norm_cfg used by TSN-3seg-R50 UCF-101 model, improve the Top-1 accuracy by 3% (#273)

    ModelZoo

    • Add Baselines for Kinetics-600 and Kinetics-700, including TSN-R50-8seg and SlowOnly-R50-8x8 (#259)
    • Add OmniSource benchmark on MiniKinetics (#296)
    • Add Baselines for HVU, including TSN-R18-8seg on 6 tag categories of HVU (#287)
    • Add X3D models ported from SlowFast (#288)
  • v0.7.0(Oct 3, 2020)

    Highlights

    • Support TPN
    • Support JHMDB, UCF101-24, HVU dataset preparation
    • Support onnx model conversion

    New Features

    • Support the data pre-processing pipeline for the HVU Dataset (#277)
    • Support real-time action recognition from web camera (#171)
    • Support onnx (#160)
    • Support UCF101-24 preparation (#219)
    • Support evaluating mAP for ActivityNet with CUHK17_activitynet_pred (#176)
    • Add the data pipeline for ActivityNet, including downloading videos, extracting RGB and Flow frames, finetuning TSN and extracting features (#190)
    • Support JHMDB preparation (#220)

    ModelZoo

    • Add finetuning setting for SlowOnly (#173)
    • Add TSN and SlowOnly models trained with OmniSource, which achieve 75.7% Top-1 with TSN-R50-3seg and 80.4% Top-1 with SlowOnly-R101-8x8 (#215)

    Improvements

    • Support demo with video url (#165)
    • Support multi-batch when testing (#184)
    • Add tutorial for adding a new learning rate updater (#181)
    • Add config name in meta info (#183)
    • Remove git hash in __version__ (#189)
    • Check mmcv version (#189)
    • Update url with 'https://download.openmmlab.com' (#208)
    • Update Docker file to support PyTorch 1.6 and update install.md (#209)
    • Polish readthedocs display (#217, #229)

    Bug Fixes

    • Fix the bug when using OpenCV to extract only RGB frames with original shape (#184)
    • Fix the bug of sthv2 num_classes from 339 to 174 (#174, #207)
  • v0.6.0(Sep 2, 2020)

    Highlights

    • Support TIN, CSN, SSN, NonLocal
    • Support FP16 training

    New Features

    • Support NonLocal module and provide ckpt in TSM and I3D (#41)
    • Support SSN (#33, #37, #52, #55)
    • Support CSN (#87)
    • Support TIN (#53)
    • Support HMDB51 dataset preparation (#60)
    • Support encoding videos from frames (#84)
    • Support FP16 training (#25)
    • Enhance demo by supporting rawframe inference (#59), output video/gif (#72)

    ModelZoo

    • Update Slowfast modelzoo (#51)
    • Update TSN, TSM video checkpoints (#50)
    • Add data benchmark for TSN (#57)
    • Add data benchmark for SlowOnly (#77)
    • Add BSN/BMN performance results with features extracted by our codebase (#99)

    Improvements

    • Polish data preparation code (#70)
    • Improve data preparation scripts (#58)
    • Improve unittest coverage and minor fix (#62)
    • Support PyTorch 1.6 in CI (#117)
    • Support with_offset for rawframe dataset (#48)
    • Support json annotation files (#119)
    • Support multi-class in TSMHead (#104)
    • Support using val_step() to validate data for each val workflow (#123)
    • Use xxInit() method to get total_frames and make total_frames a required key (#90)
    • Add paper introduction in model readme (#140)
    • Adjust the directory structure of tools/ and rename some scripts files (#142)

    Bug Fixes

    • Fix configs for localization test (#67)
    • Fix configs of SlowOnly by fixing lr to 8 gpus (#136)
    • Fix the bug in analyze_log (#54)
    • Fix the bug of generating HMDB51 class index file (#69)
    • Fix the bug of using load_checkpoint() in ResNet (#93)
    • Fix the bug of --work-dir when using slurm training script (#110)
    • Correct the sthv1/sthv2 rawframes filelist generate command (#71)
    • Fix CosineAnnealing typo (#47)
  • v0.5.0(Jul 21, 2020)
