AutoVideo: An Automated Video Action Recognition System

AutoVideo is a system for automated video analysis. It is built on the D3M infrastructure, which describes machine learning workflows in a generic pipeline language. It currently focuses on video action recognition and supports a range of state-of-the-art action recognition algorithms, along with automated model selection and hyperparameter tuning. AutoVideo is developed by the DATA Lab at Texas A&M University.

Other video analysis libraries exist, but AutoVideo is designed to be highly modular and extensible. Thanks to the pipeline language, each model is wrapped as a primitive with its own hyperparameters. This makes it easy to support other algorithms for other video analysis tasks, which we plan as future work. The pipeline language also makes it convenient to search over models and hyperparameters.

An overview of the library is shown below. Each module in AutoVideo is wrapped as a primitive with some hyperparameters. A pipeline consists of a series of primitives, from pre-processing to action recognition. AutoVideo is equipped with tuners to search models and hyperparameters. We welcome contributions that enrich AutoVideo with more primitives. You can find instructions in the Contributing Guide.

[Figure: overview of the AutoVideo library]

Cite this work

If you find this repo useful, you may cite:

Zha, Daochen, et al. "AutoVideo: An Automated Video Action Recognition System." arXiv preprint arXiv:2108.04212 (2021).

@article{zha2021autovideo,
  title={AutoVideo: An Automated Video Action Recognition System},
  author={Zha, Daochen and Bhat, Zaid and Chen, Yi-Wei and Wang, Yicheng and Ding, Sirui and Jain, Anmoll and Bhat, Mohammad and Lai, Kwei-Herng and Chen, Jiaben and Zou, Na and Hu, Xia},
  journal={arXiv preprint arXiv:2108.04212},
  year={2021}
}

Installation

Make sure that you have Python 3.6 and pip installed. The code has so far been tested only on Linux. First, install torch and torchvision with

pip3 install torch
pip3 install torchvision

To use automated search, you also need to install ray-tune and hyperopt with

pip3 install 'ray[tune]' hyperopt

We recommend installing the stable version of autovideo with pip:

pip3 install autovideo

Alternatively, you can clone the latest version with

git clone https://github.com/datamllab/autovideo.git

Then install with

cd autovideo
pip3 install -e .

Toy Examples

To try the examples, you can download the hmdb6 dataset, a subset of hmdb51 with only 6 classes. All the datasets can be downloaded from Google Drive. Then unzip a dataset and put it in the datasets directory.

Fitting and saving a pipeline

python3 examples/fit.py

Some important hyperparameters are as follows.

  • --alg: the algorithm to use. Currently we support tsn, tsm, i3d, eco, eco_full, c3d, r2p1d, and r3d.
  • --pretrained: whether to load pre-trained weights and fine-tune.
  • --gpu: which gpu device to use. Empty string for CPU.
  • --data_dir: the directory of the dataset
  • --log_dir: the path for saving the log
  • --save_dir: the path for saving the fitted pipeline
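
As an illustration of how the flags above fit together, here is a minimal argparse sketch. It is not the actual code of examples/fit.py: the flag names come from the list above, but the default values, the use of store_true for --pretrained, and the helper name build_parser are assumptions made for this example.

```python
import argparse

# Algorithm names taken from the --alg description above.
SUPPORTED_ALGS = ["tsn", "tsm", "i3d", "eco", "eco_full", "c3d", "r2p1d", "r3d"]


def build_parser():
    """Build a parser mirroring the documented flags (defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="Fit and save a pipeline (sketch)")
    parser.add_argument("--alg", type=str, default="tsn", choices=SUPPORTED_ALGS)
    # Assumed to be a boolean switch: present means load pre-trained weights.
    parser.add_argument("--pretrained", action="store_true")
    # Empty string selects the CPU, as described in the list above.
    parser.add_argument("--gpu", type=str, default="")
    parser.add_argument("--data_dir", type=str, default="datasets/hmdb6/")
    parser.add_argument("--log_dir", type=str, default="log.txt")
    parser.add_argument("--save_dir", type=str, default="fitted_pipeline")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is reproducible.
args = build_parser().parse_args(["--alg", "tsm", "--pretrained", "--gpu", "0"])
print(vars(args))
```

Check examples/fit.py itself for the real defaults before relying on any of these values.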

Loading a fitted pipeline and producing predictions

After fitting a pipeline, you can load a pipeline and make predictions.

python3 examples/produce.py

Some important hyperparameters are as follows.

  • --gpu: which gpu device to use. Empty string for CPU.
  • --data_dir: the directory of the dataset
  • --log_dir: the path for saving the log
  • --load_dir: the path for loading the fitted pipeline

Loading a fitted pipeline and recognizing actions

After fitting a pipeline, you can also make predictions on a single video. As a demo, you can download the fitted pipeline and the demo video from Google Drive. Then use the following command to recognize the action in the video:

python3 examples/recogonize.py

Some important hyperparameters are as follows.

  • --gpu: which gpu device to use. Empty string for CPU.
  • --video_path: the path of video file
  • --log_dir: the path for saving the log
  • --load_dir: the path for loading the fitted pipeline

Fitting and producing a pipeline

Alternatively, you can fit and produce in one step, without saving the model, with

python3 examples/fit_produce.py

Some important hyperparameters are as follows.

  • --alg: the algorithm to use.
  • --pretrained: whether to load pre-trained weights and fine-tune.
  • --gpu: which gpu device to use. Empty string for CPU.
  • --data_dir: the directory of the dataset
  • --log_dir: the path for saving the log

Automated searching

Besides running the algorithms manually, you can use automated model selection and hyperparameter tuning:

python3 examples/search.py

Some important hyperparameters are as follows.

  • --alg: the search algorithm. Currently, we support random and hyperopt.
  • --num_samples: the number of samples to be tried
  • --gpu: which gpu device to use. Empty string for CPU.
  • --data_dir: the directory of the dataset

Supported Algorithms

Algorithm | Primitive Path | Paper
TSN | autovideo/recognition/tsn_primitive.py | Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
TSM | autovideo/recognition/tsm_primitive.py | TSM: Temporal Shift Module for Efficient Video Understanding
R2P1D | autovideo/recognition/r2p1d_primitive.py | A Closer Look at Spatiotemporal Convolutions for Action Recognition
R3D | autovideo/recognition/r3d_primitive.py | Learning spatio-temporal features with 3d residual networks for action recognition
C3D | autovideo/recognition/c3d_primitive.py | Learning Spatiotemporal Features with 3D Convolutional Networks
ECO-Lite | autovideo/recognition/eco_primitive.py | ECO: Efficient Convolutional Network for Online Video Understanding
ECO-Full | autovideo/recognition/eco_full_primitive.py | ECO: Efficient Convolutional Network for Online Video Understanding
I3D | autovideo/recognition/i3d_primitive.py | Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
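
The primitive paths in the table follow a regular naming pattern that matches the --alg values listed earlier. The sketch below captures that pattern; the helper name primitive_path is hypothetical and is not part of the AutoVideo API.

```python
# --alg values as listed in the examples section.
SUPPORTED_ALGS = ["tsn", "tsm", "i3d", "eco", "eco_full", "c3d", "r2p1d", "r3d"]


def primitive_path(alg):
    """Return the source file that wraps the given algorithm as a primitive.

    Hypothetical helper: it only encodes the naming pattern seen in the table.
    """
    if alg not in SUPPORTED_ALGS:
        raise ValueError("Unsupported algorithm: {}".format(alg))
    return "autovideo/recognition/{}_primitive.py".format(alg)


for name in SUPPORTED_ALGS:
    print(name, "->", primitive_path(name))
```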

Advanced Usage

Beyond the above examples, you can also customize the configurations.

Configuring the hyperparameters

Each model in AutoVideo is wrapped as a primitive, which contains some hyperparameters. An example of TSN is here. All the hyperparameters can be specified when building the pipeline by passing a config dictionary. See examples/fit.py.
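
A config dictionary of this kind might look like the sketch below. The key names are assumptions: load_pretrained appears as a hyperparameter name in AutoVideo's own code, while algorithm and epochs are illustrative guesses. The helper make_config is hypothetical; examples/fit.py shows the keys that are actually accepted.

```python
def make_config(alg, pretrained=False, **overrides):
    """Assemble a hyperparameter dictionary for one primitive.

    Hypothetical helper; the key names are assumptions, not the confirmed API.
    """
    config = {
        "algorithm": alg,               # assumed key: which primitive to use
        "load_pretrained": pretrained,  # whether to start from pre-trained weights
    }
    # Any extra keyword arguments override or extend the base settings.
    config.update(overrides)
    return config


config = make_config("tsn", pretrained=True, epochs=50)
print(config)

# The dictionary would then be handed to the pipeline builder, for example:
# pipeline = build_pipeline(config)  # assumption: see examples/fit.py for the real call
```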

Configuring the search space

The tuner searches for the best hyperparameter combination within a search space to improve performance. The search space can be defined with ray-tune. See examples/search.py.
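
To make the idea of searching a space concrete, here is a self-contained random-search sketch in plain Python. It is a stand-in for illustration only: it does not use ray-tune or AutoVideo's tuners, the hyperparameter names (learning_rate, momentum) and their ranges are assumptions, and toy_evaluate is a placeholder for actually fitting a pipeline and measuring accuracy.

```python
import random

# Assumed search space: lists are categorical choices, tuples are (low, high) ranges.
SEARCH_SPACE = {
    "algorithm": ["tsn", "tsm"],
    "learning_rate": (0.0001, 0.001),
    "momentum": (0.9, 0.99),
}


def sample_config(space, rng):
    """Draw one configuration uniformly at random from the search space."""
    config = {}
    for name, domain in space.items():
        if isinstance(domain, list):
            config[name] = rng.choice(domain)
        else:
            low, high = domain
            config[name] = rng.uniform(low, high)
    return config


def random_search(space, evaluate, num_samples, seed=0):
    """Evaluate num_samples random configurations and keep the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(num_samples):
        config = sample_config(space, rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config):
    """Placeholder objective that prefers learning rates close to 0.0005."""
    return -abs(config["learning_rate"] - 0.0005)


best_config, best_score = random_search(SEARCH_SPACE, toy_evaluate, num_samples=20)
print(best_config, best_score)
```

Here num_samples plays the same role as the --num_samples flag: more samples cost more fitting time but cover the space more thoroughly.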

Preparing datasets and benchmarking

The datasets must follow the D3M format, which consists of a csv file and a media folder. The csv file should have three columns specifying the instance indices, video file names, and labels. An example is shown below:

d3mIndex,video,label
0,Aussie_Brunette_Brushing_Hair_II_brush_hair_u_nm_np1_ri_med_3.avi,0
1,brush_my_hair_without_wearing_the_glasses_brush_hair_u_nm_np1_fr_goo_2.avi,0
2,Brushing_my_waist_lenth_hair_brush_hair_u_nm_np1_ba_goo_0.avi,0
3,brushing_raychel_s_hair_brush_hair_u_cm_np2_ri_goo_2.avi,0
4,Brushing_Her_Hair__[_NEW_AUDIO_]_UPDATED!!!!_brush_hair_h_cm_np1_le_goo_1.avi,0
5,Haarek_mmen_brush_hair_h_cm_np1_fr_goo_0.avi,0
6,Haarek_mmen_brush_hair_h_cm_np1_fr_goo_1.avi,0
7,Prelinger_HabitPat1954_brush_hair_h_nm_np1_fr_med_26.avi,0
8,brushing_hair_2_brush_hair_h_nm_np1_ba_med_2.avi,0

The media folder should contain the video files. You may refer to our example hmdb6 dataset in Google Drive. We have also prepared hmdb51 and ucf101 in the Google Drive for benchmarking. Please read benchmark for more details. For some of the algorithms (C3D, R2P1D and R3D), if you want to load the pre-trained weights and fine-tune, you need to download the weights from Google Drive and put them in the weights directory.
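
The csv layout described above can be produced and checked with the standard library. The sketch below is an illustration under stated assumptions: the helper names write_d3m_csv and read_d3m_csv are hypothetical, the file names are made up, and labels are assumed to be integer class ids as in the example rows.

```python
import csv
import io

# Column names taken from the example csv above.
REQUIRED_COLUMNS = ["d3mIndex", "video", "label"]


def write_d3m_csv(rows, handle):
    """Write (video file name, label) pairs with a running d3mIndex."""
    writer = csv.writer(handle)
    writer.writerow(REQUIRED_COLUMNS)
    for index, (video, label) in enumerate(rows):
        writer.writerow([index, video, label])


def read_d3m_csv(handle):
    """Read the csv back and check that the header has the three expected columns."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != REQUIRED_COLUMNS:
        raise ValueError("Expected columns {}, got {}".format(REQUIRED_COLUMNS, reader.fieldnames))
    return [(int(row["d3mIndex"]), row["video"], int(row["label"])) for row in reader]


# Use an in-memory buffer so the sketch runs without touching the file system.
buffer = io.StringIO()
write_d3m_csv([("clip_a.avi", 0), ("clip_b.avi", 1)], buffer)
buffer.seek(0)
records = read_d3m_csv(buffer)
print(records)
```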

Acknowledgement

We gratefully acknowledge the Data Driven Discovery of Models (D3M) program of the Defense Advanced Research Projects Agency (DARPA).

Comments
  • Problem with generating fitted timelines

    Hi all!

    I'm running into some problems with generating fitted pipelines for the different algorithms available. So I was trying to run the following command:

    python3 examples/fit.py --alg tsn --pretrained --gpu 0,1 --data_dir datasets/hmdb6/ --log_path logs/tsn.txt --save_path fittted_timelines/TSN/

    And I got the following output.

    --> Running on the GPU

    Initializing TSN with base model: resnet50.
    TSN Configurations:
        input_modality: RGB
        num_segments: 3
        new_length: 1
        consensus_module: avg
        dropout_ratio: 0.8

    Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /home/myuser/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
    100%|##########| 97.8M/97.8M [00:02<00:00, 40.4MB/s]
    Downloading: "https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmaction/models/kinetics400/tsn2d_kinetics400_rgb_r50_seg3_f1s1-b702e12f.pth" to /home/myuser/.cache/torch/hub/checkpoints/tsn2d_kinetics400_rgb_r50_seg3_f1s1-b702e12f.pth
    Traceback (most recent call last):
      File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/runtime.py", line 1008, in _do_run_step
        self._run_step(step)
      File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/runtime.py", line 998, in _run_step
        self._run_primitive(step)
      File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/runtime.py", line 873, in _run_primitive
        multi_call_result = self._call_primitive_method(primitive.fit_multi_produce, fit_multi_produce_arguments)
      File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/runtime.py", line 974, in _call_primitive_method
        raise error
      File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/runtime.py", line 970, in _call_primitive_method
        result = method(**arguments)
      File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/primitive_interfaces/base.py", line 532, in fit_multi_produce
        return self._fit_multi_produce(produce_methods=produce_methods, timeout=timeout, iterations=iterations, inputs=inputs, outputs=outputs)
      File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/primitive_interfaces/base.py", line 559, in _fit_multi_produce
        fit_result = self.fit(timeout=timeout, iterations=iterations)
      File "/home/myuser/autovideo/autovideo/base/supervised_base.py", line 54, in fit
        self._init_model(pretrained = self.hyperparams['load_pretrained'])
      File "/home/myuser/autovideo/autovideo/recognition/tsn_primitive.py", line 206, in _init_model
        model_data = load_state_dict_from_url(pretrained_url)
      File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/torch/hub.py", line 553, in load_state_dict_from_url
        download_url_to_file(url, cached_file, hash_prefix, progress=progress)
      File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/torch/hub.py", line 419, in download_url_to_file
        u = urlopen(req)
      File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/urllib/request.py", line 223, in urlopen
        return opener.open(url, data, timeout)
      File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/urllib/request.py", line 532, in open
        response = meth(req, response)
      File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/urllib/request.py", line 642, in http_response
        'http', request, response, code, msg, hdrs)
      File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/urllib/request.py", line 570, in error
        return self._call_chain(*args)
      File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/urllib/request.py", line 504, in _call_chain
        result = func(*args)
      File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/urllib/request.py", line 650, in http_error_default
        raise HTTPError(req.full_url, code, msg, hdrs, fp)
    urllib.error.HTTPError: HTTP Error 403: Forbidden

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "examples/fit.py", line 61, in <module>
        run(args)
      File "examples/fit.py", line 49, in run
        pipeline=pipeline)
      File "/home/myuser/autovideo/autovideo/utils/axolotl_utils.py", line 55, in fit
        raise pipeline_result.error
      File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/runtime.py", line 1039, in _run
        self._do_run()
      File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/runtime.py", line 1025, in _do_run
        self._do_run_step(step)
      File "/home/myuser/anaconda3/envs/autovideo/lib/python3.6/site-packages/d3m/runtime.py", line 1017, in _do_run_step
        ) from error
    d3m.exceptions.StepFailedError: Step 5 for pipeline e61792eb-f54b-44ae-931c-f0f965c5e9de failed.

    As you can see, I'm having problems with an Access Denied to the .pth files hosted at Amazon Cloud. Do you have any ideas on how to fix this?

    opened by viniciusarasantos 6
  • Running Predictions with pretrained weights

    Hi,

    I'm trying to benchmark the hmdb51 and ucf101 datasets with the pretrained weights available on Google Drive. I'm unfamiliar with the axolotl library and am a little confused about how to populate fitted_pipeline['runtime'] if I don't fit first using example/fit.py. Do you have any suggestions on how to accomplish this?

    Thank you, Rohita

    opened by nmochar2 2
  • About deprecated functions and current examples

    opened by aendrs 1
  • AssertionError: assert os.path.exists(NO_SPLIT_TABULAR_SPLIT_PIPELINE_PATH)

    I am trying to run the given hmdb6 example but I am getting this error:

    Traceback (most recent call last):
      File "examples/fit.py", line 56, in <module>
        run(args)
      File "examples/fit.py", line 20, in run
        from autovideo.utils import set_log_path, logger
      File "/content/autovideo/autovideo/__init__.py", line 4, in <module>
        from .utils import build_pipeline, fit, produce, fit_produce, produce_by_path, compute_accuracy_with_preds
      File "/content/autovideo/autovideo/utils/__init__.py", line 2, in <module>
        from .axolotl_utils import *
      File "/content/autovideo/autovideo/utils/axolotl_utils.py", line 12, in <module>
        from axolotl.backend.simple import SimpleRunner
      File "/usr/local/lib/python3.7/dist-packages/axolotl/backend/simple.py", line 5, in <module>
        from d3m import runtime as runtime_module
      File "/usr/local/lib/python3.7/dist-packages/d3m/runtime.py", line 23, in <module>
        from d3m.contrib import pipelines as contrib_pipelines
      File "/usr/local/lib/python3.7/dist-packages/d3m/contrib/pipelines/__init__.py", line 13, in <module>
        assert os.path.exists(NO_SPLIT_TABULAR_SPLIT_PIPELINE_PATH)
    AssertionError
    

    Running on Google colab. Code :

    !git clone https://github.com/datamllab/autovideo.git
    
    %cd autovideo
    !pip3 install -e .
    
    !gdown --id 1nLTjp6l6UucXEy8_eOM5Zj4Q1m79OhmT
    !unzip hmdb6.zip -d datasets
    
    !python3 examples/fit.py --alg tsn --data_dir datasets/hmdb6/ --gpu "cuda"
    

    How to resolve it?

    opened by akshay-gupta123 1
  • examples/recogonize.py does not work out of the box.

    The minimum dataset size is 4. I have the following hack in produce_by_path that works.

    # minimum size is 4
    dataset = {
        'd3mIndex': [0,1,2,3],
        'video': [video_name,video_name,video_name,video_name],
        'label': [0,0,0,0]
    }
    
    opened by danieltanfh95 3
  • Does not work with latest torch

    This repo works with torch==1.9.0 and torchvision==0.10.0. torchvision has deprecated Scale in favour of Resize, but d3m does not support that yet, so you need to downgrade to torchvision<0.12.0 for this repo to work.

    opened by danieltanfh95 0
  • d3m exceptions StepFailedError

    d3m.exceptions.StepFailedError: Step 7 for pipeline c43355b7-0e87-499f-a9f2-defc56b6713a failed

    I trained this model using fit.py on your given dataset and saved the weights in the weights directory, then I ran produce.py. Both scripts ran smoothly. But when I try to run recognize.py, it gives me this exception.

    opened by muneebsaif 3
  • from autovideo import extract_frames is not working

    When I run

    "from autovideo import extract_frames"

    I get the following error:

    "ImportError: cannot import name 'extract_frames' from 'autovideo' (/Volumes/Disk-Data/pose estimation/autovideo-main/autovideo/__init__.py)"

    opened by amitvermanit 10
  • Doubt about TSM temporal shift

    Hi,

    First of all, I'd like to congratulate you on this repo; we've found it very useful. While training TSM, we discovered that the parameter is_shift is false by default. Also, the import there cannot be resolved, since the original make_temporal_shift code is not integrated into this repo.

    Without is_shift enabled, does that mean that we're using a vanilla 2D Resnet50 and averaging the output of every input image in the sequence? Am I missing anything? The original contribution of TSM was this special temporal shift in the internal feature maps of any 2D CNN model.

    Thanks in advance.

    opened by alejandrosatis 1
Releases(1.2.1)
Owner
Data Analytics Lab at Texas A&M University
We develop automated and interpretable machine learning algorithms/systems with understanding of their theoretical properties.