Overview

RSPNet

Official PyTorch implementation for the AAAI 2021 paper "RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning"

[Supplementary Materials]

Getting Started

Install Dependencies

All dependencies can be installed using pip:

python -m pip install -r requirements.txt

Our experiments were run with Python 3.7 and PyTorch 1.6. Other versions should work but have not been tested.
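
If you want to match the tested environment closely, you can pin the two core packages when installing (PyTorch 1.6.0 pairs with torchvision 0.7.0; defer to requirements.txt for the authoritative pins):

python -m pip install torch==1.6.0 torchvision==0.7.0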

Transcode Videos (Optional)

This step is optional but will increase the data loading speed dramatically.

We decode videos on the fly while training, so we do not need to split them into frames beforehand. This makes disk IO much faster but increases CPU usage. The transcoding step reduces the CPU consumed by decoding by 1) lowering the video resolution and 2) adding more key frames.

To transcode the videos, you need to have ffmpeg installed, then run:

python utils/transcode_dataset.py PATH/TO/ORIGIN_VIDEOS PATH/TO/TRANSCODED_VIDEOS

Be warned: this will use all of your CPU cores and takes several hours to complete (on our workstation with two Intel E5-2630 CPUs).
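
For reference, the transcoding is conceptually similar to the following per-file ffmpeg invocation (a sketch only; the actual resolution, codec, and keyframe settings used by utils/transcode_dataset.py may differ):

# Scale the frame height down to 256 pixels and force a key frame every 16 frames
# (smaller frames decode faster; denser key frames make random seeking cheaper).
ffmpeg -i input.mp4 -vf "scale=-2:256" -g 16 -c:v libx264 output.mp4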

Prepare Datasets

You are expected to prepare data for pre-training (the Kinetics-400 dataset) and fine-tuning (the UCF101, HMDB51, and Something-Something-v2 datasets). The recommended way to let the scripts find the datasets on your system is to create symbolic links in the ./data directory that point to the actual data; we found this solution flexible.
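
For example, if your datasets live under /datasets (a hypothetical path), the links can be created like this:

mkdir -p data
ln -s /datasets/kinetics400 data/kinetics400
ln -s /datasets/UCF101 data/UCF101
ln -s /datasets/hmdb51 data/hmdb51
ln -s /datasets/smth-smth-v2 data/smth-smth-v2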

The expected directory hierarchy is as follows:

├── data
│   ├── hmdb51
│   │   ├── metafile
│   │   │   ├── brush_hair_test_split1.txt
│   │   │   └── ...
│   │   └── videos
│   │       ├── brush_hair
│   │       │   └── *.avi
│   │       └── ...
│   ├── UCF101
│   │   ├── ucfTrainTestlist
│   │   │   ├── classInd.txt
│   │   │   ├── testlist01.txt
│   │   │   ├── trainlist01.txt
│   │   │   └── ...
│   │   └── UCF-101
│   │       ├── ApplyEyeMakeup
│   │       │   └── *.avi
│   │       └── ...
│   ├── kinetics400
│   │   ├── train_video
│   │   │   ├── answering_questions
│   │   │   │   └── *.mp4
│   │   │   └── ...
│   │   └── val_video
│   │       └── (same as train_video)
│   ├── kinetics100
│   │   └── (same as kinetics400)
│   └── smth-smth-v2
│       ├── 20bn-something-something-v2
│       │   └── *.mp4
│       └── annotations
│           ├── something-something-v2-labels.json
│           ├── something-something-v2-test.json
│           ├── something-something-v2-train.json
│           └── something-something-v2-validation.json
└── ...

Alternatively, you can change the path in config/dataset to match your system.

Build Kinetics-100 dataset (Optional)

Some of our ablation-study experiments use the Kinetics-100 dataset for pre-training. This dataset is built by extracting the 100 classes from Kinetics-400 whose training sets have the smallest total file size.

If you have Kinetics-400 available, you can build Kinetics-100 by:

python -m utils.build_kinetics_subset

This script creates symbolic links instead of copying data and is expected to complete within a minute.

We have included a pre-built subset at data/kinetics100_links and created the symbolic link data/kinetics100 pointing to it. You still need data/kinetics400 to be available at runtime.
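
Conceptually, the subset is built by ranking classes by total training-set file size, keeping the 100 smallest, and symlinking them. A minimal Python sketch of the idea (not the actual utils/build_kinetics_subset.py):

from pathlib import Path

SRC = Path('data/kinetics400/train_video')
DST = Path('data/kinetics100/train_video')

def class_size(class_dir: Path) -> int:
    # Total size of all video files in one class directory.
    return sum(f.stat().st_size for f in class_dir.iterdir() if f.is_file())

# Keep the 100 classes with the smallest training-set footprint.
classes = sorted(SRC.iterdir(), key=class_size)[:100]

DST.mkdir(parents=True, exist_ok=True)
for class_dir in classes:
    link = DST / class_dir.name
    if not link.exists():
        link.symlink_to(class_dir.resolve())  # link instead of copying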

Pre-training on Pretext Tasks

Now that the environment is set up, run the following commands to pre-train models on the pretext tasks.

export CUDA_VISIBLE_DEVICES=0,1,2,3
# Architecture: C3D
python pretrain.py -e exps/pretext-c3d -c config/pretrain/c3d.jsonnet
# Architecture: ResNet-18
python pretrain.py -e exps/pretext-resnet18 -c config/pretrain/resnet18.jsonnet
# Architecture: S3D-G
python pretrain.py -e exps/pretext-s3dg -c config/pretrain/s3dg.jsonnet
# Architecture: R(2+1)D
python pretrain.py -e exps/pretext-r2plus1d -c config/pretrain/r2plus1d.jsonnet

You can pre-train on the Kinetics-100 dataset instead by editing config/pretrain/moco-train-base.jsonnet (line 13); see the sketch below.
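
The change is a one-line swap of the dataset definition. Hypothetically, it looks something like the following (the exact key name and import path are whatever the file actually uses; config/dataset holds the per-dataset .libsonnet files):

// config/pretrain/moco-train-base.jsonnet, line 13 (field name hypothetical)
dataset: import '../dataset/kinetics100.libsonnet',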

Action Recognition

After pre-training on the pretext tasks, the models are fine-tuned for the action recognition task on the UCF101, HMDB51, and Something-Something-v2 datasets.

export CUDA_VISIBLE_DEVICES=0,1
# Dataset: UCF101
#     Architecture: C3D, top-1 accuracy 76.71%
python finetune.py -c config/finetune/ucf101_c3d.jsonnet \
                   --mc exps/pretext-c3d/model_best.pth.tar \
                   -e exps/ucf101-c3d
#     Architecture: ResNet-18, top-1 accuracy 74.33%
python finetune.py -c config/finetune/ucf101_resnet18.jsonnet \
                   --mc exps/pretext-resnet18/model_best.pth.tar \
                   -e exps/ucf101-resnet18
#     Architecture: S3D-G, top-1 accuracy 89.9%
python finetune.py -c config/finetune/ucf101_s3dg.jsonnet \
                   --mc exps/pretext-s3dg/model_best.pth.tar \
                   -e exps/ucf101-s3dg
#     Architecture: R(2+1)D, top-1 accuracy 81.1%
python finetune.py -c config/finetune/ucf101_r2plus1d.jsonnet \
                   --mc exps/pretext-r2plus1d/model_best.pth.tar \
                   -e exps/ucf101-r2plus1d

# Dataset: HMDB51
#     Architecture: C3D, top-1 accuracy 44.58%
python finetune.py -c config/finetune/hmdb51_c3d.jsonnet \
                   --mc exps/pretext-c3d/model_best.pth.tar \
                   -e exps/hmdb51-c3d
#     Architecture: ResNet-18, top-1 accuracy 41.83%
python finetune.py -c config/finetune/hmdb51_resnet18.jsonnet \
                   --mc exps/pretext-resnet18/model_best.pth.tar \
                   -e exps/hmdb51-resnet18
#     Architecture: S3D-G, top-1 accuracy 59.6%
python finetune.py -c config/finetune/hmdb51_s3dg.jsonnet \
                   --mc exps/pretext-s3dg/model_best.pth.tar \
                   -e exps/hmdb51-s3dg
#     Architecture: R(2+1)D, top-1 accuracy 44.6%
python finetune.py -c config/finetune/hmdb51_r2plus1d.jsonnet \
                   --mc exps/pretext-r2plus1d/model_best.pth.tar \
                   -e exps/hmdb51-r2plus1d

# Dataset: Something-something-v2
#     Architecture: C3D, top-1 accuracy 47.76%
python finetune.py -c config/finetune/smth_smth_c3d.jsonnet \
                   --mc exps/pretext-c3d/model_best.pth.tar \
                   -e exps/smthv2-c3d
#     Architecture: ResNet-18, top-1 accuracy 44.02%
python finetune.py -c config/finetune/smth_smth_resnet18.jsonnet \
                   --mc exps/pretext-resnet18/model_best.pth.tar \
                   -e exps/smthv2-resnet18
#     Architecture: S3D-G, top-1 accuracy 55.03%
python finetune.py -c config/finetune/smth_smth_s3dg.jsonnet \
                   --mc exps/pretext-s3dg/model_best.pth.tar \
                   -e exps/smthv2-s3dg

Results and Pre-trained Models

Architecture  Pre-trained dataset  Pre-training epochs  Pre-trained model  Acc. on UCF101  Acc. on HMDB51
S3D-G         Kinetics-400         1000                 Download link      93.7            64.7
S3D-G         Kinetics-400         200                  Download link      89.9            59.6
R(2+1)D       Kinetics-400         200                  Download link      81.1            44.6
ResNet-18     Kinetics-400         200                  Download link      74.3            41.8
C3D           Kinetics-400         200                  Download link      76.7            44.6
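
To sanity-check a downloaded checkpoint before fine-tuning, something like the following works (the exact dictionary keys depend on the training code, so treat 'state_dict' as an assumption):

import torch

# Load on CPU so no GPU is needed just to inspect the file.
checkpoint = torch.load('model_best.pth.tar', map_location='cpu')
print(checkpoint.keys())

state_dict = checkpoint.get('state_dict', checkpoint)
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))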

Video Retrieval

The pre-trained model can also be used to search for relevant videos given a query video.

export CUDA_VISIBLE_DEVICES=0 # use single GPU 
python retrieval.py -c config/retrieval/ucf101_resnet18.jsonnet \
                    --mc exps/pretext-resnet18/model_best.pth.tar \
                    -e exps/retrieval-resnet18    

The video retrieval results from our paper (top-k retrieval accuracy, %):

Architecture  k=1   k=5   k=10  k=20  k=50
C3D           36.0  56.7  66.5  76.3  87.7
ResNet-18     41.1  59.4  68.4  77.8  88.7
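
For background, these numbers follow the common clip-retrieval protocol: test clips are used as queries against a gallery of training clips, clips are ranked by cosine similarity of their features, and a query counts as correct if any of its k nearest neighbors shares its class. A minimal NumPy sketch of that protocol (not the repo's retrieval.py):

import numpy as np

def topk_retrieval_accuracy(query_feats, query_labels, gallery_feats, gallery_labels, k):
    # L2-normalize so that a dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sim = q @ g.T                           # (num_queries, num_gallery)
    topk = np.argsort(-sim, axis=1)[:, :k]  # indices of the k nearest gallery clips
    hits = (gallery_labels[topk] == query_labels[:, None]).any(axis=1)
    return hits.mean()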

Visualization

We further visualize the region of interest (RoI) that contributes most to the similarity score using the class activation map (CAM) technique.

export CUDA_VISIBLE_DEVICES=0,1
python visualization.py -c config/pretrain/s3dg.jsonnet \
                        --load-model exps/pretext-s3dg/model_best.pth.tar \
                        -e exps/visual-s3dg \
                        -x '{batch_size: 1}'

The CAM visualization results will be plotted to PNG files.
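
For background, CAM weights the last convolutional feature maps by the classifier weights of the target class and sums them into a spatial heat map. A generic sketch of the computation (not the repo's visualization.py, which adapts the idea to the similarity score):

import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    # feature_maps: (C, H, W) activations from the last conv layer.
    # fc_weights: (num_classes, C) weights of the final linear layer.
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=1)  # (H, W)
    cam = np.maximum(cam, 0)           # keep only positive evidence
    return cam / (cam.max() + 1e-8)    # normalize to [0, 1] for plotting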

Troubleshooting

  • DECORDError cannot find video stream with wanted index: -1

    Some videos in the Kinetics dataset do not contain a valid video stream, for unknown reasons. To filter them out, run python utils/verify_video.py PATH/TO/VIDEOS, then copy the output into the blacklist config in config/dataset/kinetics{400,100}.libsonnet (a sketch of the expected shape follows below). You need to have ffmpeg installed.
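
A hypothetical sketch of what the blacklist entries look like (file names invented; match the layout already present in the .libsonnet files):

// config/dataset/kinetics400.libsonnet (entries are hypothetical)
blacklist: [
  'train_video/some_class/corrupted_clip.mp4',
  'val_video/another_class/no_video_stream.mp4',
],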

Citation

Please cite the following paper if you find RSPNet useful in your research:

@InProceedings{chen2020RSPNet,
  author = {Peihao Chen and Deng Huang and Dongliang He and Xiang Long and Runhao Zeng and Shilei Wen and Mingkui Tan and Chuang Gan},
  title = {RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning},
  booktitle = {The AAAI Conference on Artificial Intelligence (AAAI)},
  year = {2021}
}

Contact

For any questions, please file an issue or contact:

Peihao Chen: [email protected]
Deng Huang: [email protected]

Comments
  • R(2+1)D-18 pretrained model not fully reproducible

    Hi, I fine-tuned the given pre-trained R(2+1)D model on UCF-101 using the provided fine-tuning code, but it only achieves 76-77% accuracy. Can you confirm that the released model is the correct one? I used the same setup as described in the README.

    opened by fmthoker 3
  • framework image

    Hello, thank you for your great work. It's such a smart idea!

    Could you explain the framework figure? I understand that the RSP and A-VID tasks are learned in one iteration, which I take to mean the anchor is shared. In the algorithm, K clips are simply sampled from the videos V \ v+; however, in Fig. 2 of the paper, the features (green) of both the 1x clip and the 2x clip go through the g_a head for contrastive learning. Is this meant to show that the speed is randomly selected? In the actual experiment, are there just the clips c_i, c_j, and the K clips {c_n}, not 2K?

    Thank you.

    opened by youwantsy 2
  • The pre-trained S3D-G model based on the ImageNet and Kinetics-400 datasets?

    Where can I download the S3D-G model pre-trained on the ImageNet and Kinetics-400 datasets? Or could you upload it to this repository?

    opened by LiangSiyv 2
  • Question about computational resources

    Hi, thanks for your wonderful paper and code. I would like to know the computational resources used in your experiments: 1. What GPUs did you use, and how many? 2. How long did pre-training on K400 for 200 epochs take? 3. How long did fine-tuning take on UCF101, HMDB51, and Something-V2, respectively? Looking forward to your reply. Thanks.

    opened by wjn922 2
  • 'No configuration setting found for key force_n_crop'

    I downloaded your S3D-G pre-trained model for my action recognition task on UCF101, but I keep getting this error:

    argument type: <class 'str'>
    Setting ulimit -n 8192
    world_size=1
    Using dist_url=tcp://127.0.0.1:36879
    Local Rank: 0
    2021-12-30 07:31:39,148|INFO |Args = Args(parser=None, config='config/finetune/ucf101_s3dg.jsonnet', ext_config=[], debug=False, experiment_dir=PosixPath('exps/ucf101-s3dg'), _run_dir=PosixPath('exps/ucf101-s3dg/run_2_20211230_073138'), load_checkpoint=None, load_model=None, validate=False, moco_checkpoint='exps/pretext-s3dg/model_best_s3dg_200epoch.pth.tar', seed=None, world_size=1, _continue=False, no_scale_lr=False)
    2021-12-30 07:31:39,149|INFO |cudnn.benchmark = True
    2021-12-30 07:31:39,278|INFO |Config =
        batch_size = 4
        dataset {
            annotation_path = "data/UCF101/ucfTrainTestlist"
            fold = 1
            mean = [ 0.485 0.456 0.406 ]
            name = "ucf101"
            num_classes = 101
            root = "data/UCF101/UCF-101"
            std = [ 0.229 0.224 0.225 ]
        }
        final_validate { batch_size = 4 }
        log_interval = 10
        method = "from-scratch"
        model { arch = "s3dg" }
        model_type = "multitask"
        num_epochs = 50
        num_workers = 8
        optimizer {
            dampening = 0
            lr = 0.005
            milestones = [ 50 100 150 ]
            momentum = 0.9
            nesterov = false
            patience = 10
            schedule = "cosine"
            weight_decay = 0.0001
        }
        spatial_transforms {
            color_jitter { brightness = 0 contrast = 0 hue = 0 saturation = 0 }
            crop_area { max = 1 min = 0.25 }
            gray_scale = 0
            size = 224
        }
        temporal_transforms {
            frame_rate = 25
            size = 64
            strides = [ { stride = 1 weight = 1 } ]
            validate { final_n_crop = 10 n_crop = 1 stride = 1 }
        }
        validate { batch_size = 4 }
    2021-12-30 07:31:39,282|INFO |Using global get_model_class({'arch': 's3dg'})
    2021-12-30 07:31:39,283|INFO |Using MultiTask Wrapper
    2021-12-30 07:31:39,283|WARNING |<class 'moco.split_wrapper.MultiTaskWrapper'> using groups: 1
    2021-12-30 07:31:39,383|INFO |Found fc: fc with in_features: 1024
    2021-12-30 07:31:42,488|INFO |Building Dataset: VID: False, Split=train
    2021-12-30 07:31:42,488|INFO |Temporal transform type: clip
    Traceback (most recent call last):
      File "finetune.py", line 502, in <module>
        main()
      File "finetune.py", line 498, in main
        mp.spawn(main_worker, args=(args, dist_url,), nprocs=args.world_size)
      File "/home/ubuntu/anaconda3/envs/ucf101/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
        return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
      File "/home/ubuntu/anaconda3/envs/ucf101/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
        while not context.join():
      File "/home/ubuntu/anaconda3/envs/ucf101/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 119, in join
        raise Exception(msg)
    Exception:

    -- Process 0 terminated with the following error:
    Traceback (most recent call last):
      File "/home/ubuntu/anaconda3/envs/ucf101/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
        fn(i, *args)
      File "/home/ubuntu/RSPNet/finetune.py", line 452, in main_worker
        engine = Engine(args, cfg, local_rank=local_rank)
      File "/home/ubuntu/RSPNet/finetune.py", line 171, in __init__
        self.train_loader = self.data_loader_factory.build(
      File "/home/ubuntu/RSPNet/datasets/classification/__init__.py", line 81, in build
        temporal_transform = self.get_temporal_transform(split)
      File "/home/ubuntu/RSPNet/datasets/classification/__init__.py", line 276, in get_temporal_transform
        if tt_cfg.get_bool("force_n_crop"):
      File "/home/ubuntu/anaconda3/envs/ucf101/lib/python3.8/site-packages/pyhocon/config_tree.py", line 310, in get_bool
        string_value = self.get_string(key, default)
      File "/home/ubuntu/anaconda3/envs/ucf101/lib/python3.8/site-packages/pyhocon/config_tree.py", line 221, in get_string
        value = self.get(key, default)
      File "/home/ubuntu/anaconda3/envs/ucf101/lib/python3.8/site-packages/pyhocon/config_tree.py", line 209, in get
        return self._get(ConfigTree.parse_key(key), 0, default)
      File "/home/ubuntu/anaconda3/envs/ucf101/lib/python3.8/site-packages/pyhocon/config_tree.py", line 151, in _get
        raise ConfigMissingException(u"No configuration setting found for key {key}".format(key='.'.join(key_path[:key_index + 1])))
    pyhocon.exceptions.ConfigMissingException: 'No configuration setting found for key force_n_crop'

    opened by aloma85 0