TVNet: Temporal Voting Network for Action Localization
This repo holds the code of the paper "TVNet: Temporal Voting Network for Action Localization".
Paper Introduction
Temporal action localization is a vital task in video understanding. In this paper, we propose a Temporal Voting Network (TVNet) for action localization in untrimmed videos. TVNet incorporates a novel Voting Evidence Module to locate temporal boundaries more accurately, in which temporal contextual evidence is accumulated to predict frame-level probabilities of start and end action boundaries.
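As a rough illustration of the voting idea (a minimal sketch with hypothetical names, not the paper's implementation): each sliding window casts per-frame votes for a boundary, and votes from overlapping windows are accumulated into frame-level boundary probabilities.

```python
import numpy as np

def accumulate_boundary_votes(vote_fn, num_frames, window_length, window_stride):
    """Accumulate per-frame boundary votes over overlapping sliding windows.

    vote_fn(begin, length) is a hypothetical stand-in for the voting
    module: it returns one vote per frame of the window for how likely
    that frame is a (start or end) boundary.
    """
    votes = np.zeros(num_frames)
    counts = np.zeros(num_frames)
    for begin in range(0, num_frames - window_length + 1, window_stride):
        votes[begin:begin + window_length] += vote_fn(begin, window_length)
        counts[begin:begin + window_length] += 1
    # Average the accumulated evidence so frames covered by more
    # windows are not trivially favoured.
    return votes / np.maximum(counts, 1)
```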
Dependencies
- Python == 2.7
- TensorFlow == 1.9.0
- CUDA == 10.1.105
- GCC >= 5.4
Note that the PEM code from BMN is implemented in PyTorch == 1.1.0 or 1.3.0.
Data Preparation
Datasets
Our experiments are based on the ActivityNet 1.3 and THUMOS14 datasets.
Feature for THUMOS14
You can download the features for THUMOS14 here: GoogleDrive.
Place them in a folder named thumos_features inside ./TVNet-THUMOS14/data.
You also need to download the features for PEM (from BMN) here: GoogleDrive. Please put them in a folder named Thumos_feature_hdf5 inside ./TVNet-THUMOS14/data/thumos_features.
If everything goes well, the folder structure of ./TVNet-THUMOS14/data should look like this:
data
└── thumos_features
├── Thumos_feature_dim_400
├── Thumos_feature_hdf5
├── features_train.npy
└── features_test.npy
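As a quick sanity check after downloading (a sketch; the exact array shapes and dtypes depend on the released files):

```python
import numpy as np

# Paths follow the folder layout above; allow_pickle=True is only
# needed if the arrays were saved as Python objects (an assumption).
train_feats = np.load('./TVNet-THUMOS14/data/thumos_features/features_train.npy', allow_pickle=True)
test_feats = np.load('./TVNet-THUMOS14/data/thumos_features/features_test.npy', allow_pickle=True)
print(train_feats.shape)
print(test_feats.shape)
```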
Feature for ActivityNet 1.3
You can download the features for ActivityNet 1.3 here: GoogleCloud. Please put the csv_mean_100 directory into ./TVNet-ANET/data/activitynet_feature_cuhk/.
If everything goes well, the folder structure of ./TVNet-ANET/data should look like this:
data
└── activitynet_feature_cuhk
└── csv_mean_100
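To inspect one feature file (the file name below is hypothetical; each csv in csv_mean_100 holds the feature sequence of a single video, rescaled to 100 snippets as the directory name suggests):

```python
import pandas as pd

# Hypothetical video id; pick any csv actually present in the folder.
df = pd.read_csv('./TVNet-ANET/data/activitynet_feature_cuhk/csv_mean_100/v_example.csv')
print(df.shape)  # expected (100, feature_dim)
```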
Run all steps
Run all steps on THUMOS14
cd TVNet-THUMOS14
Run the following script to execute all steps on THUMOS14:
bash do_all.sh
Note: if you use BlueCrystal 4, you can run the following script directly, without any dependency setup.
bash do_all_BC4.sh
Run all steps on ActivityNet 1.3
cd TVNet-ANET
bash do_all.sh (or bash do_all_BC4.sh on BlueCrystal 4)
Run steps separately
Take TVNet-THUMOS14 as an example:
cd TVNet-THUMOS14
1. Temporal evaluation module
python TEM_train.py
python TEM_test.py
2. Create training data for voting evidence module
python VEM_create_windows.py --window_length L --window_stride S
L is the window length and S is the sliding stride. We generate training windows of length 10 with stride 5, and of length 5 with stride 2; see the example commands below.
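For the two configurations above, the calls are:
python VEM_create_windows.py --window_length 10 --window_stride 5
python VEM_create_windows.py --window_length 5 --window_stride 2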
3. Voting evidence module
python VEM_train.py --voting_type TYPE --window_length L --window_stride S
python VEM_test.py --voting_type TYPE --window_length L --window_stride S
TYPE should be start or end. We train and test models with window length 10 (stride 5) and window length 5 (stride 2), for start and end separately; see the example commands below.
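For example, with window length 10 and stride 5:
python VEM_train.py --voting_type start --window_length 10 --window_stride 5
python VEM_test.py --voting_type start --window_length 10 --window_stride 5
python VEM_train.py --voting_type end --window_length 10 --window_stride 5
python VEM_test.py --voting_type end --window_length 10 --window_stride 5
Repeat the same four commands with --window_length 5 --window_stride 2.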
4. Proposal evaluation module from BMN
python PEM_train.py
5. Proposal generation
python proposal_generation.py
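Conceptually (a hedged sketch, not the actual logic of proposal_generation.py), proposals can be formed by pairing candidate start and end frames found by the voting step:

```python
import numpy as np

def pair_boundaries(start_prob, end_prob, threshold=0.5, max_len=None):
    """Pair thresholded start/end frames into candidate proposals.

    start_prob / end_prob are frame-level boundary probabilities; the
    threshold and max_len values here are illustrative assumptions.
    """
    starts = np.where(start_prob >= threshold)[0]
    ends = np.where(end_prob >= threshold)[0]
    proposals = []
    for s in starts:
        for e in ends:
            if e > s and (max_len is None or e - s <= max_len):
                proposals.append((s, e, start_prob[s] * end_prob[e]))
    return proposals  # (start, end, score) triples
```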
6. Post-processing and detection
python post_postprocess.py
Results
THUMOS14
tIoU | mAP |
---|---|
0.3 | 0.5725 |
0.4 | 0.5061 |
0.5 | 0.4304 |
0.6 | 0.3297 |
0.7 | 0.2030 |
ActivityNet 1.3
tIoU | mAP |
---|---|
Average | 0.3460 |
0.5 | 0.5135 |
0.75 | 0.3496 |
0.95 | 0.1012 |
References
This implementation borrows from:
- BSN: BSN-Boundary-Sensitive-Network
  - TEM_train.py / TEM_test.py -- the TEM module used in our paper
  - load_dataset.py -- the TEM data-loading code
- BMN: BMN-Boundary-Matching-Network
  - PEM_train.py -- the PEM module used in our paper
- G-TAD: Sub-Graph Localization for Temporal Action Detection
  - post_postprocess.py -- the multicore post-processing used to generate detections
Our main contribution is in:
- VEM_create_windows.py -- generates training annotations for the Voting Evidence Module (VEM)
- VEM_train.py -- trains the Voting Evidence Module (VEM)
- VEM_test.py -- tests the Voting Evidence Module (VEM)