Towards Long-Form Video Understanding

Last update: Dec 26, 2022

Chao-Yuan Wu, Philipp Krähenbühl, CVPR 2021

[Paper] [Project Page] [Dataset]

Citation

@inproceedings{lvu2021,
  Author    = {Chao-Yuan Wu and Philipp Kr\"{a}henb\"{u}hl},
  Title     = {{Towards Long-Form Video Understanding}},
  Booktitle = {{CVPR}},
  Year      = {2021}}

Overview

This repo implements Object Transformers for long-form video understanding.

Getting Started

Please organize data/ as follows:

data
|_ ava
|_ features
|_ instance_meta
|_ lvu_1.0

ava, features, and instance_meta can be found in this Google Drive folder, and lvu_1.0 can be found here.

Please also download the pre-trained weights from this Google Drive folder and put them in pretrained_models/.
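Before running anything, you can check the layout above with a minimal sketch like the one below. It assumes you run it from the repository root. The helper name `missing_paths` is hypothetical and is not part of this repo. It only checks that the listed directories exist and does not inspect their contents.

```python
from pathlib import Path

# Sub-directories listed in the data/ layout above.
EXPECTED_DATA_DIRS = ("ava", "features", "instance_meta", "lvu_1.0")


def missing_paths(root="."):
    """Return the expected directories (relative to `root`) that do not exist.

    Checks data/<subdir> for every entry in EXPECTED_DATA_DIRS, plus
    pretrained_models/ for the downloaded weights.
    """
    root = Path(root)
    relative = [Path("data") / name for name in EXPECTED_DATA_DIRS]
    relative.append(Path("pretrained_models"))
    return [rel.as_posix() for rel in relative if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```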

Pre-training

python3 -u run_pretrain.py

This pretrains on a small demo dataset, data/instance_meta/instance_meta_pretrain_demo.pkl, as an example. To pretrain on a larger dataset (e.g., the latest full version of MovieClips), please follow the same file format.
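This README does not spell out the demo file's format. One way to follow it is to load the demo file and look at its top-level structure. The sketch below assumes the file is a standard Python pickle, as the .pkl extension suggests. `summarize` and `inspect_pickle` are hypothetical helper names and are not part of this repo. The sketch reports only the top level, so the repo's own data loading code remains the reference for the exact fields. If the file holds objects from third-party libraries, those libraries must be importable for `pickle.load` to succeed.

```python
import pickle
from pathlib import Path

# Path of the demo file named above.
DEMO_PATH = Path("data/instance_meta/instance_meta_pretrain_demo.pkl")


def summarize(obj, max_items=3):
    """Describe the top level of a loaded object: its type, size and a few samples."""
    summary = {"type": type(obj).__name__}
    if isinstance(obj, dict):
        summary["len"] = len(obj)
        summary["sample_keys"] = list(obj)[:max_items]
    elif isinstance(obj, (list, tuple)):
        summary["len"] = len(obj)
        summary["sample_types"] = [type(item).__name__ for item in obj[:max_items]]
    return summary


def inspect_pickle(path):
    """Load a pickle file and summarize its top level.

    Only unpickle files from a trusted source: loading a pickle can execute code.
    """
    with open(path, "rb") as f:
        return summarize(pickle.load(f))


if __name__ == "__main__" and DEMO_PATH.exists():
    print(inspect_pickle(DEMO_PATH))
```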

Training and evaluating on AVA v2.2

python3 -u run_ava.py

This should achieve 31.0 mAP.
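mAP is the mean, over action classes, of each class's average precision (AP). The sketch below illustrates only that averaging, using a simple non-interpolated AP. It is not the official AVA v2.2 evaluator. That evaluator first matches predicted person boxes to ground truth at an IoU threshold of 0.5 and uses its own AP computation, so its numbers will not match this sketch exactly. The function names are hypothetical. Skipping classes with no positives is an assumption made here.

```python
from typing import Dict, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: mean of precision@k taken at the rank k of each positive."""
    num_pos = sum(labels)
    if num_pos == 0:
        raise ValueError("AP is undefined for a class with no positives")
    # Rank detections by descending confidence.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / k  # precision at this positive's rank
    return precision_sum / num_pos


def mean_average_precision(per_class: Dict[str, Tuple[Sequence[float], Sequence[int]]]) -> float:
    """Mean AP over classes; classes without positives are skipped (an assumption)."""
    aps = [
        average_precision(scores, labels)
        for scores, labels in per_class.values()
        if sum(labels) > 0
    ]
    return sum(aps) / len(aps)
```

For example, suppose a class has three scored detections and its positives are ranked 1st and 3rd. That class gets AP = (1/1 + 2/3) / 2 ≈ 0.83.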

Training and evaluating on LVU tasks

python3 -u run.py [1-9]

The argument, an integer from 1 to 9, selects the task to run. Please see run.py for details.
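To run several LVU tasks in sequence, you could use a small wrapper around the command above. This is a hypothetical helper, for example saved as run_tasks.py (a name not in this repo). It assumes only what is stated above: run.py takes a single task id from 1 to 9. All task-specific settings are left to run.py.

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List


def task_command(task_id: int) -> List[str]:
    """Build the command for one LVU task (ids 1 to 9, as documented above)."""
    if not 1 <= task_id <= 9:
        raise ValueError(f"task_id must be between 1 and 9, got {task_id}")
    # sys.executable reuses the current interpreter; the README calls python3.
    return [sys.executable, "-u", "run.py", str(task_id)]


def run_tasks(task_ids: Iterable[int]) -> None:
    """Run the given tasks one after another, stopping at the first failure."""
    # Build (and thereby validate) every command before starting any run.
    commands = [task_command(task_id) for task_id in task_ids]
    for command in commands:
        # check=True raises CalledProcessError if run.py exits with an error.
        subprocess.run(command, check=True)


if __name__ == "__main__" and Path("run.py").exists():
    # Task ids may be passed on the command line; with none, all nine are run.
    ids = [int(arg) for arg in sys.argv[1:]] or list(range(1, 10))
    run_tasks(ids)
```

For example, `python3 run_tasks.py 1 4` would run tasks 1 and 4 only. Running sequentially is the simplest choice. Whether tasks can safely run in parallel depends on your hardware and on run.py, and this sketch assumes neither.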

Acknowledgment

This implementation borrows largely from Hugging Face Transformers. Please consider citing it as well if you use this repo.