Diverse Object-Scene Compositions For Zero-Shot Action Recognition

This repository contains the source code for zero-shot action recognition using object-scene compositions.

This repository includes:

object and scene predictions for UCF-101, UCF-Sports, J-HMDB
script to retrieve object and scene predictions for Kinetics
scripts to obtain word and sentence embeddings for all datasets used and for object-scene compositions
script to obtain action predictions from any given action dataset, given the object and scene predictions and the respective action labels

Software used

python 3.8.8
pytorch 1.7.1
numpy 1.19.2
fasttext 0.9.2
sentence-transformers 1.2.0
scikit-learn 0.24.1

Downloading the object and scene predictions for Kinetics

While the action labels and video annotations for Kinetics are already present in the repo, the object and scene predictions need to be retrieved using:

bash kineticsdownload.sh

Obtaining word and sentence embeddings for all datasets

To compute the word and sentence embeddings for all the video and image datasets, run:

python getfasttextembs.py; python getbertembs.py

This will additionally compute the embeddings for all object-scene compositions and the similarities between all action labels and object-scene compositions.
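To illustrate what "similarities between action labels and object-scene compositions" can look like, here is a minimal NumPy sketch. It is not the code in getfasttextembs.py or getbertembs.py. The helper names are hypothetical, random vectors stand in for real fastText or sentence embeddings, and averaging the object and scene vectors to embed a composition is an assumption made only for this example.

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def composition_embeddings(obj_emb, scene_emb):
    """Embed each (object, scene) pair as the mean of its two vectors.

    Averaging is an assumption for illustration; the repository may
    combine the embeddings differently.
    """
    # (num_objects, num_scenes, dim) -> (num_objects * num_scenes, dim)
    pairs = (obj_emb[:, None, :] + scene_emb[None, :, :]) / 2.0
    return pairs.reshape(-1, obj_emb.shape[1])

# Random stand-ins for real label embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
action_emb = rng.normal(size=(3, 8))  # 3 action labels
obj_emb = rng.normal(size=(4, 8))     # 4 object labels
scene_emb = rng.normal(size=(2, 8))   # 2 scene labels

comp_emb = composition_embeddings(obj_emb, scene_emb)
sim = cosine_similarity_matrix(action_emb, comp_emb)
print(sim.shape)  # (3, 8): one row per action, one column per composition
```

Row i of sim holds the similarity of action i to every composition, with compositions ordered object-major (object 0 with every scene, then object 1, and so on).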

Using the main script

The main script can be run using the default arguments as follows:

python zero-shot-actions.py

There are several flags that can be used. Descriptions for these can be shown by running:

python zero-shot-actions.py --help
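For orientation, the general idea of scoring unseen actions from object and scene predictions can be sketched as below. This is not the implementation in zero-shot-actions.py. The function name, the top_k parameter, and the weighting of each composition by the product of its object and scene probabilities are all assumptions made for this example.

```python
import numpy as np

def score_actions(obj_probs, scene_probs, sim, top_k=5):
    """Score each action from per-video object and scene probabilities.

    obj_probs:   (num_objects,) predicted object probabilities
    scene_probs: (num_scenes,) predicted scene probabilities
    sim:         (num_actions, num_objects * num_scenes) similarities between
                 action labels and compositions, object-major order
    top_k:       hypothetical parameter, compositions kept per action
    """
    # Assumed composition likelihood: product of object and scene probabilities.
    comp_probs = np.outer(obj_probs, scene_probs).reshape(-1)
    k = min(top_k, sim.shape[1])
    # For every action, keep the k most similar compositions.
    top_idx = np.argsort(-sim, axis=1)[:, :k]
    top_sim = np.take_along_axis(sim, top_idx, axis=1)
    # Sum similarity-weighted composition likelihoods.
    return (top_sim * comp_probs[top_idx]).sum(axis=1)

# Toy example: 2 objects, 2 scenes, 2 actions.
obj_probs = np.array([0.9, 0.1])
scene_probs = np.array([0.8, 0.2])
sim = np.array([
    [1.0, 0.0, 0.0, 0.0],  # action 0 matches (object 0, scene 0)
    [0.0, 0.0, 0.0, 1.0],  # action 1 matches (object 1, scene 1)
])
scores = score_actions(obj_probs, scene_probs, sim, top_k=1)
predicted = int(np.argmax(scores))
print(scores, predicted)
```

In this toy case the video is most likely to show object 0 in scene 0, so action 0 receives the higher score. The flags actually supported by the main script are listed by the --help command above.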

Lastly, a helper script that computes results for different datasets and different flag values is available:

python make_results.py
