Syntax-Aware Action Targeting for Video Captioning

Last update: Oct 13, 2022

Code for SAAT, the method from "Syntax-Aware Action Targeting for Video Captioning" (accepted to CVPR 2020). The implementation builds on the codebase of "Consensus-based Sequence Training for Video Captioning" (CST).

Dependencies

Python 3.6
PyTorch 1.1
CUDA 10.0
Microsoft COCO Caption Evaluation
CIDEr

(Clone the coco-caption and cider projects into your working directory.)
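
A quick pre-flight check can confirm that the evaluation code is in the working directory. This is a minimal sketch. It assumes the two projects were cloned into directories named `coco-caption` and `cider` directly under the working directory. Those names are an assumption, so adjust `EXPECTED_DIRS` to match your checkout. The function `missing_eval_projects` is hypothetical and is not part of the repository.

```python
import os

# Assumed directory names for the two evaluation projects (hypothetical;
# change them if your clones use different names).
EXPECTED_DIRS = ("coco-caption", "cider")


def missing_eval_projects(workdir="."):
    """Return the expected evaluation directories that are absent from workdir."""
    return [name for name in EXPECTED_DIRS
            if not os.path.isdir(os.path.join(workdir, name))]
```

Call `missing_eval_projects()` from the repository root. An empty list means both checkouts were found.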

Data

The data (1.6 GB) can be downloaded here. This folder contains:

input/msrvtt: annotated captions (note that val_videodatainfo.json is a symbolic link to train_videodatainfo.json)
output/feature: extracted IRv2, C3D and category-embedding features
output/metadata: preprocessed annotations
output/model_svo/xe: the model file and the captions it generates for the test videos; this model reproduces the reported result (CIDEr 49.1 with XE training)
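
Some extraction tools, and some copy operations on Windows, do not preserve symbolic links. The sketch below checks the folder layout listed above. It also recreates `val_videodatainfo.json` as a link to `train_videodatainfo.json` if that link is missing. The helper names are hypothetical. The relative link target is a design choice, and `os.symlink` may need extra privileges on Windows.

```python
import os

# Sub-folders listed in the README for the downloaded data archive.
EXPECTED_SUBDIRS = (
    os.path.join("input", "msrvtt"),
    os.path.join("output", "feature"),
    os.path.join("output", "metadata"),
    os.path.join("output", "model_svo", "xe"),
)


def missing_data_dirs(root):
    """Return the expected sub-folders that do not exist under root."""
    return [d for d in EXPECTED_SUBDIRS
            if not os.path.isdir(os.path.join(root, d))]


def ensure_val_symlink(msrvtt_dir):
    """Recreate val_videodatainfo.json as a link to train_videodatainfo.json.

    Returns True if a link was created, False if the file was already present.
    """
    train = os.path.join(msrvtt_dir, "train_videodatainfo.json")
    val = os.path.join(msrvtt_dir, "val_videodatainfo.json")
    if not os.path.exists(train):
        raise FileNotFoundError(train)
    if os.path.lexists(val):  # lexists also catches a broken link
        return False
    # A relative target keeps the link valid if the data folder is moved.
    os.symlink("train_videodatainfo.json", val)
    return True
```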

Test

make -f SpecifiedMakefile test [options]

Please refer to the Makefile (and opts_svo.py) for the available train/test options. For example, to reproduce the reported result:

make -f Makefile_msrvtt_svo test GID=0 EXP_NAME=xe FEATS="irv2 c3d category" BFEATS="roi_feat roi_box" USE_RL=0 CST=0 USE_MIXER=0 SCB_CAPTIONS=0 LOGLEVEL=DEBUG LAMBDA=20
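
When sweeping over several options, it can help to assemble the `make` invocation in a script. This sketch uses only the variables that appear in the commands in this README. `build_make_cmd`, `run_make` and `XE_TEST_OPTS` are hypothetical helpers, not part of the repository. Passing an argument list to `subprocess.run` instead of a shell string means that multi-word values such as `FEATS` need no extra quoting.

```python
import subprocess

# Options copied from the README's reproduction command.
XE_TEST_OPTS = {
    "GID": 0,
    "EXP_NAME": "xe",
    "FEATS": "irv2 c3d category",
    "BFEATS": "roi_feat roi_box",
    "USE_RL": 0,
    "CST": 0,
    "USE_MIXER": 0,
    "SCB_CAPTIONS": 0,
    "LOGLEVEL": "DEBUG",
    "LAMBDA": 20,
}


def build_make_cmd(target, opts, makefile="Makefile_msrvtt_svo"):
    """Return an argv list equivalent to `make -f <makefile> <target> KEY=VALUE ...`."""
    cmd = ["make", "-f", makefile, target]
    # Each KEY=VALUE becomes a single argv entry, so spaces inside values survive.
    cmd.extend("{}={}".format(key, value) for key, value in opts.items())
    return cmd


def run_make(target, opts, **kwargs):
    """Run the make command; raises CalledProcessError if make fails."""
    subprocess.run(build_make_cmd(target, opts, **kwargs), check=True)
```

For the training command below, the same options plus `MAX_EPOCH=100` can be reused, for example `run_make("train", dict(XE_TEST_OPTS, MAX_EPOCH=100))`.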

Train

To train the model with the XE (cross-entropy) loss:

make -f Makefile_msrvtt_svo train GID=0 EXP_NAME=xe FEATS="irv2 c3d category" BFEATS="roi_feat roi_box" USE_RL=0 CST=0 USE_MIXER=0 SCB_CAPTIONS=0 LOGLEVEL=DEBUG MAX_EPOCH=100 LAMBDA=20

To change the input features, modify the FEATS variable in the commands above.
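
The Makefile and opts_svo.py define how FEATS is consumed. The small check below only compares a requested subset against the three features shipped in output/feature (`irv2`, `c3d`, `category`). That list is an assumption about what is available locally, not an exhaustive list of supported options. `feats_value` is a hypothetical helper. The provided checkpoint was evaluated with all three features, so a different FEATS value most likely requires training a new model.

```python
# Features included in output/feature of the provided data archive (assumed).
AVAILABLE_FEATS = ("irv2", "c3d", "category")


def feats_value(*names):
    """Build a FEATS value such as "irv2 c3d", rejecting unknown or repeated names."""
    if not names:
        raise ValueError("at least one feature is required")
    unknown = [n for n in names if n not in AVAILABLE_FEATS]
    if unknown:
        raise ValueError("unknown feature(s): " + ", ".join(unknown))
    if len(set(names)) != len(names):
        raise ValueError("duplicate feature names")
    return " ".join(names)


print('FEATS="{}"'.format(feats_value("irv2", "c3d")))  # prints FEATS="irv2 c3d"
```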

Citation

@InProceedings{Zheng_2020_CVPR,
author = {Zheng, Qi and Wang, Chaoyue and Tao, Dacheng},
title = {Syntax-Aware Action Targeting for Video Captioning},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

Acknowledgements

PyTorch implementation of CST
PyTorch implementation of SCST
