Code for the paper "Towards Diverse Paragraph Captioning for Untrimmed Videos" (CVPR 2021).

Last update: Oct 11, 2022

Towards Diverse Paragraph Captioning for Untrimmed Videos

This repository contains the PyTorch implementation of our paper Towards Diverse Paragraph Captioning for Untrimmed Videos (CVPR 2021).

Requirements

Python 3.6
Java 15.0.2
PyTorch 1.2
numpy, tqdm, h5py, scipy, six
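
Before training, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of the repository: the helper name missing_packages is hypothetical, and it only checks that each package can be found, not that its version matches.

```python
import importlib.util
import sys

# Import names for the requirements listed above (PyTorch is imported as "torch").
REQUIRED_PACKAGES = ["torch", "numpy", "tqdm", "h5py", "scipy", "six"]


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names that cannot be located for import (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", ".".join(str(part) for part in sys.version_info[:3]))
    missing = missing_packages()
    print("missing packages:", ", ".join(missing) if missing else "none")
```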

Training & Inference

Data preparation

Download the pre-extracted video features of the ActivityNet Captions or Charades Captions dataset from BaiduNetdisk (code: he21).
Decompress the downloaded files into the corresponding dataset folder in the ordered_feature/ directory.
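
After decompressing, a quick check that each dataset folder actually contains files can catch a misplaced archive early. This is a rough sketch under assumptions: the folder names activitynet and charades inside ordered_feature/ are assumed to match the dataset names used in the commands below, and feature_status is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Assumed folder names; adjust if the repository layout differs.
DATASETS = ("activitynet", "charades")


def feature_status(root, datasets=DATASETS):
    """Map each dataset name to the number of files found under root/<dataset>."""
    root = Path(root)
    status = {}
    for name in datasets:
        folder = root / name
        if folder.is_dir():
            status[name] = sum(1 for path in folder.rglob("*") if path.is_file())
        else:
            status[name] = 0  # folder missing: nothing was decompressed here
    return status
```

For example, feature_status("ordered_feature") returns a count of 0 for any dataset whose features have not been decompressed yet.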

Start training

Train our model without reinforcement learning. In the commands below, the * in results/*/ stands for the dataset name, either activitynet or charades.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/dm.token/model.json ../results/*/dm.token/path.json --is_train
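
If you prefer to launch training from a script, the dataset placeholder in the command above can be filled in programmatically. The sketch below is illustrative only: train_command and run are hypothetical helpers, run assumes it is called from the repository root, and only the flags shown in this README are used.

```python
import os
import subprocess


def train_command(dataset, folder="dm.token", resume_file=None):
    """Build the argument list for the training command shown above."""
    if dataset not in ("activitynet", "charades"):
        raise ValueError("dataset must be 'activitynet' or 'charades'")
    base = f"../results/{dataset}/{folder}"
    cmd = ["python", "transformer.py", f"{base}/model.json", f"{base}/path.json", "--is_train"]
    if resume_file is not None:
        cmd += ["--resume_file", resume_file]
    return cmd


def run(cmd, gpu="0"):
    """Run the command inside driver/ with CUDA_VISIBLE_DEVICES set (assumes cwd is the repo root)."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    return subprocess.run(cmd, cwd="driver", env=env, check=True)
```

Calling run(train_command("charades")) would then correspond to the shell command above with * replaced by charades.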

Fine-tune the pretrained model using self-critical reinforcement learning with both accuracy and diversity rewards.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/dm.token.rl/model.json ../results/*/dm.token.rl/path.json --is_train --resume_file ../results/*/dm.token/model/epoch.*.th
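
To give a feel for what self-critical fine-tuning with two rewards involves, here is a small sketch. It is not the repository's reward code: the distinct n-gram ratio is a simple stand-in for a diversity reward, the accuracy score is assumed to come from some captioning metric, and diversity_weight is a hypothetical parameter. The only part taken from the general technique is the self-critical baseline, where the reward of the greedily decoded caption is subtracted from the reward of the sampled caption.

```python
def distinct_ngram_ratio(tokens, n=2):
    """Fraction of n-grams that are unique; 1.0 means no n-gram is repeated."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def combined_reward(accuracy, tokens, diversity_weight=0.5):
    """Accuracy score plus a weighted diversity term (stand-in, not the paper's reward)."""
    return accuracy + diversity_weight * distinct_ngram_ratio(tokens)


def self_critical_advantage(sample_reward, greedy_reward):
    """Self-critical baseline: sampled-caption reward minus greedy-caption reward."""
    return sample_reward - greedy_reward


# A repetitive greedy caption versus a more varied sampled caption.
greedy = "a man is cooking a man is cooking".split()
sampled = "a man chops onions then fries them".split()
advantage = self_critical_advantage(
    combined_reward(0.5, sampled), combined_reward(0.5, greedy)
)
```

A positive advantage increases the probability of the sampled caption during policy-gradient training; here it is positive only because the sampled caption repeats fewer bigrams.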

Train our model with key-frame selection.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/key_frames/model.json ../results/*/key_frames/path.json --is_train --resume_file ../results/*/key_frames/pretrained.th

Download the pretrained.th model first; it is required for key-frame selection. This variant uses only half of the video features at inference for faster decoding, at the cost of slightly worse results.
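
The idea of keeping half of the frames can be illustrated with a toy example. This is a sketch under assumptions, not the repository's selection mechanism: it assumes each frame already has a relevance score, and select_key_frames and keep_ratio are hypothetical names.

```python
def select_key_frames(scores, keep_ratio=0.5):
    """Return the indices of the highest-scoring frames, kept in temporal order."""
    if not scores:
        return []
    keep = max(1, int(len(scores) * keep_ratio))
    # Rank frame indices by score, keep the top ones, then restore temporal order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])
```

With four frames scored [0.1, 0.9, 0.3, 0.8], the function keeps frames 1 and 3, so the decoder would attend over two feature vectors instead of four.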

Evaluation

Trained checkpoints are saved in the results/*/folder/model/ directory. After evaluation, the generated captions (corresponding to the name file in public_split) and the evaluation scores are saved in results/*/folder/pred/tst/.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/folder/model.json ../results/*/folder/path.json --eval_set tst --resume_file ../results/*/folder/model/epoch.*.th
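
The epoch.*.th pattern in the command above has to be resolved to a concrete checkpoint file. The helper below lists candidate checkpoints so you can pick one. It is a sketch, not part of the repository: it assumes the wildcard is an integer epoch number, and list_checkpoints is a hypothetical name.

```python
import re
from pathlib import Path


def list_checkpoints(model_dir):
    """Return (epoch, path) pairs for files named epoch.<int>.th, sorted by epoch."""
    pattern = re.compile(r"^epoch\.(\d+)\.th$")  # assumed naming scheme
    found = []
    for path in Path(model_dir).glob("epoch.*.th"):
        match = pattern.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    return sorted(found)
```

Which epoch to evaluate is still your choice, for example the one with the best validation score; the helper only enumerates what is on disk.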

We also provide pretrained models for the ActivityNet dataset here and the Charades dataset here. They come from re-runs and achieve results similar to those reported in the paper.

Reference

If you find this repo helpful, please consider citing:

@inproceedings{song2021paragraph,
  title={Towards Diverse Paragraph Captioning for Untrimmed Videos},
  author={Song, Yuqing and Chen, Shizhe and Jin, Qin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

Owner

Yuqing Song
