Codes for paper "Towards Diverse Paragraph Captioning for Untrimmed Videos". CVPR 2021

Overview

Towards Diverse Paragraph Captioning for Untrimmed Videos

This repository contains PyTorch implementation of our paper Towards Diverse Paragraph Captioning for Untrimmed Videos (CVPR 2021).

Requirements

  • Python 3.6
  • Java 15.0.2
  • PyTorch 1.2
  • numpy, tqdm, h5py, scipy, six

Training & Inference

Data preparation

  1. Download the pre-extracted video features of ActivityNet Captions or Charades Captions datasets from BaiduNetdisk (code: he21).
  2. Decompress the downloaded files to the corresponding dataset folder in the ordered_feature/ directory.

Start training

  1. Train our model without reinforcement learning, * can be activitynet or charades.
$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/dm.token/model.json ../results/*/dm.token/path.json --is_train
  1. Fine-tune the pretrained model using self-critical with both accuracy and diversity rewards.
$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/dm.token.rl/model.json ../results/*/dm.token.rl/path.json --is_train --resume_file ../results/*/dm.token/model/epoch.*.th
  1. Train our model with key frames selection.
$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/key_frames/model.json ../results/*/key_frames/path.json --is_train --resume_file ../results/*/key_frames/pretrained.th

It will achieve a slightly worse result with only a half of the video features used at inference phase for faster decoding. You need to download the pretrained.th model at first for the key-frame selection.

Evaluation

The trained checkpoints have been saved at the results/*/folder/model/ directory. After evaluation, the generated captions (corresponding to the name file in the public_split) and evaluating scores will be saved at results/*/folder/pred/tst/.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/folder/model.json ../results/*/folder/path.json --eval_set tst --resume_file ../results/*/folder/model/epoch.*.th

We also provide the pretrained models for the ActivityNet dataset here and Charades dataset here, which are re-run and achieve similar results with the paper.

Reference

If you find this repo helpful, please consider citing:

@inproceedings{song2021paragraph,
  title={Towards Diverse Paragraph Captioning for Untrimmed Videos},
  author={Song, Yuqing and Chen, Shizhe and Jin, Qin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}
Owner
Yuqing Song
A student from RUC, major in CS.
Yuqing Song
(CVPR 2022) Energy-based Latent Aligner for Incremental Learning

Energy-based Latent Aligner for Incremental Learning Accepted to CVPR 2022 We illustrate an Incremental Learning model trained on a continuum of tasks

Joseph K J 37 Jan 03, 2023
DAT4 - General Assembly's Data Science course in Washington, DC

DAT4 Course Repository Course materials for General Assembly's Data Science course in Washington, DC (12/15/14 - 3/16/15). Instructors: Sinan Ozdemir

Kevin Markham 779 Dec 25, 2022
Graph Convolutional Neural Networks with Data-driven Graph Filter (GCNN-DDGF)

Graph Convolutional Gated Recurrent Neural Network (GCGRNN) Improved from Graph Convolutional Neural Networks with Data-driven Graph Filter (GCNN-DDGF

Lei Lin 21 Dec 18, 2022
Mortgage-loan-prediction - Show how to perform advanced Analytics and Machine Learning in Python using a full complement of PyData utilities

Mortgage-loan-prediction - Show how to perform advanced Analytics and Machine Learning in Python using a full complement of PyData utilities

Deepak Nandwani 1 Dec 31, 2021
Public repository created to store my custom-made tools for Just Dance (UbiArt Engine)

Woody's Just Dance Tools Public repository created to store my custom-made tools for Just Dance (UbiArt Engine) Development and updates Almost all of

Wodson de Andrade 8 Dec 24, 2022
HiddenMarkovModel implements hidden Markov models with Gaussian mixtures as distributions on top of TensorFlow

Class HiddenMarkovModel HiddenMarkovModel implements hidden Markov models with Gaussian mixtures as distributions on top of TensorFlow 2.0 Installatio

Susara Thenuwara 2 Nov 03, 2021
I3-master-layout - Simple master and stack layout script

Simple master and stack layout script | ------ | ----- | | | | | Ma

Tobias S 18 Dec 05, 2022
Unicorn can be used for performance analyses of highly configurable systems with causal reasoning

Unicorn can be used for performance analyses of highly configurable systems with causal reasoning. Users or developers can query Unicorn for a performance task.

AISys Lab 27 Jan 05, 2023
Python inverse kinematics for your robot model based on Pinocchio.

Python inverse kinematics for your robot model based on Pinocchio.

Stéphane Caron 50 Dec 22, 2022
🍷 Gracefully claim weekly free games and monthly content from Epic Store.

EPIC 免费人 🚀 优雅地领取 Epic 免费游戏 Introduction 👋 Epic AwesomeGamer 帮助玩家优雅地领取 Epic 免费游戏。 使用 「Epic免费人」可以实现如下需求: get:搬空游戏商店,获取所有常驻免费游戏与免费附加内容; claim:领取周免游戏及其免

571 Dec 28, 2022
Implementation of "Debiasing Item-to-Item Recommendations With Small Annotated Datasets" (RecSys '20)

Debiasing Item-to-Item Recommendations With Small Annotated Datasets This is the code for our RecSys '20 paper. Other materials can be found here: Ful

Microsoft 34 Aug 10, 2022
Parametric Contrastive Learning (ICCV2021)

Parametric-Contrastive-Learning This repository contains the implementation code for ICCV2021 paper: Parametric Contrastive Learning (https://arxiv.or

DV Lab 156 Dec 21, 2022
UnpNet - Rethinking 3-D LiDAR Point Cloud Segmentation(IEEE TNNLS)

UnpNet Citation Please cite the following paper if you use this repository in your reseach. @article {PMID:34914599, Title = {Rethinking 3-D LiDAR Po

Shijie Li 4 Jul 15, 2022
Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation"

EgoNet Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation". This repo inclu

Shichao Li 138 Dec 09, 2022
[NeurIPS-2021] Slow Learning and Fast Inference: Efficient Graph Similarity Computation via Knowledge Distillation

Efficient Graph Similarity Computation - (EGSC) This repo contains the source code and dataset for our paper: Slow Learning and Fast Inference: Effici

24 Dec 31, 2022
A PyTorch implementation of "Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning", IJCAI-21

MERIT A PyTorch implementation of our IJCAI-21 paper Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning. Depen

Graph Analysis & Deep Learning Laboratory, GRAND 32 Jan 02, 2023
Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation

SUO-SLAM This repository hosts the code for our CVPR 2022 paper "Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation". ArXiv li

Robot Perception & Navigation Group (RPNG) 97 Jan 03, 2023
A simple python library for fast image generation of people who do not exist.

Random Face A simple python library for fast image generation of people who do not exist. For more details, please refer to the [paper](https://arxiv.

Sergei Belousov 170 Dec 15, 2022
Taichi Course Homework Template

太极图形课S1-标题部分 这个作业未来或将是你的开源项目,标题的内容可以来自作业中的核心关键词,让读者一眼看出你所完成的工作/做出的好玩demo 如果暂时未想好,起名时可以参考“太极图形课S1-xxx作业” 如下是作业(项目)展开说明的方法,可以帮大家理清思路,并且也对读者非常友好,请小伙伴们多多参

TaichiCourse 30 Nov 19, 2022
DirectVoxGO reconstructs a scene representation from a set of calibrated images capturing the scene.

DirectVoxGO reconstructs a scene representation from a set of calibrated images capturing the scene. We achieve NeRF-comparable novel-view synthesis quality with super-fast convergence.

sunset 709 Dec 31, 2022