Yet another video caption

Last update: May 26, 2022

Overview

yet-another-video-caption

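Dataset configuration

Preparing the dataset

Reorganize the original dataset into a unified format and place it in ./dataset.

The dataset is organized as follows: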

./dataset
    train/
        video/
            *.avi
        ...
        info.json
    test/
        video/
            *.avi
        ...
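
Before going further, it can help to confirm that the directory tree matches the layout above. The following is a minimal sketch, not part of the repository: the function name `check_dataset_layout` is hypothetical, and it only checks the entries that the layout spells out (the `video/` directories, at least one `.avi` file in each, and `train/info.json`). It does not inspect the contents of `info.json` or the entries elided as `...`.

```python
from pathlib import Path


def check_dataset_layout(root="./dataset"):
    """Return a list of human-readable problems found under `root`."""
    root = Path(root)
    problems = []
    # Both splits keep their clips in a video/ subdirectory.
    for split in ("train", "test"):
        video_dir = root / split / "video"
        if not video_dir.is_dir():
            problems.append(f"missing directory: {video_dir}")
        elif not any(video_dir.glob("*.avi")):
            problems.append(f"no .avi files in: {video_dir}")
    # The layout above lists info.json only under train/.
    info = root / "train" / "info.json"
    if not info.is_file():
        problems.append(f"missing file: {info}")
    return problems


if __name__ == "__main__":
    for problem in check_dataset_layout():
        print(problem)
```

An empty result means the listed entries are all present.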

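Automatic configuration

Usually you only need a subset of the dataset. In that case, consider running the automatic extraction script makedata.py.

All data is located in ./data.

All videos (train/val/test) are located in ./data/video.

All video information (train/val/test) is written to ./data/input.json.

The program generates some intermediate files in ./data; do not modify them.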
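
To illustrate what drawing a train/val/test subset can look like, here is a rough sketch. It is not the code of makedata.py, whose parameters and behavior are not described here; the function `split_subset`, its arguments, and the seeded random sampling are all assumptions made for the example.

```python
import random


def split_subset(video_ids, n_train, n_val, n_test, seed=0):
    """Pick disjoint train/val/test subsets from `video_ids` (hypothetical helper)."""
    total = n_train + n_val + n_test
    if total > len(video_ids):
        raise ValueError("requested more videos than are available")
    # Sort first so the result depends only on the seed, not on input order.
    chosen = random.Random(seed).sample(sorted(video_ids), total)
    return {
        "train": chosen[:n_train],
        "val": chosen[n_train:n_train + n_val],
        "test": chosen[n_train + n_val:],
    }
```

Fixing the seed keeps the subset reproducible, which matters if features are extracted once and reused across runs.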

Dependencies

pip install tqdm pillow pretrainedmodels nltk

In addition, make sure that the CUDA runtime, cuDNN, PyTorch (GPU build), ffmpeg, and a JDK are correctly configured in the current environment.
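
A quick way to spot missing pieces is to look them up without importing or running anything. The sketch below is an assumption-laden helper, not part of the repository: it maps the pip packages to their import names (pillow is imported as `PIL`), treats a `java` executable on `PATH` as evidence of a JDK, and does not verify CUDA or cuDNN.

```python
import importlib.util
import shutil

# Import names for the pip packages listed above (pillow is imported as PIL),
# plus torch for PyTorch.
MODULES = ("tqdm", "PIL", "pretrainedmodels", "nltk", "torch")
# Executables expected on PATH; `java` stands in for the JDK here.
TOOLS = ("ffmpeg", "java")


def missing_requirements():
    """Return the names of modules and executables that cannot be found."""
    missing = [m for m in MODULES if importlib.util.find_spec(m) is None]
    missing += [t for t in TOOLS if shutil.which(t) is None]
    return missing


if __name__ == "__main__":
    missing = missing_requirements()
    print("missing:", ", ".join(missing) if missing else "nothing")
```

Once PyTorch is installed, `torch.cuda.is_available()` reports whether the GPU build can see a CUDA device.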

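Usage

Make sure the dataset is configured correctly.
Make sure the dependencies are installed correctly.
Extract the data: set the train/val/test split parameters you want in makedata.py, then run the script.
Run the following commands in order (adjust the batch_size and saved_model parameters yourself!):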

python prepro_feats.py --output_dir data/feats/resnet152 --model resnet152
python prepro_vocab.py
python train.py --epochs 3001 --batch_size 1 --checkpoint_path data/save --feats_dir data/feats/resnet152 --model S2VTAttModel --with_c3d 0 --dim_vid 2048
python eval.py --recover_opt data/save/opt_info.json --saved_model data/save/model_10.pth --batch_size 1
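
Since batch_size and saved_model have to be adjusted by hand, one option is to generate the four commands from a single place. This is only a sketch under assumptions: `build_pipeline` and `run_pipeline` are hypothetical helpers, the flags and default values are copied from the commands above, and the scripts are assumed to be run from the repository root.

```python
import subprocess
import sys


def build_pipeline(batch_size=1, saved_model="data/save/model_10.pth",
                   epochs=3001):
    """Return the four commands above as argument lists."""
    feats_dir = "data/feats/resnet152"
    return [
        ["prepro_feats.py", "--output_dir", feats_dir, "--model", "resnet152"],
        ["prepro_vocab.py"],
        ["train.py", "--epochs", str(epochs), "--batch_size", str(batch_size),
         "--checkpoint_path", "data/save", "--feats_dir", feats_dir,
         "--model", "S2VTAttModel", "--with_c3d", "0", "--dim_vid", "2048"],
        ["eval.py", "--recover_opt", "data/save/opt_info.json",
         "--saved_model", saved_model, "--batch_size", str(batch_size)],
    ]


def run_pipeline(**kwargs):
    """Run each step in order, stopping at the first failure."""
    for args in build_pipeline(**kwargs):
        # sys.executable keeps every step on the current Python interpreter.
        subprocess.run([sys.executable, *args], check=True)
```

Calling `run_pipeline(batch_size=32)` would run all four steps with the same batch size; `check=True` raises as soon as one step exits with a non-zero status.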

Speed test

The following results were measured on a single 2080Ti.

Preprocessing (ResNet152 feature extraction): 40 min in total

Training speed (batch_size=32): 6.20 it/s
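
To compare runs with different batch sizes, the iteration rate can be converted into samples per second. The short calculation below assumes that one iteration ("it") processes exactly one batch, which is not stated explicitly here; the helper name is hypothetical.

```python
def samples_per_second(iterations_per_second, batch_size):
    """Samples processed per second, assuming one iteration is one batch."""
    return iterations_per_second * batch_size


# Figures from the speed test above.
rate = samples_per_second(6.20, 32)
print(f"{rate:.1f} samples/s")  # prints "198.4 samples/s"
```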

Todo

Letter-case (uppercase/lowercase) handling

References

https://github.com/xiadingZ/video-caption.pytorch

Owner

Fan Zhimin
