Code for the paper "Controllable Video Captioning with an Exemplar Sentence"

Related tags

Computer VisionSMCG
Overview

SMCG

Code for the paper "Controllable Video Captioning with an Exemplar Sentence"

Introduction

We investigate a novel and challenging task, namely controllable video captioning with an exemplar sentence. Formally, given a video and a syntactically valid exemplar sentence, the task aims to generate one caption which not only describes the semantic contents of the video, but also follows the syntactic form of the given exemplar sentence. In order to tackle such an exemplar-based video captioning task, we propose a novel Syntax Modulated Caption Generator (SMCG) incorporated in an encoder-decoder-reconstructor architecture.

Dependency

  • python 2.7.2
  • torch 1.1.0
  • java openjdk version "10.0.2" 2018-07-17
  • StanfordCoreNLP

Download Features and Preprocess Data

For the MSRVTT dataset, please download the following files into the './msrvtt/msrvtt_data/' folder:

For the ActivityNet Captionsd dataset, please download the following files into the './activitynet/activitynet_data/' folder:

Data Preprocessing

  • Go to the './msrvtt/process_msrvtt_data/' folder, and run:
python prepro_vocab_parse_pos.py
python fill_template.py
  • Go to the './activitynet/process_activitynet_data/' folder, and run:
python prepro_anetcoco_vocab_pos_parse.py
python fill_template.py

Model Training and Testing

  • For the MSRVTT dataset, please go to the './msrvtt/src/' folder, and train the model by:
python train.py --gpu xx
  • For model inference and evaluation, run:
bash eval.sh 
bash control.sh 
  • Note: 'eval.sh' is used to evaluate the generated exemplar-based captions with conventional captioning metrics. 'control.sh' is used to compare the generated exemplar-based captions with the provided exemplar captions from the syntactic aspect, i.e., compute the edit distance between their parse trees.

  • For the ActivityNet Captions dataset, please go to the './activitynet/src/' folder, and train/test the model as on the MSRVTT dataset.

Citation

@inproceedings{yuan2020Control,
  title={Controllable Video Captioning with an Exemplar Sentence},
  author={Yuan, Yitian and Ma, Lin and Wang, Jingwen and Zhu, Wenwu},
  booktitle={the 28th ACM International Conference on Multimedia (MM ’20)},
  year={2020}
}
Owner
doggy
Image augmentation library in Python for machine learning.

Augmentor is an image augmentation library in Python for machine learning. It aims to be a standalone library that is platform and framework independe

Marcus D. Bloice 4.8k Jan 04, 2023
[BMVC'21] Official PyTorch Implementation of Grounded Situation Recognition with Transformers

Grounded Situation Recognition with Transformers Paper | Model Checkpoint This is the official PyTorch implementation of Grounded Situation Recognitio

Junhyeong Cho 18 Jul 19, 2022
Official implementation of Character Region Awareness for Text Detection (CRAFT)

CRAFT: Character-Region Awareness For Text detection Official Pytorch implementation of CRAFT text detector | Paper | Pretrained Model | Supplementary

Clova AI Research 2.5k Jan 03, 2023
Creating a virtual tv using opencv in python3.

Virtual-TV Creating a virtual tv using opencv in python3. In order to run the code follow the below given steps: Make sure the desired videos which ar

Vamsi 1 Jan 01, 2022
2 telegram-bots: for image recognition and for text generation

πŸ’» πŸ“± Telegram_Bots πŸ”Ž & πŸ“– 2 telegram-bots: for image recognition and for text generation. About Image recognition bot: User sends a photo and bot de

Marina Polukoshko 1 Jan 27, 2022
BNF Globalization Code (CVPR 2016)

Boundary Neural Fields Globalization This is the code for Boundary Neural Fields globalization method. The technical report of the method can be found

25 Apr 15, 2022
FastOCR is a desktop application for OCR API.

FastOCR FastOCR is a desktop application for OCR API. Installation Arch Linux fastocr-git @ AUR Build from AUR or install with your favorite AUR helpe

Bruce Zhang 58 Jan 07, 2023
Shape Detection - It's a shape detection project with OpenCV and Python.

Shape Detection It's a shape detection project with OpenCV and Python. Setup pip install opencv-python for doing AI things. pip install simpleaudio fo

1 Nov 26, 2022
Fine tuning keras-ocr python package with custom synthetic dataset from scratch

OCR-Pipeline-with-Keras The keras-ocr package generally consists of two parts: a Detector and a Recognizer: Detector is responsible for creating bound

Eugene 1 Jan 05, 2022
Select range and every time the screen changes, OCR is activated.

ASOCR(Auto Screen OCR) Select range and every time you press Space key, OCR is activated. 範囲を選ぢと、あγͺγŸγŒγ‚ΉγƒšγƒΌγ‚Ήγ‚­γƒΌγ‚’ζŠΌγ™γŸγ³γ«γ€η”»ι’γŒε€‰γ‚γ‚‹εΊ¦γ«OCRγŒθ΅·ε‹•γ—γΎγ™γ€‚ usage1: simple OC

1 Feb 13, 2022
Balabobapy - Using artificial intelligence algorithms to continue the text

Balabobapy - Using artificial intelligence algorithms to continue the text

qxtony 1 Feb 04, 2022
A curated list of awesome synthetic data for text location and recognition

awesome-SynthText A curated list of awesome synthetic data for text location and recognition and OCR datasets. Text location SynthText SynthText_Chine

Tianzhong 283 Jan 05, 2023
A tool to enhance your old/damaged pictures built using python & opencv.

Breathe Life into your Old Pictures Table of Contents About The Project Getting Started Prerequisites Usage Contact Acknowledgments About The Project

Shah Anwaar Khalid 5 Dec 16, 2021
Virtualdragdrop - Virtual Drag and Drop Using OpenCV and Arduino

Virtualdragdrop - Virtual Drag and Drop Using OpenCV and Arduino

Rizky Dermawan 4 Mar 10, 2022
A buffered and threaded wrapper for the OpenCV VideoCapture object. Can speed up video decoding significantly. Supports

A buffered and threaded wrapper for the OpenCV VideoCapture object. Can speed up video decoding significantly. Supports "with"-syntax.

Patrice Matz 0 Oct 30, 2021
This pyhton script converts a pdf to Image then using tesseract as OCR engine converts Image to Text

Script_Convertir_PDF_IMG_TXT Este script de pyhton convierte un pdf en Imagen luego utilizando tesseract como motor OCR convierte la Imagen a Texto. p

alebogado 1 Jan 27, 2022
Demo processor to illustrate OCR-D Python API

ocrd_vandalize/ Demo processor to illustrate the OCR-D/core Python API Description :TODO: write docs :) Installation From PyPI pip3 install ocrd_vanda

Konstantin Baierer 5 May 05, 2022
A pure pytorch implemented ocr project including text detection and recognition

ocr.pytorch A pure pytorch implemented ocr project. Text detection is based CTPN and text recognition is based CRNN. More detection and recognition me

coura 444 Dec 30, 2022
Controlling the computer volume with your hands // OpenCV

HandsControll-AI Controlling the computer volume with your hands // OpenCV Step 1 git clone https://github.com/Hayk-21/HandsControll-AI.git pip instal

Hayk 1 Nov 04, 2021