Video-Captioning - A machine learning project that generates captions for video frames, describing the relationships between the objects in the video

Last update: Jan 23, 2022

Overview

Video-Captioning

A machine learning project that generates captions for video frames, describing the relationships between the objects in the video.

Approach

In our framework, we use a sequence-to-sequence model to predict visual relationships in videos: the input is a sequence of video frames, and the output is a relation triplet <object1 - relationship - object2> describing the video. In this way, we extend the sequence-to-sequence modelling approach to inputs that are sequences of video frames.

Figure: Bidirectional LSTM layer (coloured red) encodes visual feature inputs, and the LSTM layer (coloured green) decodes the features into a sequence of words.
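
The encoder-decoder layout in the figure can be sketched in Keras roughly as follows. This is a minimal illustration, not the project's actual model: the layer sizes, the number of frames, the shared object/relationship vocabulary, and the use of RepeatVector to feed the decoder are all assumptions made for the example.

```python
from tensorflow import keras
from tensorflow.keras import layers

# All sizes below are illustrative assumptions, not values from the project.
NUM_FRAMES = 30      # frames sampled per video
FEATURE_DIM = 512    # length of each per-frame visual feature vector
VOCAB_SIZE = 15      # object labels plus relationship labels, in one vocabulary
TRIPLET_LEN = 3      # object1, relationship, object2
HIDDEN_UNITS = 64


def build_model():
    """Build a bidirectional-LSTM encoder followed by an LSTM decoder."""
    frames = keras.Input(shape=(NUM_FRAMES, FEATURE_DIM), name="frame_features")

    # Encoder: read the frame features in both directions and keep a single
    # summary vector for the whole video.
    encoded = layers.Bidirectional(layers.LSTM(HIDDEN_UNITS))(frames)

    # Repeat the summary once per output position so the decoder produces
    # exactly three steps, one for each element of the triplet.
    repeated = layers.RepeatVector(TRIPLET_LEN)(encoded)

    # Decoder: unroll an LSTM over the three triplet positions.
    decoded = layers.LSTM(HIDDEN_UNITS, return_sequences=True)(repeated)

    # Per-position probability distribution over the label vocabulary.
    probabilities = layers.TimeDistributed(
        layers.Dense(VOCAB_SIZE, activation="softmax")
    )(decoded)

    return keras.Model(inputs=frames, outputs=probabilities)


model = build_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

With these assumptions, the model maps a batch of frame-feature sequences to three probability distributions per video, and taking the most likely label at each position gives the predicted triplet.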

Results

Python Dependencies

Pandas
Keras
TensorFlow
NumPy
albumentations
Pillow

Procedure

Training

For training the model, run the script train.py.

  python train.py

To train on your own dataset, save your data in a directory (see the data folder for the expected format) and update the following JSON files:

object1_object2.json: It contains a dictionary for each object, with object labels as keys and ids as values.
relationship.json: It contains a dictionary for each relationship, with relationship labels as keys and ids as values.
training_annotations.json: It contains a dictionary for each video in the training data, with video ids as keys and a list of as values.
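
As a rough illustration of how these three files relate, the sketch below builds small label maps and an annotation entry in memory. The labels, ids, and video id are made up, and the contents of each annotation list are an assumption here (object1 id, relationship id, object2 id); check the files in the data folder for the real layout.

```python
import json

# Hypothetical label maps: label -> integer id, as described for
# object1_object2.json and relationship.json.
objects = {"person": 0, "dog": 1, "ball": 2}
relationships = {"chase": 0, "hold": 1}


def encode_triplet(object1, relationship, object2):
    """Turn a labelled triplet into a list of ids (assumed annotation format)."""
    return [objects[object1], relationships[relationship], objects[object2]]


def decode_triplet(ids):
    """Turn a list of ids back into (object1, relationship, object2) labels."""
    id_to_object = {value: key for key, value in objects.items()}
    id_to_relationship = {value: key for key, value in relationships.items()}
    return (id_to_object[ids[0]], id_to_relationship[ids[1]], id_to_object[ids[2]])


# Hypothetical training annotations: video id -> list of ids.
annotations = {"video_0001": encode_triplet("person", "hold", "ball")}

# The dictionaries serialise directly to JSON text.
print(json.dumps(annotations, indent=2))
print(decode_triplet(annotations["video_0001"]))
```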

When running the script, provide the path to your data directory.

  python eval.py --train_data

Testing

For testing the model or making predictions on your own dataset, run the script eval.py.

  python eval.py --test_data

The results are saved to the CSV file 'test_data_predictions.csv'.
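
To inspect the predictions afterwards, something like the following sketch may be enough. It assumes the CSV file starts with a header row and makes no assumption about the column names, which are not documented here; load_predictions is a helper name introduced only for this example.

```python
import csv
from pathlib import Path


def load_predictions(path="test_data_predictions.csv"):
    """Return the rows of the predictions file as a list of dicts.

    Assumes the first row is a header. The keys of each dict are whatever
    column names the file contains.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Only read the file if eval.py has already produced it.
if Path("test_data_predictions.csv").exists():
    rows = load_predictions()
    print(f"{len(rows)} predictions loaded")
    for row in rows[:5]:
        print(row)
```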
