Automatic Video Captioning Evaluation Metric --- EMScore

Last update: Nov 28, 2022

Overview

For an illustration, EMScore can be computed as:
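The sketch below is a minimal pure-Python illustration of embedding matching, not the repository's implementation. It assumes the score averages a coarse-grained part (cosine similarity between the mean-pooled video embedding and the whole-caption embedding) and a fine-grained part (an F-score from greedy matching between token and frame embeddings). It leaves out the idf weighting enabled by --use_idf. The function names are hypothetical, and the toy vectors stand in for CLIP features.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def emscore_sketch(frame_embs, token_embs, caption_emb):
    """Hypothetical helper: coarse + fine embedding matching (no idf weights).

    frame_embs:  one embedding per video frame
    token_embs:  one embedding per caption token (encode_text with local=True)
    caption_emb: one embedding for the whole caption (local=False)
    """
    frames = [normalize(f) for f in frame_embs]
    tokens = [normalize(t) for t in token_embs]
    caption = normalize(caption_emb)

    # Coarse-grained: whole video (mean-pooled frames) vs. whole caption.
    dim = len(frames[0])
    video = normalize([sum(f[i] for f in frames) / len(frames) for i in range(dim)])
    coarse = dot(video, caption)

    # Fine-grained: each token takes its best frame (precision) and
    # each frame takes its best token (recall).
    precision = sum(max(dot(t, f) for f in frames) for t in tokens) / len(tokens)
    recall = sum(max(dot(f, t) for t in tokens) for f in frames) / len(frames)
    fine = 2 * precision * recall / (precision + recall)

    return {"coarse": coarse, "fine": fine, "emscore": (coarse + fine) / 2}


if __name__ == "__main__":
    scores = emscore_sketch(
        frame_embs=[[1.0, 0.0], [0.0, 1.0]],
        token_embs=[[1.0, 0.0], [0.0, 1.0]],
        caption_emb=[1.0, 1.0],
    )
    print(scores)
```

The sketch does not guard against the degenerate case where precision and recall are both zero.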

Installation

In your own copy of CLIP, modify the encode_text() function in CLIP/clip/model.py as follows, so that it can return per-token embeddings when local=True:

def encode_text(self, text, local=False):
    x = self.token_embedding(text).type(self.dtype)  # [batch_size, n_ctx, d_model]

    x = x + self.positional_embedding.type(self.dtype)
    x = x.permute(1, 0, 2)  # NLD -> LND
    x = self.transformer(x)
    x = x.permute(1, 0, 2)  # LND -> NLD
    x = self.ln_final(x).type(self.dtype)

    if local:
        # local=True: project every token position, giving per-token
        # embeddings of shape [batch_size, n_ctx, embed_dim]
        x = x @ self.text_projection
    else:
        # x.shape = [batch_size, n_ctx, transformer.width]
        # take features from the eot embedding (eot_token is the highest number in each sequence)
        x = x[torch.arange(x.shape[0]), text.argmax(dim=-1)] @ self.text_projection

    return x
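To make the two branches concrete, here is a pure-Python stand-in for the indexing step only. It is a sketch under assumptions: select_text_features is a hypothetical name, the string "features" replace real tensors, and the word ids are arbitrary. The one CLIP-specific fact it relies on is that the end-of-text id (49407) is the largest id in the vocabulary, which is why argmax over the token ids finds its position.

```python
EOT_TOKEN_ID = 49407  # CLIP's end-of-text id, the largest id in its vocabulary
SOT_TOKEN_ID = 49406  # CLIP's start-of-text id


def select_text_features(token_ids, token_feats, local=False):
    """Hypothetical stand-in for the feature selection in encode_text().

    token_ids:   list of token-id sequences, one per caption (zero padded)
    token_feats: list of per-token feature sequences with the same layout
    """
    if local:
        # Keep one feature per token position.
        return token_feats
    selected = []
    for ids, feats in zip(token_ids, token_feats):
        # Position of the largest id, i.e. the end-of-text token.
        eot_index = max(range(len(ids)), key=ids.__getitem__)
        selected.append(feats[eot_index])
    return selected


if __name__ == "__main__":
    ids = [[SOT_TOKEN_ID, 320, 1929, EOT_TOKEN_ID, 0, 0]]  # 320, 1929: arbitrary word ids
    feats = [["sot", "w1", "w2", "eot", "pad", "pad"]]
    print(select_text_features(ids, feats))              # one feature per caption
    print(select_text_features(ids, feats, local=True))  # one feature per token
```

With local=True the padded positions are returned as well, so downstream code would presumably need to mask them out.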

Push the modified CLIP to your own GitHub account so that it can be installed with pip.

Install

$ conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
$ pip install ftfy regex tqdm
$ pip install git+https://github.com/$Yours_GitHub_name/CLIP

Replace cudatoolkit=11.0 above with the appropriate CUDA version on your machine or cpuonly when installing on a machine without a GPU.
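After installing, a quick check like the following can confirm that the packages are importable. This is a sketch rather than part of the repository: missing_modules is a hypothetical helper, and the module names are assumed to match the packages installed above (CLIP installs as the module clip). It only checks that each module can be found, not that the installed clip is the modified version.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    required = ["torch", "torchvision", "ftfy", "regex", "tqdm", "clip"]
    missing = missing_modules(required)
    print("missing:", missing if missing else "none")
```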

Usage:

A general demo

python demo.py

VATEX-EVAL

Download the files from the following link and save them in a storage directory:

https://drive.google.com/drive/folders/1jAfZZKEgkMEYFF2x1mhYo39nH-TNeGm6?usp=sharing

Run the code:

python VATEX-EVAL-demo.py --storage_path $storage_path --use_n_refs 1 --use_feat_cache --use_idf

ActivityNet-FOIL

Download the files from the following link and save them in a storage directory:

https://drive.google.com/drive/folders/1oY9EJiEi_db_1GH-R33JDqfE8txffKR3?usp=sharing

Run the code:

python ActivityNet-FOIL_demo.py --storage_path $storage_path --use_references --use_idf

Others

If you want to extract the video embeddings yourself:

python extract_video_embeddings.py --videos_path $your_video_path --save_path $your_storage_path --backbone 'ViT-B/32'

Owner

Yaya Shi
