An unreferenced image captioning metric (ACL-21)

Last update: Nov 20, 2022

UMIC

This repository provides an unreferenced image captioning metric from our ACL 2021 paper UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning.
Here, we provide the code to compute UMIC.

Usage (Updating the Descriptions)

Our code is based on UNITER, so please follow UNITER's installation guide for Docker to set up the environment. We plan to release a version that does not require Docker in the next few weeks.

1. Install Prerequisites

We use the Docker image provided by the official UNITER repository. Please install Docker by following the guidelines in that repository.

2. Download the Visual Features

The COCO dataset is widely used for image captioning. To get the visual features for COCO captions, download the image features for the COCO validation split with the following command.

wget https://acvrpublicycchen.blob.core.windows.net/uniter/img_db/coco_val2014.tar

Please refer to the official UNITER repository for downloading other visual features.

3. Pre-processing the Textual Features (Captions)

The textual feature file is a JSON file holding a single dictionary (a Python dict) with the following keys:
'cands' : [list of candidate captions]
'img_fs' : [list of image file names]
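A minimal sketch of writing such a file from Python is shown here. The helper name `write_caption_file`, the output file name `my_captions.json`, the example captions, and the image file names are all hypothetical. The sketch also assumes that the i-th caption in 'cands' describes the i-th image in 'img_fs', which the format above suggests but does not state.

```python
import json
from pathlib import Path


def write_caption_file(cands, img_fs, out_path):
    """Write captions and image file names using the 'cands' / 'img_fs' keys."""
    # Assumption: captions and image names are paired one-to-one by position.
    if len(cands) != len(img_fs):
        raise ValueError("cands and img_fs must have the same length")
    payload = {"cands": list(cands), "img_fs": list(img_fs)}
    Path(out_path).write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return payload


# Hypothetical example data; replace with your own captions and image names.
example = write_caption_file(
    cands=["a man riding a wave on a surfboard", "two dogs playing in the snow"],
    img_fs=["COCO_val2014_000000000042.jpg", "COCO_val2014_000000000073.jpg"],
    out_path="my_captions.json",
)
```

Checking the two list lengths before writing catches misaligned inputs early, before any scoring is attempted.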

4. Running the Script

Launching Docker

source launch_activate.sh $PATH_TO_STORAGE

Compute Score

python compute_score.py --data_type capeval1k \
                        --ckpt /storage/umic.pt \
                        --img_type coco_val2014
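If you want to script the scoring step, for example to loop over several checkpoints, the call can be assembled in Python. This is only a sketch: the helper `build_score_command` is hypothetical, and it uses just the three flags that appear in the command above (`--data_type`, `--ckpt`, `--img_type`). It prints the command rather than running it, since the script is expected to run inside the Docker container.

```python
import shlex


def build_score_command(data_type, ckpt, img_type, script="compute_score.py"):
    """Return the argument list for the scoring command (hypothetical helper)."""
    return [
        "python", script,
        "--data_type", data_type,
        "--ckpt", ckpt,
        "--img_type", img_type,
    ]


cmd = build_score_command("capeval1k", "/storage/umic.pt", "coco_val2014")
# Print a shell-safe, copy-pasteable version of the command.
print(shlex.join(cmd))
# Inside the container, it could be executed with: subprocess.run(cmd, check=True)
```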

Reference

If you find this repo useful, please consider citing:

@inproceedings{lee-etal-2021-umic,
    title = "{UMIC}: An Unreferenced Metric for Image Captioning via Contrastive Learning",
    author = "Lee, Hwanhee  and
      Yoon, Seunghyun  and
      Dernoncourt, Franck  and
      Bui, Trung  and
      Jung, Kyomin",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-short.29",
    doi = "10.18653/v1/2021.acl-short.29",
    pages = "220--226",
}

Owner

hwanheelee
