Diverse Image Captioning with Context-Object Split Latent Spaces (NeurIPS 2020)

Overview

Diverse Image Captioning with Context-Object Split Latent Spaces

This repository is the PyTorch implementation of the paper:

Diverse Image Captioning with Context-Object Split Latent Spaces (NeurIPS 2020)

Shweta Mahajan and Stefan Roth

We additionally include evaluation code from Luo et al. in the folder GoogleConceptualCaptioning , which has been patched for compatibility.

Requirements

The following code is written in Python 3.6.10 and CUDA 9.0.

Requirements:

  • torch 1.1.0
  • torchvision 0.3.0
  • nltk 3.5
  • inflect 4.1.0
  • tqdm 4.46.0
  • sklearn 0.0
  • h5py 2.10.0

To install requirements:

conda config --add channels pytorch
conda config --add channels anaconda
conda config --add channels conda-forge
conda config --add channels conda-forge/label/cf202003
conda create -n <environment_name> --file requirements.txt
conda activate <environment_name>

Preprocessed data

The dataset used in this project for assessing accuracy and diversity is COCO 2014 (m-RNN split). The full dataset is available here.

We use the Faster R-CNN features for images similar to Anderson et al.. We additionally require "classes"/"scores" fields detected for image regions. The classes correspond to Visual Genome.

Download instructions

Preprocessed training data is available here as hdf5 files. The provided hdf5 files contain the following fields:

  • image_id: ID of the COCO image
  • num_boxes: The proposal regions detected from Faster R-CNN
  • features: ResNet-101 features of the extracted regions
  • classes: Visual genome classes of the extracted regions
  • scores: Scores of the Visual genome classes of the extracted regions

Note that the ["image_id","num_boxes","features"] fields are identical to Anderson et al.

Create a folder named coco and download the preprocessed training and test datasets from the coco folder in the drive link above as follows (it is also possible to directly download the entire coco folder from the drive link):

  1. Download the following files for training on COCO 2014 (m-RNN split):
coco/coco_train_2014_adaptive_withclasses.h5
coco/coco_val_2014_adaptive_withclasses.h5
coco/coco_val_mRNN.txt
coco/coco_test_mRNN.txt
  1. Download the following files for training on held-out COCO (novel object captioning):
coco/coco_train_2014_noc_adaptive_withclasses.h5
coco/coco_train_extra_2014_noc_adaptive_withclasses.h5
  1. Download the following files for testing on held-out COCO (novel object captioning):
coco/coco_test_2014_noc_adaptive_withclasses.h5
  1. Download the (caption) annotation files and place them in a subdirectory coco/annotations (mirroring the Google drive folder structure)
coco/annotations/captions_train2014.json
coco/annotations/captions_val2014.json
  1. Download the following files from the drive link in a seperate folder data (outside coco). These files contain the contextual neighbours for pseudo supervision:
data/nn_final.pkl
data/nn_noc.pkl

For running the train/test scripts (described in the following) "pathToData"/"nn_dict_path" in params.json and params_noc.json needs to be set to the coco/data folder created above.

Verify Folder Structure after Download

The folder structure of coco after data download should be as follows,

coco
 - annotations
   - captions_train2014.json
   - captions_val2014.json
 - coco_val_mRNN.txt
 - coco_test_mRNN.txt
 - coco_train_2014_adaptive_withclasses.h5
 - coco_val_2014_adaptive_withclasses.h5
 - coco_train_2014_noc_adaptive_withclasses.h5
 - coco_train_extra_2014_noc_adaptive_withclasses.h5
 - coco_test_2014_noc_adaptive_withclasses.h5
data
 - coco_classname.txt
 - visual_genome_classes.txt
 - vocab_coco_full.pkl
 - nn_final.pkl
 - nn_noc.pkl

Training

Please follow the following instructions for training:

  1. Set hyperparameters for training in params.json and params_noc.json.
  2. Train a model on COCO 2014 for captioning,
   	python ./scripts/train.py
  1. Train a model for diverse novel object captioning,
   	python ./scripts/train_noc.py

Please note that the data folder provides the required vocabulary.

Memory requirements

The models were trained on a single nvidia V100 GPU with 32 GB memory. 16 GB is sufficient for training a single run.

Pre-trained models and evaluation

We provide pre-trained models for both captioning on COCO 2014 (mRNN split) and novel object captioning. Please follow the following steps:

  1. Download the pre-trained models from here to the ckpts folder.

  2. For evaluation of oracle scores and diversity, we follow Luo et al.. In the folder GoogleConceptualCaptioning download the cider and in the cococaption folder run the download scripts,

   	./GoogleConceptualCaptioning/cococaption/get_google_word2vec_model.sh
   	./GoogleConceptualCaptioning/cococaption/get_stanford_models.sh
   	python ./scripts/eval.py
  1. For diversity evaluation create the required numpy file for consensus re-ranking using,
   	python ./scripts/eval_diversity.py

For consensus re-ranking follow the steps here. To obtain the final diversity scores, follow the instructions of DiversityMetrics. Convert the numpy file to required json format and run the script evalscripts.py

  1. To evaluate the F1 score for novel object captioning,
   	python ./scripts/eval_noc.py

Results

Oracle evaluation on the COCO dataset

B4 B3 B2 B1 CIDEr METEOR ROUGE SPICE
COS-CVAE 0.633 0.739 0.842 0.942 1.893 0.450 0.770 0.339

Diversity evaluation on the COCO dataset

Unique Novel mBLEU Div-1 Div-2
COS-CVAE 96.3 4404 0.53 0.39 0.57

F1-score evaluation on the held-out COCO dataset

bottle bus couch microwave pizza racket suitcase zebra average
COS-CVAE 35.4 83.6 53.8 63.2 86.7 69.5 46.1 81.7 65.0

Bibtex

@inproceedings{coscvae20neurips,
  title     = {Diverse Image Captioning with Context-Object Split Latent Spaces},
  author    = {Mahajan, Shweta and Roth, Stefan},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year = {2020}
}
Owner
Visual Inference Lab @TU Darmstadt
Visual Inference Lab @TU Darmstadt
Code and Data for the paper: Molecular Contrastive Learning with Chemical Element Knowledge Graph [AAAI 2022]

Knowledge-enhanced Contrastive Learning (KCL) Molecular Contrastive Learning with Chemical Element Knowledge Graph [ AAAI 2022 ]. We construct a Chemi

Fangyin 58 Dec 26, 2022
Conversion between units used in magnetism

convmag Conversion between various units used in magnetism The conversions between base units available are: T - G : 1e4

0 Jul 15, 2021
style mixing for animation face

An implementation of StyleGAN on Animation dataset. Install git clone https://github.com/MorvanZhou/anime-StyleGAN cd anime-StyleGAN pip install -r re

Morvan 46 Nov 30, 2022
[NeurIPS 2021] Towards Better Understanding of Training Certifiably Robust Models against Adversarial Examples | ⛰️⚠️

Towards Better Understanding of Training Certifiably Robust Models against Adversarial Examples This repository is the official implementation of "Tow

Sungyoon Lee 4 Jul 12, 2022
Complete U-net Implementation with keras

U Net Lowered with Keras Complete U-net Implementation with keras Original Paper Link : https://arxiv.org/abs/1505.04597 Special Implementations : The

Sagnik Roy 14 Oct 10, 2022
Unet network with mean teacher for altrasound image segmentation

Unet network with mean teacher for altrasound image segmentation

5 Nov 21, 2022
Roach: End-to-End Urban Driving by Imitating a Reinforcement Learning Coach

CARLA-Roach This is the official code release of the paper End-to-End Urban Driving by Imitating a Reinforcement Learning Coach by Zhejun Zhang, Alexa

Zhejun Zhang 118 Dec 28, 2022
Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes, ICCV 2017

AdaptationSeg This is the Python reference implementation of AdaptionSeg proposed in "Curriculum Domain Adaptation for Semantic Segmentation of Urban

Yang Zhang 128 Oct 19, 2022
The MATH Dataset

Measuring Mathematical Problem Solving With the MATH Dataset This is the repository for Measuring Mathematical Problem Solving With the MATH Dataset b

Dan Hendrycks 267 Dec 26, 2022
A sequence of Jupyter notebooks featuring the 12 Steps to Navier-Stokes

CFD Python Please cite as: Barba, Lorena A., and Forsyth, Gilbert F. (2018). CFD Python: the 12 steps to Navier-Stokes equations. Journal of Open Sour

Barba group 2.6k Dec 30, 2022
g2o: A General Framework for Graph Optimization

g2o - General Graph Optimization Linux: Windows: g2o is an open-source C++ framework for optimizing graph-based nonlinear error functions. g2o has bee

Rainer Kümmerle 2.5k Dec 30, 2022
A highly modular PyTorch framework with a focus on Neural Architecture Search (NAS).

UniNAS A highly modular PyTorch framework with a focus on Neural Architecture Search (NAS). under development (which happens mostly on our internal Gi

Cognitive Systems Research Group 19 Nov 23, 2022
Python implementation of Lightning-rod Agent, the Stack4Things board-side probe

Iotronic Lightning-rod Agent Python implementation of Lightning-rod Agent, the Stack4Things board-side probe. Free software: Apache 2.0 license Websit

2 May 19, 2022
TargetAllDomainObjects - A python wrapper to run a command on against all users/computers/DCs of a Windows Domain

TargetAllDomainObjects A python wrapper to run a command on against all users/co

Podalirius 19 Dec 13, 2022
Model-based 3D Hand Reconstruction via Self-Supervised Learning, CVPR2021

S2HAND: Model-based 3D Hand Reconstruction via Self-Supervised Learning S2HAND presents a self-supervised 3D hand reconstruction network that can join

Yujin Chen 72 Dec 12, 2022
Centroid-UNet is deep neural network model to detect centroids from satellite images.

Centroid UNet - Locating Object Centroids in Aerial/Serial Images Introduction Centroid-UNet is deep neural network model to detect centroids from Aer

GIC-AIT 19 Dec 08, 2022
PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Deepvoice3_pytorch PyTorch implementation of convolutional networks-based text-to-speech synthesis models: arXiv:1710.07654: Deep Voice 3: Scaling Tex

Ryuichi Yamamoto 1.8k Jan 08, 2023
PyTorch Implementation of AnimeGANv2

PyTorch implementation of AnimeGANv2

4k Jan 07, 2023
Powerful unsupervised domain adaptation method for dense retrieval.

Powerful unsupervised domain adaptation method for dense retrieval

Ubiquitous Knowledge Processing Lab 191 Dec 28, 2022
OstrichRL: A Musculoskeletal Ostrich Simulation to Study Bio-mechanical Locomotion.

OstrichRL This is the repository accompanying the paper OstrichRL: A Musculoskeletal Ostrich Simulation to Study Bio-mechanical Locomotion. It contain

Vittorio La Barbera 51 Nov 17, 2022