Official PyTorch implementation of the paper Dual-Level Collaborative Transformer for Image Captioning (AAAI 2021).

Last update: Dec 11, 2022

Related tags

Deep Learning image-captioning

Overview

Dual-Level Collaborative Transformer for Image Captioning

This repository contains the reference code for the paper Dual-Level Collaborative Transformer for Image Captioning.

Experiment setup

Please refer to M2 Transformer [1] for the experiment setup.

Data preparation

Annotation. Download the annotation file annotation.zip. Extract it and put it in the project root directory.
Feature. You can download our ResNeXt-101 features (hdf5 file) here. Access code: jcj6.
Evaluation. Download the evaluation tools here. Access code: jcj6. Extract them and put them in the project root directory.

There are five kinds of keys in our .hdf5 file:

['%d_features' % image_id]: region features (N_regions, feature_dim)
['%d_boxes' % image_id]: bounding box of region features (N_regions, 4)
['%d_size' % image_id]: size of original image (for normalizing bounding box), (2,)
['%d_grids' % image_id]: grid features (N_grids, feature_dim)
['%d_mask' % image_id]: geometric alignment graph, (N_regions, N_grids)

We extract features with the code in grid-feats-vqa.

The first three keys can be obtained when extracting region features with extract_region_feature.py. The fourth key can be obtained when extracting grid features with the code in grid-feats-vqa. The last key can be obtained with align.ipynb.
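
As a rough illustration of the key layout described above, the following Python sketch builds the five key names for one image and checks that the stored shapes agree with each other. The helper names entry_keys and check_entry are hypothetical and not part of this repository. The store argument is assumed to be any mapping whose values expose a .shape attribute, which an h5py file opened for reading should satisfy.

```python
# Hypothetical helpers (not part of this repository) that illustrate the
# per-image key layout of the feature file.

KEY_SUFFIXES = ("features", "boxes", "size", "grids", "mask")


def entry_keys(image_id):
    """Map each suffix to its full key, following the '%d_<suffix>' pattern."""
    return {suffix: "%d_%s" % (image_id, suffix) for suffix in KEY_SUFFIXES}


def check_entry(store, image_id):
    """Return the shapes stored for one image and verify they are consistent.

    `store` is assumed to be a mapping from key to an object with `.shape`,
    for example an open h5py.File.
    """
    shapes = {
        suffix: tuple(store[key].shape)
        for suffix, key in entry_keys(image_id).items()
    }
    n_regions, _feature_dim = shapes["features"]
    n_grids = shapes["grids"][0]

    # One box with 4 coordinates per region.
    if shapes["boxes"] != (n_regions, 4):
        raise ValueError("boxes shape %r does not match regions" % (shapes["boxes"],))
    # Two numbers describing the original image size.
    if shapes["size"] != (2,):
        raise ValueError("size shape %r is not (2,)" % (shapes["size"],))
    # The alignment graph relates every region to every grid cell.
    if shapes["mask"] != (n_regions, n_grids):
        raise ValueError("mask shape %r is not (N_regions, N_grids)" % (shapes["mask"],))
    return shapes
```

Assuming h5py is installed, the check could be run on the real file by opening it with h5py.File("./data/coco_all_align.hdf5", "r") and passing the file object as store together with a COCO image id.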

Training

python train.py --exp_name dlct --batch_size 50 --head 8 --features_path ./data/coco_all_align.hdf5 --annotation annotation --workers 8 --rl_batch_size 100 --image_field ImageAllFieldWithMask --model DLCT --rl_at 17 --seed 118

Evaluation

python eval.py --annotation annotation --workers 4 --features_path ./data/coco_all_align.hdf5 --model_path path_of_model_to_eval --model DLCT --image_field ImageAllFieldWithMask --grid_embed --box_embed --dump_json gen_res.json --beam_size 5

Important args:

--features_path path to hdf5 file
--model_path
--dump_json dump generated captions to

A pretrained model is available here. Access code: jcj6. By evaluating the pretrained model, you will get

{'BLEU': [0.8136727001615207, 0.6606095421082421, 0.5167535314080227, 0.39790755018790197], 'METEOR': 0.29522868252436046, 'ROUGE': 0.5914367650104326, 'CIDEr': 1.3382047139781112, 'SPICE': 0.22953477359195887}
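
Captioning results are commonly reported scaled by 100. The sketch below shows one way to flatten a score dictionary of the shape printed above into that form. The function name as_percentages is an assumption made for illustration and is not provided by this repository.

```python
# Hypothetical helper (not part of this repository): flatten the evaluation
# dictionary and scale every score by 100, rounded to one decimal place.

def as_percentages(scores):
    flat = {}
    for name, value in scores.items():
        if isinstance(value, (list, tuple)):
            # BLEU is a list of BLEU-1 .. BLEU-4 scores.
            for n, v in enumerate(value, start=1):
                flat["%s-%d" % (name, n)] = round(100 * v, 1)
        else:
            flat[name] = round(100 * value, 1)
    return flat


scores = {
    "BLEU": [0.8136727001615207, 0.6606095421082421,
             0.5167535314080227, 0.39790755018790197],
    "METEOR": 0.29522868252436046,
    "ROUGE": 0.5914367650104326,
    "CIDEr": 1.3382047139781112,
    "SPICE": 0.22953477359195887,
}

for metric, value in as_percentages(scores).items():
    print("%-8s %.1f" % (metric, value))
```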

References

[1] M2

[2] grid-feats-vqa

[3] butd

Acknowledgements

Thanks to the original m2 and the amazing work of grid-feats-vqa.


Owner

lyricpoem
