Official pytorch implementation of paper Dual-Level Collaborative Transformer for Image Captioning (AAAI 2021).

Overview

Dual-Level Collaborative Transformer for Image Captioning

This repository contains the reference code for the paper Dual-Level Collaborative Transformer for Image Captioning.

Experiment setup

please refer to m2 transformer

Data preparation

  • Annotation. Download the annotation file annotation.zip. Extarct and put it in the project root directory.
  • Feature. You can download our ResNeXt-101 feature (hdf5 file) here. Acess code: jcj6.
  • evaluation. Download the evaluation tools here. Acess code: jcj6. Extarct and put it in the project root directory.

There are five kinds of keys in our .hdf5 file. They are

  • ['%d_features' % image_id]: region features (N_regions, feature_dim)
  • ['%d_boxes' % image_id]: bounding box of region features (N_regions, 4)
  • ['%d_size' % image_id]: size of original image (for normalizing bounding box), (2,)
  • ['%d_grids' % image_id]: grid features (N_grids, feature_dim)
  • ['%d_mask' % image_id]: geometric alignment graph, (N_regions, N_grids)

We extract feature with the code in grid-feats-vqa.

The first three keys can be obtained when extracting region features with extract_region_feature.py. The forth key can be obtained when extracting grid features with code in grid-feats-vqa. The last key can be obtained with align.ipynb

Training

python train.py --exp_name dlct --batch_size 50 --head 8 --features_path ./data/coco_all_align.hdf5 --annotation annotation --workers 8 --rl_batch_size 100 --image_field ImageAllFieldWithMask --model DLCT --rl_at 17 --seed 118

Evaluation

python eval.py --annotation annotation --workers 4 --features_path ./data/coco_all_align.hdf5 --model_path path_of_model_to_eval --model DLCT --image_field ImageAllFieldWithMask --grid_embed --box_embed --dump_json gen_res.json --beam_size 5

Important args:

  • --features_path path to hdf5 file
  • --model_path
  • --dump_json dump generated captions to

Pretrained model is available here. Acess code: jcj6. By evaluating the pretrained model, you will get

{'BLEU': [0.8136727001615207, 0.6606095421082421, 0.5167535314080227, 0.39790755018790197], 'METEOR': 0.29522868252436046, 'ROUGE': 0.5914367650104326, 'CIDEr': 1.3382047139781112, 'SPICE': 0.22953477359195887}

References

[1] M2

[2] grid-feats-vqa

[3] butd

Acknowledgements

Thanks the original m2 and amazing work of grid-feats-vqa.

Owner
lyricpoem
lyricpoem
Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention

cosFormer Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention Update log 2022/2/28 Add core code License This

120 Dec 15, 2022
Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations"

FAME: Feature-based Adversarial Meta-Embeddings This is the companion code for the experiments reported in the paper "FAME: Feature-Based Adversarial

Bosch Research 11 Nov 27, 2022
Data cleaning, missing value handle, EDA use in this project

Lending Club Case Study Project Brief Solving this assignment will give you an idea about how real business problems are solved using EDA. In this cas

Dhruvil Sheth 1 Jan 05, 2022
Fast, modular reference implementation and easy training of Semantic Segmentation algorithms in PyTorch.

TorchSeg This project aims at providing a fast, modular reference implementation for semantic segmentation models using PyTorch. Highlights Modular De

ycszen 1.4k Jan 02, 2023
Convert weight file.pth to weight file.blob

CONVERT YOUR MODEL TO IR FORMAT INSTALLATION OpenVino Toolkit Download openvinotoolkit 2021.3 version : Link Instruction of installation : Link Pytorc

Tran Anh Tuan 3 Nov 18, 2021
Code for the paper: Fighting Fake News: Image Splice Detection via Learned Self-Consistency

Fighting Fake News: Image Splice Detection via Learned Self-Consistency [paper] [website] Minyoung Huh *12, Andrew Liu *1, Andrew Owens1, Alexei A. Ef

minyoung huh (jacob) 174 Dec 09, 2022
The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.

News December 27: v1.1.0 New loss functions: CentroidTripletLoss and VICRegLoss Mean reciprocal rank + per-class accuracies See the release notes Than

Kevin Musgrave 5k Jan 05, 2023
A simple editor for captions in .SRT file extension

WaySRT A simple editor for captions in .SRT file extension The program doesn't use any external dependecies, just run: python way_srt.py {file_name.sr

Gustavo Lopes 3 Nov 16, 2022
Prefix-Tuning: Optimizing Continuous Prompts for Generation

Prefix Tuning Files: . ├── gpt2 # Code for GPT2 style autoregressive LM │ ├── train_e2e.py # high-level script

530 Jan 04, 2023
Sky Computing: Accelerating Geo-distributed Computing in Federated Learning

Sky Computing Introduction Sky Computing is a load-balanced framework for federated learning model parallelism. It adaptively allocate model layers to

HPC-AI Tech 72 Dec 27, 2022
Multimodal Co-Attention Transformer (MCAT) for Survival Prediction in Gigapixel Whole Slide Images

Multimodal Co-Attention Transformer (MCAT) for Survival Prediction in Gigapixel Whole Slide Images [ICCV 2021] © Mahmood Lab - This code is made avail

Mahmood Lab @ Harvard/BWH 63 Dec 01, 2022
GNNAdvisor: An Efficient Runtime System for GNN Acceleration on GPUs

GNNAdvisor: An Efficient Runtime System for GNN Acceleration on GPUs [Paper, Slides, Video Talk] at USENIX OSDI'21 @inproceedings{GNNAdvisor, title=

YUKE WANG 47 Jan 03, 2023
PyTorch implementation for our paper Learning Character-Agnostic Motion for Motion Retargeting in 2D, SIGGRAPH 2019

Learning Character-Agnostic Motion for Motion Retargeting in 2D We provide PyTorch implementation for our paper Learning Character-Agnostic Motion for

Rundi Wu 367 Dec 22, 2022
This is 2nd term discrete maths project done by UCU students that uses backtracking to solve various problems.

Backtracking Project Sponsors This is a project made by UCU students: Olha Liuba - crossword solver implementation Hanna Yershova - sudoku solver impl

Dasha 4 Oct 17, 2021
Blind Video Temporal Consistency via Deep Video Prior

deep-video-prior (DVP) Code for NeurIPS 2020 paper: Blind Video Temporal Consistency via Deep Video Prior PyTorch implementation | paper | project web

Chenyang LEI 272 Dec 21, 2022
A simple pygame dino game which can also be trained and played by a NEAT KI

Dino Game AI Game The game itself was developed with the Pygame module pip install pygame You can also play it yourself by making the dino jump with t

Kilian Kier 7 Dec 05, 2022
A deep learning model for style-specific music generation.

DeepJ: A model for style-specific music generation https://arxiv.org/abs/1801.00887 Abstract Recent advances in deep neural networks have enabled algo

Henry Mao 704 Nov 23, 2022
[CVPR 2021] Unsupervised Degradation Representation Learning for Blind Super-Resolution

DASR Pytorch implementation of "Unsupervised Degradation Representation Learning for Blind Super-Resolution", CVPR 2021 [arXiv] Overview Requirements

Longguang Wang 318 Dec 24, 2022
The repo for reproducing Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study

ECIR Reproducibility Paper: Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study This code corresponds to the reproducibility

ielab 3 Mar 31, 2022