Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]

Last update: Dec 17, 2022

Overview

Introduction

This repository contains the implementation of X-Linear Attention Networks for Image Captioning (CVPR 2020). The original paper can be found here.

Please cite with the following BibTeX:

@inproceedings{xlinear2020cvpr,
  title={X-Linear Attention Networks for Image Captioning},
  author={Pan, Yingwei and Yao, Ting and Li, Yehao and Mei, Tao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2020}
}

Requirements

Python 3
CUDA 10
numpy
tqdm
easydict
PyTorch (>1.0)
torchvision
coco-caption

Data preparation

Download the bottom-up features and convert them to .npz files:

python2 tools/create_feats.py --infeats bottom_up_tsv --outfolder ./mscoco/feature/up_down_10_100

Download the annotations into the mscoco folder. For more details about data preparation, refer to self-critical.pytorch.
Download coco-caption and set the path __C.INFERENCE.COCO_PATH in lib/config.py.
The pretrained models and results can be downloaded here.
The pretrained SENet-154 model can be downloaded here.
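As a rough illustration of the feature conversion step above, here is a minimal Python 3 sketch. It is not the repository's tools/create_feats.py. It assumes the TSV layout commonly used for the bottom-up features (columns image_id, image_w, image_h, num_boxes, boxes, features, with boxes and features stored as base64-encoded float32 arrays) and assumes each output file stores the region features under the key feat. The function names decode_row and convert are hypothetical.

```python
import base64
import csv
import os

import numpy as np

# Assumed column layout of the bottom-up feature TSV files.
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]

# Feature fields are very long strings, so raise the csv field size limit.
csv.field_size_limit(2**31 - 1)


def decode_row(row):
    """Decode one TSV row into (image_id, features of shape [num_boxes, dim])."""
    num_boxes = int(row["num_boxes"])
    raw = base64.b64decode(row["features"])
    feats = np.frombuffer(raw, dtype=np.float32).reshape(num_boxes, -1)
    return row["image_id"], feats


def convert(tsv_file, outfolder):
    """Write one compressed .npz file per image and return the number written."""
    os.makedirs(outfolder, exist_ok=True)
    reader = csv.DictReader(tsv_file, delimiter="\t", fieldnames=FIELDNAMES)
    count = 0
    for row in reader:
        image_id, feats = decode_row(row)
        # The key name "feat" is an assumption; np.savez_compressed appends ".npz".
        np.savez_compressed(os.path.join(outfolder, image_id), feat=feats)
        count += 1
    return count
```

A call such as convert(open(path_to_tsv), "./mscoco/feature/up_down_10_100") would then write one file per image id, where path_to_tsv is a placeholder for one of the downloaded TSV files.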

Training

Train X-LAN model

bash experiments/xlan/train.sh

Train X-LAN model using self-critical training

Copy the pretrained model into experiments/xlan_rl/snapshot and run the script:

bash experiments/xlan_rl/train.sh

Train X-LAN transformer model

bash experiments/xtransformer/train.sh

Train X-LAN transformer model using self-critical training

Copy the pretrained model into experiments/xtransformer_rl/snapshot and run the script:

bash experiments/xtransformer_rl/train.sh

Evaluation

CUDA_VISIBLE_DEVICES=0 python3 main_test.py --folder experiments/model_folder --resume model_epoch

Acknowledgements

Thanks to the contributors of self-critical.pytorch and the awesome PyTorch team.


Owner

JDAI-CV
