Code for the paper "Adaptively Aligned Image Captioning via Adaptive Attention Time"

Last update: Aug 27, 2022

Overview

This repository contains the implementation of Adaptively Aligned Image Captioning via Adaptive Attention Time (AAT).

Requirements

Python 3.6
Java 1.8.0
PyTorch 1.0
cider
coco-caption
tensorboardX

Training AAT

Prepare data (with Python 2)

See details in data/README.md.

(Note: set word_count_threshold in scripts/prepro_labels.py to 4 to generate a vocabulary of size 10,369.)
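
As a rough illustration of what such a threshold does, the sketch below builds a vocabulary from tokenized captions, keeps only words that occur more than a threshold number of times, and maps the rest to a shared UNK token. The function name build_vocab and the toy captions are hypothetical, and the strict "more than" comparison is an assumption about the preprocessing script, not a copy of it.

```python
from collections import Counter

def build_vocab(captions, count_threshold):
    """Return (vocab, rare_words) for a list of tokenized captions.

    Assumption: a word is kept only if it occurs strictly more than
    count_threshold times; every other word is treated as rare.
    """
    counts = Counter(word for caption in captions for word in caption)
    vocab = sorted(w for w, n in counts.items() if n > count_threshold)
    rare_words = sorted(w for w, n in counts.items() if n <= count_threshold)
    if rare_words:
        # One shared token stands in for every rare word.
        vocab.append("UNK")
    return vocab, rare_words

# Toy data, not taken from COCO.
captions = [
    ["a", "dog", "runs"],
    ["a", "dog", "sits"],
    ["a", "cat", "sits"],
]
vocab, rare_words = build_vocab(captions, count_threshold=1)
print(vocab)
print(rare_words)
```

Raising the threshold shrinks the vocabulary and sends more words to UNK, which is why the vocabulary size quoted above depends on the value chosen.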

You should also preprocess the dataset to build the n-gram cache used to compute the CIDEr score during self-critical sequence training (SCST):

$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
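
CIDEr weights n-grams by TF-IDF, so it needs to know in how many images' reference captions each n-gram appears. The sketch below shows that counting step on toy data. It is a minimal illustration under stated assumptions, not the script's actual code or output format; the function names ngrams and document_frequencies are hypothetical.

```python
from collections import Counter

def ngrams(tokens, max_n=4):
    """Yield every n-gram (as a tuple) of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def document_frequencies(references_per_image, max_n=4):
    """Count, for each n-gram, how many images have it in any reference."""
    doc_freq = Counter()
    for references in references_per_image:
        seen = set()
        for caption in references:
            seen.update(ngrams(caption, max_n))
        # Each image contributes at most once per n-gram.
        doc_freq.update(seen)
    return doc_freq

# Toy references: a list of images, each with a list of tokenized captions.
refs = [
    [["a", "dog", "runs"], ["a", "dog", "plays"]],  # image 1
    [["a", "cat", "sits"]],                          # image 2
]
df = document_frequencies(refs)
print(df[("a",)], df[("a", "dog")])
```

Computing these counts once and caching them avoids repeating the pass over all training captions every time a sampled caption is scored.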

Training

$ sh train-aat.sh

See opts.py for the options.

Evaluation

$ CUDA_VISIBLE_DEVICES=0 python eval.py --model log/log_aat_rl/model.pth --infos_path log/log_aat_rl/infos_aat.pkl --dump_images 0 --dump_json 1 --num_images -1 --language_eval 1 --beam_size 2 --batch_size 100 --split test
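
For readers unfamiliar with the --beam_size option, the sketch below shows generic beam search on a toy next-word table. It is not the decoding code in eval.py: the names beam_search, toy_model and TABLE are hypothetical, the probabilities are made up, and end-of-sequence handling is omitted to keep it short.

```python
import math

def beam_search(next_log_probs, beam_size, max_len):
    """Keep the beam_size highest-scoring partial sequences at each step."""
    beams = [((), 0.0)]  # (sequence so far, summed log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for word, lp in next_log_probs(seq).items():
                candidates.append((seq + (word,), score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams

# Made-up next-word probabilities, keyed by the previous word.
TABLE = {
    None: {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.55, "cat": 0.45},
    "the": {"dog": 0.9, "cat": 0.1},
}

def toy_model(prefix):
    """Return log-probabilities of the next word given the prefix."""
    last = prefix[-1] if prefix else None
    return {w: math.log(p) for w, p in TABLE[last].items()}

best = beam_search(toy_model, beam_size=2, max_len=2)
greedy = beam_search(toy_model, beam_size=1, max_len=2)
print(best[0][0], greedy[0][0])
```

In this toy table a beam of size 2 finds "the dog" (probability 0.36), while a beam of size 1 commits to "a" first and ends with "a dog" (probability 0.33). A larger beam trades extra computation for a wider search.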

Reference

If you find this repo helpful, please consider citing:

@inproceedings{huang2019adaptively,
  title = {Adaptively Aligned Image Captioning via Adaptive Attention Time},
  author = {Huang, Lun and Wang, Wenmin and Xia, Yaxian and Chen, Jie},
  booktitle = {Advances in Neural Information Processing Systems 32},
  year = {2019}
}

Acknowledgements

This repository is based on Ruotian Luo's self-critical.pytorch.
