Meshed-Memory Transformer for Image Captioning. CVPR 2020

Overview

M²: Meshed-Memory Transformer

This repository contains the reference code for the paper Meshed-Memory Transformer for Image Captioning (CVPR 2020).

Please cite with the following BibTeX:

@inproceedings{cornia2020m2,
  title={{Meshed-Memory Transformer for Image Captioning}},
  author={Cornia, Marcella and Stefanini, Matteo and Baraldi, Lorenzo and Cucchiara, Rita},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2020}
}

Meshed-Memory Transformer

Environment setup

Clone the repository and create the m2release conda environment using the environment.yml file:

conda env create -f environment.yml
conda activate m2release

Then download spacy data by executing the following command:

python -m spacy download en

Note: Python 3.6 is required to run our code.

Data preparation

To run the code, annotations and detection features for the COCO dataset are needed. Please download the annotations file annotations.zip and extract it.

Detection features are computed with the code provided by [1]. To reproduce our result, please download the COCO features file coco_detections.hdf5 (~53.5 GB), in which detections of each image are stored under the _features key. is the id of each COCO image, without leading zeros (e.g. the for COCO_val2014_000000037209.jpg is 37209), and each value should be a (N, 2048) tensor, where N is the number of detections.

Evaluation

To reproduce the results reported in our paper, download the pretrained model file meshed_memory_transformer.pth and place it in the code folder.

Run python test.py using the following arguments:

Argument Possible values
--batch_size Batch size (default: 10)
--workers Number of workers (default: 0)
--features_path Path to detection features file
--annotation_folder Path to folder with COCO annotations

Expected output

Under output_logs/, you may also find the expected output of the evaluation code.

Training procedure

Run python train.py using the following arguments:

Argument Possible values
--exp_name Experiment name
--batch_size Batch size (default: 10)
--workers Number of workers (default: 0)
--m Number of memory vectors (default: 40)
--head Number of heads (default: 8)
--warmup Warmup value for learning rate scheduling (default: 10000)
--resume_last If used, the training will be resumed from the last checkpoint.
--resume_best If used, the training will be resumed from the best checkpoint.
--features_path Path to detection features file
--annotation_folder Path to folder with COCO annotations
--logs_folder Path folder for tensorboard logs (default: "tensorboard_logs")

For example, to train our model with the parameters used in our experiments, use

python train.py --exp_name m2_transformer --batch_size 50 --m 40 --head 8 --warmup 10000 --features_path /path/to/features --annotation_folder /path/to/annotations

Sample Results

References

[1] P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang. Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.

Owner
AImageLab
AImageLab
Pixel Consensus Voting for Panoptic Segmentation (CVPR 2020)

Implementation for Pixel Consensus Voting (CVPR 2020). This codebase contains the essential ingredients of PCV, including various spatial discretizati

Haochen 23 Oct 25, 2022
FactSeg: Foreground Activation Driven Small Object Semantic Segmentation in Large-Scale Remote Sensing Imagery (TGRS)

FactSeg: Foreground Activation Driven Small Object Semantic Segmentation in Large-Scale Remote Sensing Imagery by Ailong Ma, Junjue Wang*, Yanfei Zhon

Kingdrone 43 Jan 05, 2023
Mercury: easily convert Python notebook to web app and share with others

Mercury Share your Python notebooks with others Easily convert your Python notebooks into interactive web apps by adding parameters in YAML. Simply ad

MLJAR 2.2k Dec 27, 2022
RealFormer-Pytorch Implementation of RealFormer using pytorch

RealFormer-Pytorch Implementation of RealFormer using pytorch. Includes comparison with classical Transformer on image classification task (ViT) wrt C

Simo Ryu 90 Dec 08, 2022
The original implementation of TNDM used in the NeurIPS 2021 paper (no longer being updated)

TNDM - Targeted Neural Dynamical Modeling Note: This code is no longer being updated. The official re-implementation can be found at: https://github.c

1 Jul 21, 2022
Collections for the lasted paper about multi-view clustering methods (papers, codes)

Multi-View Clustering Papers Collections for the lasted paper about multi-view clustering methods (papers, codes). There also exists some repositories

Andrew Guan 10 Sep 20, 2022
Reference code for the paper "Cross-Camera Convolutional Color Constancy" (ICCV 2021)

Cross-Camera Convolutional Color Constancy, ICCV 2021 (Oral) Mahmoud Afifi1,2, Jonathan T. Barron2, Chloe LeGendre2, Yun-Ta Tsai2, and Francois Bleibe

Mahmoud Afifi 76 Jan 07, 2023
Generative Flow Networks for Discrete Probabilistic Modeling

Energy-based GFlowNets Code for Generative Flow Networks for Discrete Probabilistic Modeling by Dinghuai Zhang, Nikolay Malkin, Zhen Liu, Alexandra Vo

Narsil-Dinghuai Zhang 51 Dec 20, 2022
Machine learning library for fast and efficient Gaussian mixture models

This repository contains code which implements the Stochastic Gaussian Mixture Model (S-GMM) for event-based datasets Dependencies CMake Premake4 Blaz

Omar Oubari 1 Dec 19, 2022
Generalized Data Weighting via Class-level Gradient Manipulation

Generalized Data Weighting via Class-level Gradient Manipulation This repository is the official implementation of Generalized Data Weighting via Clas

18 Nov 12, 2022
Rainbow DQN implementation that outperforms the paper's results on 40% of games using 20x less data 🌈

Rainbow 🌈 An implementation of Rainbow DQN which reaches a median HNS of 205.7 after only 10M frames (the original Rainbow from Hessel et al. 2017 re

Dominik Schmidt 31 Dec 21, 2022
[ACMMM 2021, Oral] Code release for "Elastic Tactile Simulation Towards Tactile-Visual Perception"

EIP: Elastic Interaction of Particles Code release for "Elastic Tactile Simulation Towards Tactile-Visual Perception", in ACMMM (Oral) 2021. By Yikai

Yikai Wang 37 Dec 20, 2022
A library for implementing Decentralized Graph Neural Network algorithms.

decentralized-gnn A package for implementing and simulating decentralized Graph Neural Network algorithms for classification of peer-to-peer nodes. De

Multimedia Knowledge and Social Analytics Lab 5 Nov 07, 2022
Lux AI environment interface for RLlib multi-agents

Lux AI interface to RLlib MultiAgentsEnv For Lux AI Season 1 Kaggle competition. LuxAI repo RLlib-multiagents docs Kaggle environments repo Please let

Jaime 12 Nov 07, 2022
yolov5 deepsort 行人 车辆 跟踪 检测 计数

yolov5 deepsort 行人 车辆 跟踪 检测 计数 实现了 出/入 分别计数。 默认是 南/北 方向检测,若要检测不同位置和方向,可在 main.py 文件第13行和21行,修改2个polygon的点。 默认检测类别:行人、自行车、小汽车、摩托车、公交车、卡车。 检测类别可在 detect

554 Dec 30, 2022
The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)

Autoregressive Image Generation using Residual Quantization (CVPR 2022) The official implementation of "Autoregressive Image Generation using Residual

Kakao Brain 529 Dec 30, 2022
The official implementation of the IEEE S&P`22 paper "SoK: How Robust is Deep Neural Network Image Classification Watermarking".

Watermark-Robustness-Toolbox - Official PyTorch Implementation This repository contains the official PyTorch implementation of the following paper to

49 Dec 19, 2022
SVG Icon processing tool for C++

BAWR This is a tool to automate the icons generation from sets of svg files into fonts and atlases. The main purpose of this tool is to add it to the

Frank David Martínez M 66 Dec 14, 2022
Hysterese plugin with two temperature offset areas

craftbeerpi4 plugin OffsetHysterese Temperatur-Steuerungs-Plugin mit zwei tempereaturbereich abhängigen Offsets. Installation sudo pip3 install https:

HappyHibo 1 Dec 21, 2021
Detecting Potentially Harmful and Protective Suicide-related Content on Twitter

TwitterSuicideML Scripts for reproducing the Machine Learning analysis of the paper: Detecting Potentially Harmful and Protective Suicide-related Cont

3 Oct 17, 2022