Meshed-Memory Transformer for Image Captioning. CVPR 2020

Overview

M²: Meshed-Memory Transformer

This repository contains the reference code for the paper Meshed-Memory Transformer for Image Captioning (CVPR 2020).

Please cite with the following BibTeX:

@inproceedings{cornia2020m2,
  title={{Meshed-Memory Transformer for Image Captioning}},
  author={Cornia, Marcella and Stefanini, Matteo and Baraldi, Lorenzo and Cucchiara, Rita},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2020}
}

Meshed-Memory Transformer

Environment setup

Clone the repository and create the m2release conda environment using the environment.yml file:

conda env create -f environment.yml
conda activate m2release

Then download spacy data by executing the following command:

python -m spacy download en

Note: Python 3.6 is required to run our code.

Data preparation

To run the code, annotations and detection features for the COCO dataset are needed. Please download the annotations file annotations.zip and extract it.

Detection features are computed with the code provided by [1]. To reproduce our result, please download the COCO features file coco_detections.hdf5 (~53.5 GB), in which detections of each image are stored under the _features key. is the id of each COCO image, without leading zeros (e.g. the for COCO_val2014_000000037209.jpg is 37209), and each value should be a (N, 2048) tensor, where N is the number of detections.

Evaluation

To reproduce the results reported in our paper, download the pretrained model file meshed_memory_transformer.pth and place it in the code folder.

Run python test.py using the following arguments:

Argument Possible values
--batch_size Batch size (default: 10)
--workers Number of workers (default: 0)
--features_path Path to detection features file
--annotation_folder Path to folder with COCO annotations

Expected output

Under output_logs/, you may also find the expected output of the evaluation code.

Training procedure

Run python train.py using the following arguments:

Argument Possible values
--exp_name Experiment name
--batch_size Batch size (default: 10)
--workers Number of workers (default: 0)
--m Number of memory vectors (default: 40)
--head Number of heads (default: 8)
--warmup Warmup value for learning rate scheduling (default: 10000)
--resume_last If used, the training will be resumed from the last checkpoint.
--resume_best If used, the training will be resumed from the best checkpoint.
--features_path Path to detection features file
--annotation_folder Path to folder with COCO annotations
--logs_folder Path folder for tensorboard logs (default: "tensorboard_logs")

For example, to train our model with the parameters used in our experiments, use

python train.py --exp_name m2_transformer --batch_size 50 --m 40 --head 8 --warmup 10000 --features_path /path/to/features --annotation_folder /path/to/annotations

Sample Results

References

[1] P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang. Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.

Owner
AImageLab
AImageLab
ICRA 2021 - Robust Place Recognition using an Imaging Lidar

Robust Place Recognition using an Imaging Lidar A place recognition package using high-resolution imaging lidar. For best performance, a lidar equippe

Tixiao Shan 293 Dec 27, 2022
PyTorch implementation of our CVPR2021 (oral) paper "Prototype Augmentation and Self-Supervision for Incremental Learning"

PASS - Official PyTorch Implementation [CVPR2021 Oral] Prototype Augmentation and Self-Supervision for Incremental Learning Fei Zhu, Xu-Yao Zhang, Chu

67 Dec 27, 2022
This is the code for the paper "Contrastive Clustering" (AAAI 2021)

Contrastive Clustering (CC) This is the code for the paper "Contrastive Clustering" (AAAI 2021) Dependency python=3.7 pytorch=1.6.0 torchvision=0.8

Yunfan Li 210 Dec 30, 2022
Learning Features with Parameter-Free Layers (ICLR 2022)

Learning Features with Parameter-Free Layers (ICLR 2022) Dongyoon Han, YoungJoon Yoo, Beomyoung Kim, Byeongho Heo | Paper NAVER AI Lab, NAVER CLOVA Up

NAVER AI 65 Dec 07, 2022
MemStream: Memory-Based Anomaly Detection in Multi-Aspect Streams with Concept Drift

MemStream Implementation of MemStream: Memory-Based Anomaly Detection in Multi-Aspect Streams with Concept Drift . Siddharth Bhatia, Arjit Jain, Shivi

Stream-AD 61 Dec 02, 2022
Official Implementation of Swapping Autoencoder for Deep Image Manipulation (NeurIPS 2020)

Swapping Autoencoder for Deep Image Manipulation Taesung Park, Jun-Yan Zhu, Oliver Wang, Jingwan Lu, Eli Shechtman, Alexei A. Efros, Richard Zhang UC

449 Dec 27, 2022
Invariant Causal Prediction for Block MDPs

MISA Abstract Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challeng

Meta Research 41 Sep 17, 2022
VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.

What's New Below we share, in reverse chronological order, the updates and new releases in VISSL. All VISSL releases are available here. [Oct 2021]: V

Meta Research 2.9k Jan 07, 2023
Data and Code for paper Outlining and Filling: Hierarchical Query Graph Generation for Answering Complex Questions over Knowledge Graph is available for research purposes.

Data and Code for paper Outlining and Filling: Hierarchical Query Graph Generation for Answering Complex Questions over Knowledge Graph is available f

Yongrui Chen 5 Nov 10, 2022
Buffon’s needle: one of the oldest problems in geometric probability

Buffon-s-Needle Buffon’s needle is one of the oldest problems in geometric proba

3 Feb 18, 2022
Simulation of the solar system using various nummerical methods

solar-system Simulation of the solar system using various nummerical methods Download the repo Make shure matplotlib, scipy etc. are installed execute

Caspar 7 Jul 15, 2022
Pytorch implementation of Value Iteration Networks (NIPS 2016 best paper)

VIN: Value Iteration Networks A quick thank you A few others have released amazing related work which helped inspire and improve my own implementation

Kent Sommer 297 Dec 26, 2022
SAT: 2D Semantics Assisted Training for 3D Visual Grounding, ICCV 2021 (Oral)

SAT: 2D Semantics Assisted Training for 3D Visual Grounding SAT: 2D Semantics Assisted Training for 3D Visual Grounding by Zhengyuan Yang, Songyang Zh

Zhengyuan Yang 22 Nov 30, 2022
Trading and Backtesting environment for training reinforcement learning agent or simple rule base algo.

TradingGym TradingGym is a toolkit for training and backtesting the reinforcement learning algorithms. This was inspired by OpenAI Gym and imitated th

Yvictor 1.1k Jan 02, 2023
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

Super Resolution Examples We run this script under TensorFlow 2.0 and the TensorLayer2.0+. For TensorLayer 1.4 version, please check release. 🚀 🚀 🚀

TensorLayer Community 2.9k Jan 08, 2023
A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation (ICCV 2021)

A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation (ICCV 2021) This repository contains the official implemen

81 Dec 14, 2022
Using python and scikit-learn to make stock predictions

MachineLearningStocks in python: a starter project and guide EDIT as of Feb 2021: MachineLearningStocks is no longer actively maintained MachineLearni

Robert Martin 1.3k Dec 29, 2022
mbrl-lib is a toolbox for facilitating development of Model-Based Reinforcement Learning algorithms.

mbrl-lib is a toolbox for facilitating development of Model-Based Reinforcement Learning algorithms. It provides easily interchangeable modeling and planning components, and a set of utility function

Facebook Research 724 Jan 04, 2023
Generative Adversarial Text-to-Image Synthesis

###Generative Adversarial Text-to-Image Synthesis Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee This is the

Scott Ellison Reed 883 Dec 31, 2022
QT Py Media Knob using rotary encoder & neopixel ring

QTPy-Knob QT Py USB Media Knob using rotary encoder & neopixel ring The QTPy-Knob features: Media knob for volume up/down/mute with "qtpy-knob.py" Cir

Tod E. Kurt 56 Dec 30, 2022