VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning

Last update: Dec 28, 2022

Overview

VisualGPT

Our paper: VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning

Main Architecture of Our VisualGPT

Download the GPT-2 pretrained weights

curl --output gpt2-pytorch_model.bin https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin
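If curl is not available, the same download can be done from Python. The sketch below uses only the standard library. It assumes the URL in the curl command above is still reachable, and it skips the download when a non-empty file already exists. The helper name fetch_gpt2_weights is hypothetical and not part of this repository.

```python
import os
import urllib.request

# URL copied from the curl command above; assumed to still be reachable.
GPT2_WEIGHTS_URL = (
    "https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin"
)


def fetch_gpt2_weights(path: str = "gpt2-pytorch_model.bin",
                       url: str = GPT2_WEIGHTS_URL) -> str:
    """Download the GPT-2 weights to `path` unless a non-empty file is already there."""
    if os.path.isfile(path) and os.path.getsize(path) > 0:
        return path  # already downloaded; no network call
    tmp_path = path + ".part"
    # Download under a temporary name first so an interrupted transfer
    # never leaves a truncated file under the final name.
    urllib.request.urlretrieve(url, tmp_path)
    os.replace(tmp_path, path)
    return path


if __name__ == "__main__":
    print(fetch_gpt2_weights())
```

The output file name matches the one used by the curl command, so the rest of the setup is unchanged.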

Environment setup

Clone the repository and create the visualgpt conda environment:

conda env create -f environment.yml
conda activate visualgpt

Then download the spaCy English language data:

python -m spacy download en
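Before preparing data, it can help to confirm that the activated environment actually provides the modules the later steps rely on. This is a minimal sketch using importlib.util.find_spec. The module list is an assumption: spacy comes from the step above, torch is assumed because the GPT-2 weights are a PyTorch checkpoint, and h5py is assumed for reading coco_detections.hdf5. Adjust the list to match environment.yml. The helper name missing_modules is hypothetical.

```python
import importlib.util

# Hypothetical list of required top-level modules; adjust to environment.yml.
REQUIRED_MODULES = ("torch", "spacy", "h5py")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names of top-level modules that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```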

Data preparation

We provide the COCO dataset for download. Please download the annotations file annotations.zip and extract it, and also download coco_detections.hdf5. In coco_detections.hdf5 the data is stored as key-value pairs, where the key is the image id and the value is a tensor of shape (N, 2048), N being the number of detections for that image.
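The two data-preparation steps can be sketched in Python as follows. extract_annotations unpacks annotations.zip with the standard zipfile module. load_padded_detections reads the (N, 2048) tensors for a few image ids and stacks them into one batch. Zero-padding to the largest N in the batch plus a boolean mask is one common way to batch a variable number of detections; it is not necessarily what train_visualGPT.py does internally. Both helper names are hypothetical, and the key format is an assumption taken from the description above, so check list(f.keys()) on your copy of the file.

```python
import zipfile

import numpy as np

FEATURE_DIM = 2048  # per-detection feature size stated above


def extract_annotations(zip_path="annotations.zip", out_dir="."):
    """Extract annotations.zip into out_dir and return the sorted member names."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
        return sorted(zf.namelist())


def load_padded_detections(store, image_ids, key_format="{}"):
    """Stack per-image (N, 2048) detection features into one zero-padded batch.

    store: any mapping from key to array-like, e.g. an open h5py.File.
    key_format: how an image id becomes a key. The default "{}" follows the
        description above (the key is the image id); this is an assumption,
        so inspect list(store.keys()) and adjust if your file differs.
    Returns (features, mask): features has shape (B, max_N, 2048) and
    mask[b, n] is True where a real detection (not padding) is stored.
    """
    arrays = [np.asarray(store[key_format.format(i)], dtype=np.float32)
              for i in image_ids]
    for a in arrays:
        if a.ndim != 2 or a.shape[1] != FEATURE_DIM:
            raise ValueError(f"expected shape (N, {FEATURE_DIM}), got {a.shape}")
    max_n = max(a.shape[0] for a in arrays)
    features = np.zeros((len(arrays), max_n, FEATURE_DIM), dtype=np.float32)
    mask = np.zeros((len(arrays), max_n), dtype=bool)
    for b, a in enumerate(arrays):
        n = a.shape[0]
        features[b, :n] = a  # real detections first, zeros after
        mask[b, :n] = True
    return features, mask
```

With h5py installed, the open file can be passed directly as the store, e.g. with h5py.File("coco_detections.hdf5", "r") as f: feats, mask = load_padded_detections(f, image_ids). h5py datasets convert to NumPy arrays through np.asarray.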

Code structure

Create the log folder and start the training:

mkdir logs

Train the model

python train_visualGPT.py --batch_size 50 --head 12 --features_path coco_detections.hdf5 --annotation_folder annotations --lr 1e-4 --gpt_model_type gpt --random_seed 42 --log_file logs/log --exp_name experiment_log --decoder_layer 12 --optimizer_type adamw --gradient_accumulation_steps 2 --train_percentage 0.001 --split_train_data
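With --batch_size 50 and --gradient_accumulation_steps 2, each optimizer step presumably sees an effective batch of 50 × 2 = 100 examples. The standard-library sketch below illustrates why: for a mean-reduced loss, adding the gradients of two micro-batches of 50, each scaled by 1/2, gives the same gradient as a single batch of 100. It uses a hypothetical one-parameter least-squares model, not VisualGPT, and it assumes the loss is divided by the number of accumulation steps, which is the usual convention but is not stated in this README.

```python
import random


def mse_grad(w, batch):
    """Gradient w.r.t. w of the loss 0.5 * mean((w*x - y)^2) over `batch`."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, batch_size, accumulation_steps):
    """Sum micro-batch gradients, each scaled by 1/accumulation_steps."""
    total = 0.0
    for step in range(accumulation_steps):
        micro = batch[step * batch_size:(step + 1) * batch_size]
        total += mse_grad(w, micro) / accumulation_steps
    return total


if __name__ == "__main__":
    batch_size, accumulation_steps = 50, 2  # values from the command above
    rng = random.Random(42)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(batch_size * accumulation_steps)]
    data = [(x, 3.0 * x + rng.gauss(0.0, 0.1)) for x in xs]
    w = 0.5
    print("effective batch size:", batch_size * accumulation_steps)
    print("full-batch gradient :", mse_grad(w, data))
    print("accumulated gradient:", accumulated_grad(w, data, batch_size, accumulation_steps))
```

The two printed gradients agree up to floating-point rounding, which is why accumulation lets a smaller per-step batch fit in GPU memory while keeping the larger effective batch.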

Acknowledgement

This code uses resources from Meshed-Memory Transformer and Transformers.

Please cite our paper using the following BibTeX:

@article{chen2021visualgpt,
  title={VisualGPT: Data-efficient Image Captioning by Balancing Visual Input and Linguistic Knowledge from Pretraining},
  author={Chen, Jun and Guo, Han and Yi, Kai and Li, Boyang and Elhoseiny, Mohamed},
  journal={arXiv preprint arXiv:2102.10407},
  year={2021}
}

@article{chen2021visualgpt,
  title={VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning},
  author={Chen, Jun and Guo, Han and Yi, Kai and Li, Boyang and Elhoseiny, Mohamed},
  journal={arXiv preprint arXiv:2102.10407},
  year={2021}
}

Owner

Vision CAIR Research Group, KAUST
