Show-attend-and-tell - TensorFlow Implementation of "Show, Attend and Tell"

Last update: Nov 29, 2022

Overview

Show, Attend and Tell

Update (December 2, 2016): a TensorFlow implementation of "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", which introduces an attention-based image caption generator. The model shifts its attention to the relevant part of the image as it generates each word.
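To make the idea of "shifting attention" concrete, here is a minimal sketch of one soft-attention step in plain Python. It is not taken from this repository: the function name `soft_attention` and the toy inputs are illustrative assumptions, and the relevance scores are passed in directly, whereas in the paper they are computed from the image features and the decoder's previous hidden state.

```python
import math


def soft_attention(features, scores):
    """Return (weights, context) for one decoding step.

    features: list of L feature vectors, one per image location.
    scores:   list of L unnormalized relevance scores (assumed given here).
    """
    # Softmax over locations; subtracting the max keeps exp() stable.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The context vector is the attention-weighted average of the features.
    dim = len(features[0])
    context = [
        sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
    ]
    return weights, context


# Toy example: three image locations with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 5.0]  # the third location is scored as most relevant
weights, context = soft_attention(features, scores)
```

With these toy scores, almost all of the weight falls on the third location, so the context vector is close to that location's feature vector. A new set of weights is computed for every generated word.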

References

Authors' Theano code: https://github.com/kelvinxu/arctic-captions

Another TensorFlow implementation: https://github.com/jazzsaxmafia/show_attend_and_tell.tensorflow

Getting Started

Prerequisites

First, clone this repo and coco-caption (which provides pycocoevalcap) into the same directory.

$ git clone https://github.com/yunjey/show-attend-and-tell-tensorflow.git
$ git clone https://github.com/tylin/coco-caption.git

This code is written in Python 2.7 and requires TensorFlow 1.2. You also need to install a few more packages to process the MSCOCO dataset. I have provided a script that downloads the MSCOCO image dataset and the VGGNet19 model. Downloading the data may take several hours, depending on your network speed. Run the commands below; the images will be downloaded to the image/ directory and the VGGNet19 model to the data/ directory.

$ cd show-attend-and-tell-tensorflow
$ pip install -r requirements.txt
$ chmod +x ./download.sh
$ ./download.sh

To feed the images to VGGNet, you should resize the MSCOCO images to a fixed size of 224x224. Run the command below; the resized images will be stored in the image/train2014_resized/ and image/val2014_resized/ directories.

$ python resize.py
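For readers who want to see what such a resize step can look like, here is a rough sketch. It is not the contents of resize.py: the function names are hypothetical, and the center crop before scaling is an assumption made here to avoid distorting the aspect ratio. The pixel work relies on Pillow, which must be installed separately.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def resize_image(src_path, dst_path, size=224):
    """Center-crop the image at src_path and save it as size x size."""
    # Pillow is imported here so center_crop_box stays usable without it.
    from PIL import Image

    with Image.open(src_path) as image:
        square = image.crop(center_crop_box(*image.size))
        square.resize((size, size), Image.LANCZOS).save(dst_path)
```

For a 640x480 image, `center_crop_box(640, 480)` selects the middle 480x480 region, which `resize_image` then scales down to 224x224.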

Before training the model, you have to preprocess the MSCOCO caption dataset. To generate the caption dataset and image feature vectors, run the command below.

$ python prepro.py
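Caption preprocessing of this kind typically means building a vocabulary and turning each caption into a fixed-length sequence of word indices. The sketch below illustrates that idea only; it is not the code in prepro.py. The function names, the special tokens `<NULL>`, `<START>` and `<END>`, and the `threshold` and `max_len` parameters are all assumptions made for illustration.

```python
from collections import Counter


def build_vocab(captions, threshold=1):
    """Map every word seen at least `threshold` times to an integer index."""
    counter = Counter(
        word for caption in captions for word in caption.lower().split()
    )
    # Indices 0-2 are reserved for the (assumed) special tokens.
    word_to_idx = {"<NULL>": 0, "<START>": 1, "<END>": 2}
    for word in sorted(w for w, c in counter.items() if c >= threshold):
        word_to_idx[word] = len(word_to_idx)
    return word_to_idx


def encode(caption, word_to_idx, max_len):
    """Encode a caption as max_len + 2 indices, padded with <NULL>."""
    # Unknown words are dropped; long captions are truncated to max_len words.
    words = [w for w in caption.lower().split() if w in word_to_idx][:max_len]
    tokens = [word_to_idx["<START>"]]
    tokens += [word_to_idx[w] for w in words]
    tokens.append(word_to_idx["<END>"])
    tokens += [word_to_idx["<NULL>"]] * (max_len + 2 - len(tokens))
    return tokens


captions = ["a plane flying", "a zebra standing"]
vocab = build_vocab(captions)
encoded = encode("a zebra", vocab, max_len=4)
```

Padding every caption to the same length lets the captions be stacked into a single integer array for batched training.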

Train the model

To train the image captioning model, run the command below.

$ python train.py

(Optional) TensorBoard visualization

I have provided a TensorBoard visualization for real-time debugging. Open a new terminal, run the command below, and open http://localhost:6005/ in your web browser.

$ tensorboard --logdir='./log' --port=6005

Evaluate the model

To generate captions, visualize attention weights and evaluate the model, please see evaluate_model.ipynb.

Results

Training data

(1) Generated caption: A plane flying in the sky with a landing gear down.

(2) Generated caption: A giraffe and two zebra standing in the field.

Validation data

(1) Generated caption: A large elephant standing in a dry grass field.

(2) Generated caption: A baby elephant standing on top of a dirt field.

Test data

(1) Generated caption: A plane flying over a body of water.

(2) Generated caption: A zebra standing in the grass near a tree.

Owner

Yunjey Choi
