Show, Edit and Tell: A Framework for Editing Image Captions, CVPR 2020

Related tags

Testingshow-edit-tell
Overview

Show, Edit and Tell: A Framework for Editing Image Captions | arXiv

This contains the source code for Show, Edit and Tell: A Framework for Editing Image Captions, to appear at CVPR 2020

Requirements

  • Python 3.6 or 3.7
  • PyTorch 1.2

For evaluation, you also need:

Argument Parser is currently not supported. We will add support to it soon.

Pretrained Models

You can download the pretrained models from here. Place them in eval folder.

Download and Prepare Features

In this work, we use 36 fixed bottom-up features. If you wish to use the adaptive features (10-100), please refer to adaptive_features folder in this repository and follow the instructions.

First, download the fixed features from here and unzip the file. Place the unzipped folder in bottom-up_features folder.

Next type this command:

python bottom-up_features/tsv.py

This command will create the following files:

  • An HDF5 file containing the bottom up image features for train and val splits, 36 per image for each split, in an (I, 36, 2048) tensor where I is the number of images in the split.
  • PKL files that contain training and validation image IDs mapping to index in HDF5 dataset created above.

Download/Prepare Caption Data

You can either download all the related caption data files from here or create them yourself. The folder contains the following:

  • WORDMAP_coco: maps the words to indices
  • CAPUTIL: stores the information about the existing captions in a dictionary organized as follows: {"COCO_image_name": {"caption": "existing caption to be edited", "encoded_previous_caption": an encoded list of the words, "previous_caption_length": a list contaning the length of the caption, "image_ids": the COCO image id}
  • CAPTIONS the encoded ground-truth captions (a list with number_images x 5 lists. Example: we have 113,287 training images in Karpathy Split, thereofre there is 566,435 lists for the training split)
  • CAPLENS: the length of the ground-truth captions (a list with number_images x 5 vallues)
  • NAMES: the COCO image name in the same order as the CAPTIONS
  • GENOME_DETS: the splits and image ids for loading the images in accordance to the features file created above

If you'd like to create the caption data yourself, download Karpathy's Split training, validation, and test splits. This zip file contains the captions. Place the file in caption data folder. You should also have the pkl files created from the 'Download Features' section: train36_imgid2idx.pkl and val36_imgid2idx.pkl.

Next, run:

python preprocess_caps.py

This will dump all the files to the folder caption data.

Next, download the existing captios to be edited, and organize them in a list containing dictionaries with each dictionary in the following format: {"image_id": COCO_image_id, "caption": "caption to be edited", "file_name": "split\\COCO_image_name"}. For example: {"image_id": 522418, "caption": "a woman cutting a cake with a knife", "file_name": "val2014\\COCO_val2014_000000522418.jpg"}. In our work, we use the captions produced by AoANet.

Next, run:

python preprocess_existing_caps.py

This will dump all the existing caption files to the folder caption data.

Prepare/Download Sequence-Level Training Data

Download the RL-data for sequence-level training used for computing metric scores from here.

Alternitavely, you may prepare the data yourself:

Run the following command:

python preprocess_rl.py

This will dump two files in the data folder used for computing metric scores.

Training and Validation

XE training stage:

For training DCNet, run:

python dcnet.py

For optimizing DCNet with MSE, run:

python dcnet_with_mse.py

For training editnet:

python editnet.py
Cider-D Optimization stage:

For training DCNet, run:

python dcnet_rl.py

For training editnet:

python editnet_rl.py

Evaluation

Refer to eval folder for instructions. All the generated captions and scores from our model can be found in the outputs folder.

BLEU-1 BLEU-4 CIDEr SPICE
Cross-Entropy Loss 77.9 38.0 1.200 21.2
CIDEr Optimization 80.6 39.2 1.289 22.6

Citation

@InProceedings{Sammani_2020_CVPR,
author = {Sammani, Fawaz and Melas-Kyriazi, Luke},
title = {Show, Edit and Tell: A Framework for Editing Image Captions},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

References

Our code is mainly based on self-critical and show attend and tell. We thank both authors.

Owner
Fawaz Sammani
The human brain is a miracle every human has, and mathematically modelling that brain is an overwhelming matter! I like teaching machines vision-language
Fawaz Sammani
A browser automation framework and ecosystem.

Selenium Selenium is an umbrella project encapsulating a variety of tools and libraries enabling web browser automation. Selenium specifically provide

Selenium 25.5k Jan 01, 2023
GitHub action for AppSweep Mobile Application Security Testing

GitHub action for AppSweep can be used to continuously integrate app scanning using AppSweep into your Android app build process

Guardsquare 14 Oct 06, 2022
Automated mouse clicker script using PyAutoGUI and Typer.

clickpy Automated mouse clicker script using PyAutoGUI and Typer. This app will randomly click your mouse between 1 second and 3 minutes, to prevent y

Joe Fitzgibbons 0 Dec 01, 2021
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva/ Datadadome / CloudFlare IUAM)

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva/ Datadadome / CloudFlare IUAM)

Leon 3.5k Dec 30, 2022
A modern API testing tool for web applications built with Open API and GraphQL specifications.

Schemathesis Schemathesis is a modern API testing tool for web applications built with Open API and GraphQL specifications. It reads the application s

Schemathesis.io 1.6k Jan 06, 2023
Scraping Bot for the Covid19 vaccination website of the Canton of Zurich, Switzerland.

Hi 👋 , I'm David A passionate developer from France. 🌱 I’m currently learning Kotlin, ReactJS and Kubernetes 👨‍💻 All of my projects are available

1 Nov 14, 2021
XSSearch - A comprehensive reflected XSS tool built on selenium framework in python

XSSearch A Comprehensive Reflected XSS Scanner XSSearch is a comprehensive refle

Sathyaprakash Sahoo 49 Oct 18, 2022
This project demonstrates selenium's ability to extract files from a website.

This project demonstrates selenium's ability to extract files from a website. I've added the challenge of connecting over TOR. This package also includes a personal archive site built in NodeJS and A

2 Jan 16, 2022
Descriptor Vector Exchange

Descriptor Vector Exchange This repo provides code for learning dense landmarks without supervision. Our approach is described in the ICCV 2019 paper

James Thewlis 74 Nov 29, 2022
Voip Open Linear Testing Suite

VOLTS Voip Open Linear Tester Suite Functional tests for VoIP systems based on voip_patrol and docker 10'000 ft. view System is designed to run simple

Igor Olhovskiy 17 Dec 30, 2022
A library for generating fake data and populating database tables.

Knockoff Factory A library for generating mock data and creating database fixtures that can be used for unit testing. Table of content Installation Ch

Nike Inc. 30 Sep 23, 2022
自动化爬取并自动测试所有swagger-ui.html显示的接口

swagger-hack 在测试中偶尔会碰到swagger泄露 常见的泄露如图: 有的泄露接口特别多,每一个都手动去试根本试不过来 于是用python写了个脚本自动爬取所有接口,配置好传参发包访问 原理是首先抓取http://url/swagger-resources 获取到有哪些标准及对应的文档地

jayus 534 Dec 29, 2022
Aplikasi otomasi klik di situs popcat.click menggunakan Python dan Selenium

popthe-popcat Aplikasi Otomasi Klik di situs popcat.click. aplikasi ini akan secara otomatis melakukan click pada kucing viral itu, sehingga anda tida

cndrw_ 2 Oct 07, 2022
Automating the process of sorting files in my downloads folder by file type.

downloads-folder-automation Automating the process of sorting files in a user's downloads folder on Windows by file type. This script iterates through

Eric Mahasi 27 Jan 07, 2023
Faker is a Python package that generates fake data for you.

Faker is a Python package that generates fake data for you. Whether you need to bootstrap your database, create good-looking XML documents, fill-in yo

Daniele Faraglia 15.2k Jan 01, 2023
automate the procedure of 403 response code bypass

403bypasser automate the procedure of 403 response code bypass Description i notice a lot of #bugbountytips describe how to bypass 403 response code s

smackerdodi2 40 Dec 16, 2022
WomboAI Art Generator

WomboAI Art Generator Automate AI art generation using wombot.art. Also integrated into SnailBot for you to try out. Setup Install Python Go to the py

nbee 7 Dec 03, 2022
Coverage plugin for pytest.

Overview docs tests package This plugin produces coverage reports. Compared to just using coverage run this plugin does some extras: Subprocess suppor

pytest-dev 1.4k Dec 29, 2022
Testinfra test your infrastructures

Testinfra test your infrastructure Latest documentation: https://testinfra.readthedocs.io/en/latest About With Testinfra you can write unit tests in P

pytest-dev 2.1k Jan 07, 2023
FauxFactory generates random data for your automated tests easily!

FauxFactory FauxFactory generates random data for your automated tests easily! There are times when you're writing tests for your application when you

Og Maciel 37 Sep 23, 2022