GoodNews Everyone! Context driven entity aware captioning for news images

Related tags

Deep LearningGoodNews
Overview

This is the code for a CVPR 2019 paper, called GoodNews Everyone! Context driven entity aware captioning for news images. Enjoy!

Model preview:

GoodNews Model!

Huge Thanks goes to New York Times API for providing such a service for FREE!

Another Thanks to @ruotianluo for providing the captioning code.

Dependencies/Requirements:

pytorch==1.0.0
spacy==2.0.11
h5py==2.7.0
bs4==4.5.3
joblib==0.12.2
nltk==3.2.3
tqdm==4.19.5
urllib2==2.7
goose==1.0.25
urlparse
unidecode

Introduction

We took the first steps to move the captioning systems to interpretation (see the paper for more detail). To this end, we have used New York Times API to retrieve the articles, images and captions.

The structure of this repo is as follows:

  1. Getting the data
  2. Cleaning and formating the data
  3. How to train models

Get the data

You have 3 options to get the data.

Images only

If you want to download the images only and directly start working on the same dataset as ours, then download the cleaned version of the dataset without images: article+caption.json and put it to data/ folder and download the img_urls.json and put it in the get_data/get_images_only/ folder.

Then run

python get_images.py --num_thread 16

Then, you will get the images. After that move to Clean and Format Data section.

PS: I have recieved numerous emails regarding some of the images not present/broken in the img_urls.json. Which is why I decided to put the images on the drive to download in the name of open science. Download all images

Images + articles

If you would like the get the raw version of the article and captions to do your own cleaning and processing, no worries! First download the article_urls and go to folder get_data/with_article_urls/ and run

python get_data_with_urls.py --num_thread 16
python combine_dataset.py 

This will get you the raw version of the caption, articles and also the images. After that move to Clean and Format Data section.

I want more!

As you know, New York Times is huge. Their articles starts from 1881 (It is crazy!) until well today. So in case you want to get ALL the data or expand the data to more years, then first step is go to New York Times API and get an API key. All you have to do is just sign up for the API key.

Once you have the key go to folder get_data/with_api/ and run

python retrieve_all_urls.py --api-key XXXX --start_year XXX --end_year XXX 

This is for getting the article urls and then saving in the format of month-year. Once you have the all urls from the API, then you run

python get_data_api.py
python combine_dataset.py

get_data_api.py retrieves the articles, captions and images. combine_dataset.py combines yearly data into one file after removing data points if they have corrupt image, empty articles or empty captions. After that move to Clean and Format Data section.

Small Note

I also provide the links to images and their data splits (train, val, test). Even though I always use random seed to decide the split, just in case If the GODS meddles with the random seed, here is the link to a json where you can find each image and its split: img_splits.json

Clean and Format the Data

Now that we have the data, it is time to clean, preprocess and format the data.

Preprocess

When you reach this part, you must have captioning_dataset.json in your data/ folder.

Captions

This part is for cleaning the captions (tokenizing, removing non-ascii characters, etc.), splitting train, val, and test and creating anonymize captions.

In other words, we change the caption "Alber Einstein taught in Princeton in 1926" to "PERSON_ taught in ORGANIZATION_ in DATE_." Move to preprocess/ folder and run

python clean_captions.py

Resize Images

To resize the images to 256x256:

python resize.py --root XXXX --img_size 256

Articles

Get the article format that is needed for the encoding methods by running: create_article_set.py

python create_article_set.py

Format

Now to create H5 file for captions, images and articles, just need to go to scripts/ folder and run in order

python prepro_labels.py --max_length 31 --word_count_threshold 4
python prepro_images.py

We proposed 3 different article encoding method. You can download each of encoded article methods, articles_full_avg_, articles_full_wavg, articles_full_TBB.

Or you can use the code to obtain them:

python prepro_articles_avg.py
python prepro_articles_wavg.py
python prepro_articles_tbb.py

Train

Finally we are ready to train. Magical words are:

python train.py --cnn_weight [YOUR HOME DIRECTORY]/.torch/resnet152-b121ed2d.pth 

You can check the opt.py for changing a lot of the options such dimension size, different models, hyperparameters, etc.

Evaluate

After you train your models, you can get the score according commonly used metrics: Bleu, Cider, Spice, Rouge, Meteor. Be sure to specify model_path, cnn_model_path, infos_path and sen_embed_path when runing eval.py. eval.py is usually used in training but it is necessary to run it to get the insertion.

Insertion

Last but not least insert.py. After you run eval.py, it will produce you a json file with the ids and their template captions. To fill the correct named entity, you have to run insert.py:

python insert.py --output [XXX] --dump [True/False] --insertion_method ['ctx', 'att', 'rand']

PS: I have been requested to provide model's output, so I thought it would be best to share it with everyone. Model Output In this folder, you have:

test.json: Test set with raw and template version of the caption.

article.json: Article sentences which is needed in the insert.py.

w/o article folder: All the models output on template captions, without articles.

with article folder: Our models output in the paper with sentence attention(sen_att) and image attention(vis_att), provided in the json. Hope this is helpful to more of you.

Conclusion

Thank you and sorry for the bugs!

This code provides various models combining dilated convolutions with residual networks

Overview This code provides various models combining dilated convolutions with residual networks. Our models can achieve better performance with less

Fisher Yu 1.1k Dec 30, 2022
DSL for matching Python ASTs

py-ast-rule-engine This library provides a DSL (domain-specific language) to match a pattern inside a Python AST (abstract syntax tree). The library i

1 Dec 18, 2021
Official implementation for the paper: Permutation Invariant Graph Generation via Score-Based Generative Modeling

Permutation Invariant Graph Generation via Score-Based Generative Modeling This repo contains the official implementation for the paper Permutation In

64 Dec 29, 2022
Co-mining: Self-Supervised Learning for Sparsely Annotated Object Detection, AAAI 2021.

Co-mining: Self-Supervised Learning for Sparsely Annotated Object Detection This repository is an official implementation of the AAAI 2021 paper Co-mi

MEGVII Research 20 Dec 07, 2022
Official pytorch implementation of DeformSyncNet: Deformation Transfer via Synchronized Shape Deformation Spaces

DeformSyncNet: Deformation Transfer via Synchronized Shape Deformation Spaces Minhyuk Sung*, Zhenyu Jiang*, Panos Achlioptas, Niloy J. Mitra, Leonidas

Zhenyu Jiang 21 Aug 30, 2022
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism This repository is the official PyTorch implementation of our AAAI-2022 paper, in

Jinglin Liu 803 Dec 28, 2022
Python scripts for performing object detection with the 1000 labels of the ImageNet dataset in ONNX.

Python scripts for performing object detection with the 1000 labels of the ImageNet dataset in ONNX. The repository combines a class agnostic object localizer to first detect the objects in the image

Ibai Gorordo 24 Nov 14, 2022
Light-weight network, depth estimation, knowledge distillation, real-time depth estimation, auxiliary data.

light-weight-depth-estimation Boosting Light-Weight Depth Estimation Via Knowledge Distillation, https://arxiv.org/abs/2105.06143 Junjie Hu, Chenyou F

Junjie Hu 13 Dec 10, 2022
UAV-Networks-Routing is a Python simulator for experimenting routing algorithms and mac protocols on unmanned aerial vehicle networks.

UAV-Networks Simulator - Autonomous Networking - A.A. 20/21 UAV-Networks-Routing is a Python simulator for experimenting routing algorithms and mac pr

0 Nov 13, 2021
Applying PVT to Semantic Segmentation

Applying PVT to Semantic Segmentation Here, we take MMSegmentation v0.13.0 as an example, applying PVTv2 to SemanticFPN. For details see Pyramid Visio

35 Nov 30, 2022
This is a computer vision based implementation of the popular childhood game 'Hand Cricket/Odd or Even' in python

Hand Cricket Table of Content Overview Installation Game rules Project Details Future scope Overview This is a computer vision based implementation of

Abhinav R Nayak 6 Jan 12, 2022
we propose a novel deep network, named feature aggregation and refinement network (FARNet), for the automatic detection of anatomical landmarks.

Feature Aggregation and Refinement Network for 2D Anatomical Landmark Detection Overview Localization of anatomical landmarks is essential for clinica

aoyueyuan 0 Aug 28, 2022
Codecov coverage standard for Python

Python-Standard Last Updated: 01/07/22 00:09:25 What is this? This is a Python application, with basic unit tests, for which coverage is uploaded to C

Codecov 10 Nov 04, 2022
PyTorch implementation of "Representing Shape Collections with Alignment-Aware Linear Models" paper.

deep-linear-shapes PyTorch implementation of "Representing Shape Collections with Alignment-Aware Linear Models" paper. If you find this code useful i

Romain Loiseau 27 Sep 24, 2022
Optimal space decomposition based-product quantization for approximate nearest neighbor search

Optimal space decomposition based-product quantization for approximate nearest neighbor search Abstract Product quantization(PQ) is an effective neare

Mylove 1 Nov 19, 2021
pytorch implementation of openpose including Hand and Body Pose Estimation.

pytorch-openpose pytorch implementation of openpose including Body and Hand Pose Estimation, and the pytorch model is directly converted from openpose

Hzzone 1.4k Jan 07, 2023
Bagua is a flexible and performant distributed training algorithm development framework.

Bagua is a flexible and performant distributed training algorithm development framework.

786 Dec 17, 2022
This repository implements WGAN_GP.

Image_WGAN_GP This repository implements WGAN_GP. Image_WGAN_GP This repository uses wgan to generate mnist and fashionmnist pictures. Firstly, you ca

Lieon 6 Dec 10, 2021
Personal project about genus-0 meshes, spherical harmonics and a cow

How to transform a cow into spherical harmonics ? Spot the cow, from Keenan Crane's blog Context In the field of Deep Learning, training on images or

3 Aug 22, 2022
A basic reminder tool written in Python.

A simple Python Reminder Here's a basic reminder tool written in Python that speaks to the user and sends a notification. Run pip3 install pyttsx3 w

Sachit Yadav 4 Feb 05, 2022