Code accompanying the paper Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs (Chen et al., CVPR 2020, Oral).

Last update: Dec 29, 2022

Overview

Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs

This repository contains a PyTorch implementation of our paper Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs (CVPR 2020).

Prerequisites

Python 3 and PyTorch 1.3.

# clone the repository
git clone https://github.com/cshizhe/asg2cap.git
cd asg2cap
# clone caption evaluation codes
git clone https://github.com/cshizhe/eval_cap.git
export PYTHONPATH=$(pwd):${PYTHONPATH}

Training & Inference

cd controlimcap/driver

# supported caption models: [node, node.role,
#   rgcn, rgcn.flow, rgcn.memory, rgcn.flow.memory]
# see our paper for details
mtype=rgcn.flow.memory

# setup config files
# you should modify data paths in configs/prepare_*_imgsg_config.py
python configs/prepare_coco_imgsg_config.py $mtype
resdir='' # paste the result directory printed by the previous step

# training
python asg2caption.py $resdir/model.json $resdir/path.json $mtype --eval_loss --is_train --num_workers 8

# inference
python asg2caption.py $resdir/model.json $resdir/path.json $mtype --eval_set tst --num_workers 8
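The training and inference invocations above differ only in their trailing flags. As a hedged sketch, assuming you want to launch them from Python rather than the shell, the hypothetical helper `build_driver_commands` below checks `mtype` against the model list in the driver comments and assembles the same argument lists, using only the flags shown above. It is not part of the repository and does not run anything by itself.

```python
import os
import shlex
from typing import List, Tuple

# Model types listed in the driver comments above.
SUPPORTED_MTYPES = (
    "node", "node.role",
    "rgcn", "rgcn.flow", "rgcn.memory", "rgcn.flow.memory",
)


def build_driver_commands(mtype: str, resdir: str,
                          num_workers: int = 8) -> Tuple[List[str], List[str]]:
    """Return (train_cmd, infer_cmd) argument lists for asg2caption.py.

    Hypothetical helper: mirrors the shell commands above and assumes it is
    used from the controlimcap/driver directory.
    """
    if mtype not in SUPPORTED_MTYPES:
        raise ValueError(f"unsupported mtype {mtype!r}; choose from {SUPPORTED_MTYPES}")
    if not resdir:
        raise ValueError("resdir is empty; paste the directory printed by the config script")
    base = [
        "python", "asg2caption.py",
        os.path.join(resdir, "model.json"),
        os.path.join(resdir, "path.json"),
        mtype,
    ]
    train_cmd = base + ["--eval_loss", "--is_train", "--num_workers", str(num_workers)]
    infer_cmd = base + ["--eval_set", "tst", "--num_workers", str(num_workers)]
    return train_cmd, infer_cmd


if __name__ == "__main__":
    # "/path/to/resdir" is a placeholder for the printed result directory.
    train_cmd, infer_cmd = build_driver_commands("rgcn.flow.memory", "/path/to/resdir")
    print(shlex.join(train_cmd))
    print(shlex.join(infer_cmd))
```

Passing either list to `subprocess.run(cmd, check=True)` would execute it; validating `mtype` up front simply fails faster than a typo discovered inside the driver.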

Datasets

Annotations

Annotations for the MSCOCO and VisualGenome datasets can be downloaded from Google Drive.

(Image, ASG, Caption) annotations: regionfiles/image_id.json

JSON Format:
{
    "region_id": {
        "objects": [
            {
                "object_id": int,
                "name": str,
                "attributes": [str],
                "x": int,
                "y": int,
                "w": int,
                "h": int
            }],
        "relationships": [
            {
                "relationship_id": int,
                "subject_id": int,
                "object_id": int,
                "name": str
            }],
        "phrase": str
    }
}
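To make the schema concrete, the sketch below (hypothetical function names, not part of the repository) loads one region file, turns a region's relationships into (subject, relation, object) name triples, and converts an object's `x, y, w, h` box to corner form. It assumes the file for an image lives at `regionfiles/<image_id>.json`, that `subject_id`/`object_id` refer to `object_id` values in the same region's `objects` list, and that `x2 = x + w` (some pipelines use `x + w - 1`).

```python
import json
from typing import Dict, List, Tuple


def load_regions(path: str) -> Dict[str, dict]:
    """Load one regionfiles/<image_id>.json file as {region_id: region}."""
    with open(path) as f:
        return json.load(f)


def xywh_to_xyxy(obj: dict) -> Tuple[int, int, int, int]:
    """Convert an object's (x, y, w, h) box to (x1, y1, x2, y2)."""
    return obj["x"], obj["y"], obj["x"] + obj["w"], obj["y"] + obj["h"]


def region_triples(region: dict) -> List[Tuple[str, str, str]]:
    """Return (subject name, relation name, object name) for each relationship."""
    names = {obj["object_id"]: obj["name"] for obj in region["objects"]}
    return [
        (names[rel["subject_id"]], rel["name"], names[rel["object_id"]])
        for rel in region["relationships"]
    ]


if __name__ == "__main__":
    # Toy region following the schema above (values are made up).
    region = {
        "objects": [
            {"object_id": 1, "name": "dog", "attributes": ["brown"],
             "x": 10, "y": 20, "w": 30, "h": 40},
            {"object_id": 2, "name": "frisbee", "attributes": [],
             "x": 50, "y": 5, "w": 10, "h": 10},
        ],
        "relationships": [
            {"relationship_id": 7, "subject_id": 1, "object_id": 2, "name": "catching"},
        ],
        "phrase": "a brown dog catching a frisbee",
    }
    print(region_triples(region))               # [('dog', 'catching', 'frisbee')]
    print(xywh_to_xyxy(region["objects"][0]))   # (10, 20, 40, 60)
```

Keying names by `object_id` lets a relationship resolve both endpoints in constant time, which matters little for one region but helps when iterating over every region of a dataset.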

Vocabularies: int2word.npy ([word]) and word2int.json ({word: int})

Data splits: in the public_split directory, trn_names.npy, val_names.npy and tst_names.npy
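The two vocabulary files are inverse views of the same mapping. A minimal sketch of encoding a caption phrase into ids and decoding it back, assuming lower-cased whitespace tokenization and a hypothetical `<unk>` fallback token (check the real vocabulary for its special tokens; the repository's own preprocessing may differ). With NumPy, the `int2word` list would come from `np.load("int2word.npy")`, adding `allow_pickle=True` if it was saved as an object array.

```python
import json
from typing import Dict, List, Sequence


def load_word2int(path: str) -> Dict[str, int]:
    """Read word2int.json ({word: int})."""
    with open(path) as f:
        return json.load(f)


def encode(phrase: str, word2int: Dict[str, int], unk: str = "<unk>") -> List[int]:
    """Map each lower-cased whitespace token to its id.

    Unknown words map to the id of `unk`, an assumed special token; a
    KeyError is raised if the vocabulary has no such entry.
    """
    unk_id = word2int[unk]
    return [word2int.get(tok, unk_id) for tok in phrase.lower().split()]


def decode(ids: Sequence[int], int2word: Sequence[str]) -> str:
    """Inverse of encode: look up each id in the int2word list."""
    return " ".join(int2word[i] for i in ids)


if __name__ == "__main__":
    int2word = ["<unk>", "a", "dog", "frisbee"]          # toy vocabulary
    word2int = {w: i for i, w in enumerate(int2word)}
    ids = encode("A dog catching a frisbee", word2int)
    print(ids)                      # [1, 2, 0, 1, 3]
    print(decode(ids, int2word))    # a dog <unk> a frisbee
```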

Features

Features for the MSCOCO and VisualGenome datasets are available on Baidu Netdisk (extraction code: 6q32).

We also provide pretrained models and code to extract features for new images.

Global Image Feature: output of the last mean-pooling layer of a ResNet-101 pretrained on ImageNet

Format: npy array of shape (num_fts, dim_ft); row i corresponds to the i-th image name in the data split files
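Because the global feature matrix carries no image ids of its own, a row has to be located through the split's name list. A minimal sketch, assuming `names` was loaded from the matching split file (e.g. `np.load("public_split/tst_names.npy")`) and `fts` from the corresponding feature `.npy` file; the class name is hypothetical, and it works equally with NumPy arrays or plain lists.

```python
from typing import Dict, Generic, Sequence, TypeVar

Row = TypeVar("Row")


class GlobalFeatureLookup(Generic[Row]):
    """Index rows of a (num_fts, dim_ft) feature array by image name.

    Hypothetical helper: row i of `fts` is assumed to belong to names[i].
    """

    def __init__(self, fts: Sequence[Row], names: Sequence[str]) -> None:
        if len(fts) != len(names):
            raise ValueError(f"{len(fts)} feature rows but {len(names)} names")
        self.fts = fts
        # str() turns numpy string scalars into plain Python keys.
        self.row: Dict[str, int] = {str(n): i for i, n in enumerate(names)}
        if len(self.row) != len(names):
            raise ValueError("duplicate image names in split file")

    def __getitem__(self, image_name: str) -> Row:
        """Return the (dim_ft,) feature row for image_name."""
        return self.fts[self.row[image_name]]
```

Building the name-to-row dictionary once turns each lookup into a constant-time operation instead of a linear search through the split list.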

Region Image Feature: fc7 layer of a Faster R-CNN pretrained on VisualGenome

Format: one HDF5 file per image, named "image_id".jpg.hdf5

Key: "image_id".jpg

Attrs: {"image_w": int, "image_h": int, "boxes": 4d array (x1, y1, x2, y2)}
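A sketch of reading one region-feature file, assuming h5py is installed, that the dataset key is `<image_id>.jpg` inside `<image_id>.jpg.hdf5`, and that the listed attrs are attached to that dataset (they could instead sit on the file root, so inspect one file to confirm). The feature layout `(num_boxes, dim_ft)` is also an assumption. The `normalize_boxes` helper scales corner coordinates by the image size; this is a common preprocessing step, not something the repository is confirmed to do. Both function names are hypothetical.

```python
import os
from typing import List, Sequence, Tuple


def read_region_features(ft_dir: str, image_id: str):
    """Return (features, attrs) for one image from <image_id>.jpg.hdf5."""
    import h5py  # imported here so normalize_boxes works without h5py

    path = os.path.join(ft_dir, f"{image_id}.jpg.hdf5")
    key = f"{image_id}.jpg"
    with h5py.File(path, "r") as f:
        dset = f[key]
        features = dset[()]  # read the whole dataset; assumed (num_boxes, dim_ft)
        attrs = {name: dset.attrs[name] for name in ("image_w", "image_h", "boxes")}
    return features, attrs


def normalize_boxes(boxes: Sequence[Sequence[float]], image_w: int,
                    image_h: int) -> List[Tuple[float, float, float, float]]:
    """Scale (x1, y1, x2, y2) boxes into [0, 1] by image width and height."""
    return [(x1 / image_w, y1 / image_h, x2 / image_w, y2 / image_h)
            for x1, y1, x2, y2 in boxes]
```

Usage might look like `features, attrs = read_region_features("region_fts", "12345")` followed by `normalize_boxes(attrs["boxes"], attrs["image_w"], attrs["image_h"])`, where the directory and id are placeholders.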

Result Visualization

Citations

If you use this code as part of any published research, we'd really appreciate it if you could cite the following paper:

@inproceedings{chen2020say,
  title={Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs},
  author={Chen, Shizhe and Jin, Qin and Wang, Peng and Wu, Qi},
  booktitle={CVPR},
  year={2020}
}

License

MIT License

Owner

Shizhe Chen
