Moer Grounded Image Captioning by Distilling Image-Text Matching Model

Overview

Moer Grounded Image Captioning by Distilling Image-Text Matching Model

Requirements

  • Python 3.7
  • Pytorch 1.2

Prepare data

  1. Please use git clone --recurse-submodules to clone this repository and remember to follow initialization steps in coco-caption/README.md. Then download and place the Flickr30k reference file under coco-caption/annotations. Also, download Stanford CoreNLP 3.9.1 for grounding evaluation and place the uncompressed folder under the tools/ directory.
  2. Download the preprocessd dataset from this link and extract it to data/.
  3. For Flickr30k-Entities, please download bottom-up visual feature extracted by Anderson's extractor (Zhou's extractor) from this link ( link) and place the uncompressed folders under data/flickrbu/. For MSCOCO, please follow this instruction to prepare the bottom-up features and place them under data/mscoco/.
  4. Download the pretrained models from here and extract them to log/.
  5. Download the pretrained SCAN models from this link and extract them to misc/SCAN/runs.

Evaluation

To reproduce the results reported in the paper, just simply run

bash eval_flickr.sh

fro Flickr30k-Entities and

bash eval_coco.sh

for MSCOCO.

Training

  1. In the first training stage, run like
python train.py --id CE-scan-sup-0.1kl --caption_model topdown --input_json data/flickrtalk.json --input_fc_dir data/flickrbu/flickrbu_fc --input_att_dir data/flickrbu/flickrbu_att  --input_box_dir data/flickrbu/flickrbu_box  --input_label_h5 data/flickrtalk_label.h5 --batch_size 29 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log/CE-scan-sup-0.1kl --save_checkpoint_every 1000 --val_images_use -1 --max_epochs 30  --att_supervise  True   --att_supervise_weight 0.1
  1. In the second training stage, run like
python train.py --id sc-ground-CE-scan-sup-0.1kl --caption_model topdown --input_json data/flickrtalk.json --input_fc_dir data/flickrbu/flickrbu_fc --input_att_dir data/flickrbu/flickrbu_att  --input_box_dir data/flickrbu/flickrbu_box  --input_label_h5 data/flickrtalk_label.h5 --batch_size 29 --learning_rate 5e-5 --start_from log/CE-scan-sup-0.1kl --checkpoint_path log/sc-ground-CE-scan-sup-0.1kl --save_checkpoint_every 1000 --language_eval 1 --val_images_use -1 --self_critical_after 30  --max_epochs  110      --cider_reward_weight  1
--ground_reward_weight   1 

Citation

@inproceedings{zhou2020grounded,
  title={More Grounded Image Captioning by Distilling Image-Text Matching Model},
  author={Zhou, Yuanen and Wang, Meng and Liu, Daqing and  Hu, Zhenzhen and Zhang, Hanwang},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2020}
}

Acknowledgements

This repository is built upon self-critical.pytorch, SCAN and grounded-video-description. Thanks for their released code.

Owner
YE Zhou
YE Zhou
This is a TensorFlow implementation for C2-Rec

This is a TensorFlow implementation for C2-Rec We refer to the repo SASRec. Requirements requirement.txt Datasets This repo includes Amazon Beauty dat

7 Nov 14, 2022
TACTO: A Fast, Flexible and Open-source Simulator for High-Resolution Vision-based Tactile Sensors

TACTO: A Fast, Flexible and Open-source Simulator for High-Resolution Vision-based Tactile Sensors This package provides a simulator for vision-based

Facebook Research 255 Dec 27, 2022
DeLighT: Very Deep and Light-Weight Transformers

DeLighT: Very Deep and Light-weight Transformers This repository contains the source code of our work on building efficient sequence models: DeFINE (I

Sachin Mehta 440 Dec 18, 2022
A repository for storing njxzc final exam review material

文档地址,请戳我 👈 👈 👈 ☀️ 1.Reason 大三上期末复习软件工程的时候,发现其他高校在GitHub上开源了他们学校的期末试题,我很受触动。期末

GuJiakai 2 Jan 18, 2022
A PyTorch Implementation of Single Shot MultiBox Detector

SSD: Single Shot MultiBox Object Detector, in PyTorch A PyTorch implementation of Single Shot MultiBox Detector from the 2016 paper by Wei Liu, Dragom

Max deGroot 4.8k Jan 07, 2023
Amazon Forest Computer Vision: Satellite Image tagging code using PyTorch / Keras with lots of PyTorch tricks

Amazon Forest Computer Vision Satellite Image tagging code using PyTorch / Keras Here is a sample of images we had to work with Source: https://www.ka

Mamy Ratsimbazafy 359 Jan 05, 2023
Improving Factual Completeness and Consistency of Image-to-text Radiology Report Generation

Improving Factual Completeness and Consistency of Image-to-text Radiology Report Generation The reference code of Improving Factual Completeness and C

46 Dec 15, 2022
Manipulation OpenAI Gym environments to simulate robots at the STARS lab

Manipulator Learning This repository contains a set of manipulation environments that are compatible with OpenAI Gym and simulated in pybullet. In par

STARS Laboratory 5 Dec 08, 2022
Implementation of Cross Transformer for spatially-aware few-shot transfer, in Pytorch

Cross Transformers - Pytorch (wip) Implementation of Cross Transformer for spatially-aware few-shot transfer, in Pytorch Install $ pip install cross-t

Phil Wang 40 Dec 22, 2022
Implementation for Curriculum DeepSDF

Curriculum-DeepSDF This repository is an implementation for Curriculum DeepSDF. Full paper is available here. Preparation Please follow original setti

Haidong Zhu 69 Dec 29, 2022
Code for our CVPR 2021 paper "MetaCam+DSCE"

Joint Noise-Tolerant Learning and Meta Camera Shift Adaptation for Unsupervised Person Re-Identification (CVPR'21) Introduction Code for our CVPR 2021

FlyingRoastDuck 59 Oct 31, 2022
DARTS-: Robustly Stepping out of Performance Collapse Without Indicators

[ICLR'21] DARTS-: Robustly Stepping out of Performance Collapse Without Indicators [openreview] Authors: Xiangxiang Chu, Xiaoxing Wang, Bo Zhang, Shun

55 Nov 01, 2022
This is the source code of the 1st place solution for segmentation task (with Dice 90.32%) in 2021 CCF BDCI challenge.

1st place solution in CCF BDCI 2021 ULSEG challenge This is the source code of the 1st place solution for ultrasound image angioma segmentation task (

Chenxu Peng 30 Nov 22, 2022
exponential adaptive pooling for PyTorch

AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling Abstract Pooling layers are essential building blocks of Convolutional Ne

Alexandros Stergiou 55 Jan 04, 2023
Official PyTorch implementation of U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation

U-GAT-IT — Official PyTorch Implementation : Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Imag

Hyeonwoo Kang 2.4k Jan 04, 2023
Official Pytorch Implementation of Unsupervised Image Denoising with Frequency Domain Knowledge

Unsupervised Image Denoising with Frequency Domain Knowledge (BMVC 2021 Oral) : Official Project Page This repository provides the official PyTorch im

Donggon Jang 12 Sep 26, 2022
PPO Lagrangian in JAX

PPO Lagrangian in JAX This repository implements PPO in JAX. Implementation is tested on the safety-gym benchmark. Usage Install dependencies using th

Karush Suri 2 Sep 14, 2022
Lightwood is Legos for Machine Learning.

Lightwood is like Legos for Machine Learning. A Pytorch based framework that breaks down machine learning problems into smaller blocks that can be glu

MindsDB Inc 312 Jan 08, 2023
"SOLQ: Segmenting Objects by Learning Queries", SOLQ is an end-to-end instance segmentation framework with Transformer.

SOLQ: Segmenting Objects by Learning Queries This repository is an official implementation of the paper SOLQ: Segmenting Objects by Learning Queries.

MEGVII Research 179 Jan 02, 2023
A PyTorch implementation of the continual learning experiments with deep neural networks

Brain-Inspired Replay A PyTorch implementation of the continual learning experiments with deep neural networks described in the following paper: Brain

182 Dec 27, 2022