Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval

Last update: Nov 07, 2022

NSGDC

Some code in this repo is copied or modified from open-source implementations made available by UNITER, PyTorch, HuggingFace, OpenNMT, and NVIDIA. The image features are extracted using BUTD.

Requirements

The setup follows UNITER. We provide a Docker image for easier reproduction. Please install the prerequisites listed in the UNITER setup instructions.

Our scripts require the user to be a member of the docker group so that docker commands can be run without sudo. We only support Linux with NVIDIA GPUs. We test on Ubuntu 18.04 with V100 cards. We use mixed-precision training, so GPUs with Tensor Cores are recommended.
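The requirements above can be checked before going further. The following is a minimal sketch, not part of this repo: `has_docker_group` is a hypothetical helper name, and the script assumes the standard `id` and `nvidia-smi` commands are how group membership and the driver are exposed on your host.

```shell
# Hypothetical preflight helper; not shipped with this repository.

# Succeeds if the space-separated group list in $1 contains "docker".
has_docker_group() {
    case " $1 " in
        *" docker "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether the current user can run docker without sudo.
if has_docker_group "$(id -nG)"; then
    echo "docker group: ok"
else
    echo "docker group: missing (add your user to the group, then log in again)"
fi

# Report whether an NVIDIA driver utility is visible on the host.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi: found"
else
    echo "nvidia-smi: not found"
fi
```

The check only reports; it does not change group membership or install anything.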

Image-Text Retrieval

Download Data

bash scripts/download_itm.sh $PATH_TO_STORAGE

Launch the Docker Container

# docker image should be automatically pulled
source launch_container.sh $PATH_TO_STORAGE/txt_db $PATH_TO_STORAGE/img_db \
$PATH_TO_STORAGE/finetune $PATH_TO_STORAGE/pretrained
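The launch command above is given four sub-directories of $PATH_TO_STORAGE. A small sanity check before launching might look like the sketch below. `check_storage_layout` is a hypothetical helper, and it assumes the download step has already created all four folders, which this README does not state explicitly.

```shell
# Hypothetical helper; not shipped with this repository.
# Returns 0 if every directory passed to launch_container.sh exists
# under the storage root given as $1, and 1 otherwise.
check_storage_layout() {
    layout_missing=0
    for layout_sub in txt_db img_db finetune pretrained; do
        if [ ! -d "$1/$layout_sub" ]; then
            # Name each absent folder so it can be created or re-downloaded.
            echo "missing: $1/$layout_sub"
            layout_missing=1
        fi
    done
    return "$layout_missing"
}
```

Calling `check_storage_layout "$PATH_TO_STORAGE"` prints one line per missing folder and returns a non-zero status if any are absent, so it can gate the launch command.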

In case you would like to reproduce the whole preprocessing pipeline.

The launch script respects the $CUDA_VISIBLE_DEVICES environment variable. Note that the source code is mounted into the container under /src instead of being built into the image, so that user modifications are reflected without rebuilding the image. (Data folders are mounted into the container separately for flexibility in folder structure.)
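As an illustration of the GPU selection described above, the sketch below restricts the container to two devices before launching. The indices 0 and 1 are example values, and `count_visible_gpus` is a hypothetical helper added only to show what the variable contains.

```shell
# Hypothetical helper: count the devices named in a
# CUDA_VISIBLE_DEVICES-style comma-separated list passed as $1.
count_visible_gpus() {
    if [ -z "$1" ]; then
        echo 0
        return
    fi
    # One device per line, then count the non-empty lines.
    echo "$1" | tr ',' '\n' | grep -c .
}

# Example values only; adjust the indices to your machine.
export CUDA_VISIBLE_DEVICES=0,1
echo "GPUs selected for the container: $(count_visible_gpus "$CUDA_VISIBLE_DEVICES")"

# Then launch as usual, with the same command as above:
# source launch_container.sh $PATH_TO_STORAGE/txt_db $PATH_TO_STORAGE/img_db \
#     $PATH_TO_STORAGE/finetune $PATH_TO_STORAGE/pretrained
```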

Image-Text Retrieval (Flickr30k)

# Train with the base setting
bash run_cmds/tran_pnsgd_base_flickr.sh
bash run_cmds/tran_pnsgd2_base_flickr.sh

# Train with the large setting
bash run_cmds/tran_pnsgd_large_flickr.sh
bash run_cmds/tran_pnsgd2_large_flickr.sh

Image-Text Retrieval (COCO)

# Train with the base setting
bash run_cmds/tran_pnsgd_base_coco.sh
bash run_cmds/tran_pnsgd2_base_coco.sh

# Train with the large setting
bash run_cmds/tran_pnsgd_large_coco.sh
bash run_cmds/tran_pnsgd2_large_coco.sh

Run Inference

bash run_cmds/inf_nsgd.sh

Results

Our models achieve the following performance.

MS-COCO

Model	Image-to-Text			Text-to-Image
Model	R@1	R@5	R@10	R@1	R@5	R@10
NSGDC-Base	66.6	88.6	94.0	51.6	79.1	87.5
NSGDC-Large	67.8	89.6	94.2	53.3	80.0	88.0

Flickr30K

Model	Image-to-Text			Text-to-Image
Model	R@1	R@5	R@10	R@1	R@5	R@10
NSGDC-Base	87.9	98.1	99.3	74.5	93.3	96.3
NSGDC-Large	90.6	98.8	99.1	77.3	94.3	97.3
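
The tables report R@K (recall at K). As commonly defined for retrieval benchmarks (an assumption here, since the evaluation code is not shown in this README), it is the percentage of queries whose ground-truth match appears among the top K retrieved items:

```latex
\mathrm{R@}K = \frac{100}{|Q|} \sum_{q \in Q} \mathbb{1}\left[\operatorname{rank}(q) \le K\right]
```

Here Q is the set of queries (images for Image-to-Text, captions for Text-to-Image) and rank(q) is the position of the highest-ranked ground-truth item for query q. Both symbols are notation introduced for this sketch.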

Owner

Zhihao Fan
