Probabilistic Cross-Modal Embedding (PCME) CVPR 2021

Related tags

Deep Learningpcme
Overview

Probabilistic Cross-Modal Embedding (PCME) CVPR 2021

Official Pytorch implementation of PCME | Paper

Sanghyuk Chun1 Seong Joon Oh1 Rafael Sampaio de Rezende2 Yannis Kalantidis2 Diane Larlus2

1NAVER AI LAB
2NAVER LABS Europe

VIDEO

Updates

  • 23 Jun, 2021: Initial upload.

Installation

Install dependencies using the following command.

pip install cython && pip install -r requirements.txt
python -c 'import nltk; nltk.download("punkt", download_dir="/opt/conda/nltk_data")'
git clone https://github.com/NVIDIA/apex && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Dockerfile

You can use my docker image as well

docker pull sanghyukchun/pcme:torch1.2-apex-dali

Please Add --model__cache_dir /vector_cache when you run the code

Configuration

All experiments are based on configuration files (see config/coco and config/cub). If you want to change only a few options, instead of re-writing a new configuration file, you can override the configuration as the follows:

python .py --dataloader__batch_size 32 --dataloader__eval_batch_size 8 --model__eval_method matching_prob

See config/parser.py for details

Dataset preparation

COCO Caption

We followed the same split provided by VSE++. Dataset splits can be found in datasets/annotations.

Note that we also need instances_2014.json for computing PMRP score.

CUB Caption

Download images from this link, and download caption from reedscot/cvpr2016. You can use the image path and the caption path separately in the code.

Evaluate pretrained models

NOTE: the current implementation of plausible match R-Precision (PMRP) is not efficient:
It first dumps all ranked items for each item to a local file, and compute R-precision.
We are planning to re-implement efficient PMRP as soon as possible.

COCO Caption

# Compute recall metrics
python evaluate_recall_coco.py ./config/coco/pcme_coco.yaml \
    --dataset_root  \
    --model_path model_last.pth \
    # --model__cache_dir /vector_cache # if you use my docker image
# Compute plausible match R-Precision (PMRP) metric
python extract_rankings_coco.py ./config/coco/pcme_coco.yaml \
    --dataset_root  \
    --model_path model_last.pth \
    --dump_to  \
    # --model__cache_dir /vector_cache # if you use my docker image

python evaluate_pmrp_coco.py --ranking_file 
Method I2T PMRP I2T [email protected] T2I PMRP T2I [email protected] Model file
PCME 45.0 68.8 46.0 54.6 link
PVSE K=1 40.3 66.7 41.8 53.5 -
PVSE K=2 42.8 69.2 43.6 55.2 -
VSRN 41.2 76.2 42.4 62.8 -
VSRN + AOQ 44.7 77.5 45.6 63.5 -

CUB Caption

python evaluate_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root  \
    --caption_root  \
    --model_path model_last.pth \
    # --model__cache_dir /vector_cache # if you use my docker image

NOTE: If you just download file from reedscot/cvpr2016, then caption_root will be cvpr2016_cub/text_c10

If you want to test other probabilistic distances, such as Wasserstein distance or KL-divergence, try the following command:

python evaluate_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root  \
    --caption_root  \
    --model_path model_last.pth \
    --model__eval_method  \
    # --model__cache_dir /vector_cache # if you use my docker image

You can choose distance_method in ['elk', 'l2', 'min', 'max', 'wasserstein', 'kl', 'reverse_kl', 'js', 'bhattacharyya', 'matmul', 'matching_prob']

How to train

NOTE: we train each model with mixed-precision training (O2) on a single V100.
Since, the current code does not support multi-gpu training, if you use different hardware, the batchsize should be reduced.
Please note that, hence, the results couldn't be reproduced if you use smaller hardware than V100.

COCO Caption

python train_coco.py ./config/coco/pcme_coco.yaml --dataset_root  \
    # --model__cache_dir /vector_cache # if you use my docker image

It takes about 46 hours in a single V100 with mixed precision training.

CUB Caption

We use CUB Caption dataset (Reed, et al. 2016) as a new cross-modal retrieval benchmark. Here, instead of matching the sparse paired image-caption pairs, we treat all image-caption pairs in the same class as positive. Since our split is based on the zero-shot learning benchmark (Xian, et al. 2017), we leave out 50 classes from 200 bird classes for the evaluation.

  • Reed, Scott, et al. "Learning deep representations of fine-grained visual descriptions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
  • Xian, Yongqin, Bernt Schiele, and Zeynep Akata. "Zero-shot learning-the good, the bad and the ugly." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.

hyperparameter search

We additionally use cross-validation splits by (Xian, et el. 2017), namely using 100 classes for training and 50 classes for validation.

python train_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root  \
    --caption_root  \
    --dataset_name cub_trainval1 \
    # --model__cache_dir /vector_cache # if you use my docker image

Similarly, you can use cub_trainval2 and cub_trainval3 as well.

training with full training classes

python train_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root  \
    --caption_root  \
    # --model__cache_dir /vector_cache # if you use my docker image

It takes about 4 hours in a single V100 with mixed precision training.

How to cite

@inproceedings{chun2021pcme,
    title={Probabilistic Embeddings for Cross-Modal Retrieval},
    author={Chun, Sanghyuk and Oh, Seong Joon and De Rezende, Rafael Sampaio and Kalantidis, Yannis and Larlus, Diane},
    year={2021},
    booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
}

License

MIT License

Copyright (c) 2021-present NAVER Corp.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
Owner
NAVER AI
Official account of NAVER AI, Korea No.1 Industrial AI Research Group
NAVER AI
Official PyTorch Code of GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection (CVPR 2021)

GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Mo

Abhinav Kumar 76 Jan 02, 2023
Statistical-Rethinking-with-Python-and-PyMC3 - Python/PyMC3 port of the examples in " Statistical Rethinking A Bayesian Course with Examples in R and Stan" by Richard McElreath

Statistical Rethinking with Python and PyMC3 This repository has been deprecated in favour of this one, please check that repository for updates, for

Osvaldo Martin 786 Dec 29, 2022
Py4fi2nd - Jupyter Notebooks and code for Python for Finance (2nd ed., O'Reilly) by Yves Hilpisch.

Python for Finance (2nd ed., O'Reilly) This repository provides all Python codes and Jupyter Notebooks of the book Python for Finance -- Mastering Dat

Yves Hilpisch 1k Jan 05, 2023
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features

CleanRL (Clean Implementation of RL Algorithms) CleanRL is a Deep Reinforcement Learning library that provides high-quality single-file implementation

Costa Huang 1.8k Jan 01, 2023
ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge (ManiSkill Challenge), a large-scale learning-from-demonstrations benchmark for object manipulation.

ManiSkill-Learn ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge, a large-scale learning-from-dem

Hao Su's Lab, UCSD 48 Dec 30, 2022
Code for "Contextual Non-Local Alignment over Full-Scale Representation for Text-Based Person Search"

Contextual Non-Local Alignment over Full-Scale Representation for Text-Based Person Search This is an implementation for our paper Contextual Non-Loca

Tencent YouTu Research 50 Dec 03, 2022
DSAC* for Visual Camera Re-Localization (RGB or RGB-D)

DSAC* for Visual Camera Re-Localization (RGB or RGB-D) Introduction Installation Data Structure Supported Datasets 7Scenes 12Scenes Cambridge Landmark

Visual Learning Lab 143 Dec 22, 2022
Pytorch for Segmentation

Pytorch for Semantic Segmentation This repo has been deprecated currently and I will not maintain it. Meanwhile, I strongly recommend you can refer to

ycszen 411 Nov 22, 2022
Calculates JMA (Japan Meteorological Agency) seismic intensity (shindo) scale from acceleration data recorded in NumPy array

shindo.py Calculates JMA (Japan Meteorological Agency) seismic intensity (shindo) scale from acceleration data stored in NumPy array Introduction Japa

RR_Inyo 3 Sep 23, 2022
GyroSPD: Vector-valued Distance and Gyrocalculus on the Space of Symmetric Positive Definite Matrices

GyroSPD Code for the paper "Vector-valued Distance and Gyrocalculus on the Space of Symmetric Positive Definite Matrices" accepted at NeurIPS 2021. Re

Federico Lopez 12 Dec 12, 2022
PyTorch implementation of the Deep SLDA method from our CVPRW-2020 paper "Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis"

Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis This is a PyTorch implementation of the Deep Streaming Linear Discriminant

Tyler Hayes 41 Dec 25, 2022
Lacmus is a cross-platform application that helps to find people who are lost in the forest using computer vision and neural networks.

lacmus The program for searching through photos from the air of lost people in the forest using Retina Net neural nwtwork. The project is being develo

Lacmus Foundation 168 Dec 27, 2022
NBEATSx: Neural basis expansion analysis with exogenous variables

NBEATSx: Neural basis expansion analysis with exogenous variables We extend the NBEATS model to incorporate exogenous factors. The resulting method, c

Cristian Challu 100 Dec 31, 2022
A curated list of long-tailed recognition resources.

Awesome Long-tailed Recognition A curated list of long-tailed recognition and related resources. Please feel free to pull requests or open an issue to

Zhiwei ZHANG 542 Jan 01, 2023
This repo contains source code and materials for the TEmporally COherent GAN SIGGRAPH project.

TecoGAN This repository contains source code and materials for the TecoGAN project, i.e. code for a TEmporally COherent GAN for video super-resolution

Nils Thuerey 5.2k Jan 02, 2023
MVFNet: Multi-View Fusion Network for Efficient Video Recognition (AAAI 2021)

MVFNet: Multi-View Fusion Network for Efficient Video Recognition (AAAI 2021) Overview We release the code of the MVFNet (Multi-View Fusion Network).

2 Jan 29, 2022
PyTorch evaluation code for Delving Deep into the Generalization of Vision Transformers under Distribution Shifts.

Out-of-distribution Generalization Investigation on Vision Transformers This repository contains PyTorch evaluation code for Delving Deep into the Gen

Chongzhi Zhang 72 Dec 13, 2022
An updated version of virtual model making

Model-Swap-Face v2   这个项目是基于stylegan2 pSp制作的,比v1版本Model-Swap-Face在推理速度和图像质量上有一定提升。主要的功能是将虚拟模特进行环球不同区域的风格转换,目前转换器提供西欧模特、东亚模特和北非模特三种主流的风格样式,可帮我们实现生产资料零成

seeprettyface.com 62 Dec 09, 2022
PyDeepFakeDet is an integrated and scalable tool for Deepfake detection.

PyDeepFakeDet An integrated and scalable library for Deepfake detection research. Introduction PyDeepFakeDet is an integrated and scalable Deepfake de

Junke, Wang 49 Dec 11, 2022
An implementation for the loss function proposed in Decoupled Contrastive Loss paper.

Decoupled-Contrastive-Learning This repository is an implementation for the loss function proposed in Decoupled Contrastive Loss paper. Requirements P

Ramin Nakhli 71 Dec 04, 2022