Probabilistic Cross-Modal Embedding (PCME) CVPR 2021

Related tags

Deep Learningpcme
Overview

Probabilistic Cross-Modal Embedding (PCME) CVPR 2021

Official Pytorch implementation of PCME | Paper

Sanghyuk Chun1 Seong Joon Oh1 Rafael Sampaio de Rezende2 Yannis Kalantidis2 Diane Larlus2

1NAVER AI LAB
2NAVER LABS Europe

VIDEO

Updates

  • 23 Jun, 2021: Initial upload.

Installation

Install dependencies using the following command.

pip install cython && pip install -r requirements.txt
python -c 'import nltk; nltk.download("punkt", download_dir="/opt/conda/nltk_data")'
git clone https://github.com/NVIDIA/apex && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Dockerfile

You can use my docker image as well

docker pull sanghyukchun/pcme:torch1.2-apex-dali

Please Add --model__cache_dir /vector_cache when you run the code

Configuration

All experiments are based on configuration files (see config/coco and config/cub). If you want to change only a few options, instead of re-writing a new configuration file, you can override the configuration as the follows:

python .py --dataloader__batch_size 32 --dataloader__eval_batch_size 8 --model__eval_method matching_prob

See config/parser.py for details

Dataset preparation

COCO Caption

We followed the same split provided by VSE++. Dataset splits can be found in datasets/annotations.

Note that we also need instances_2014.json for computing PMRP score.

CUB Caption

Download images from this link, and download caption from reedscot/cvpr2016. You can use the image path and the caption path separately in the code.

Evaluate pretrained models

NOTE: the current implementation of plausible match R-Precision (PMRP) is not efficient:
It first dumps all ranked items for each item to a local file, and compute R-precision.
We are planning to re-implement efficient PMRP as soon as possible.

COCO Caption

# Compute recall metrics
python evaluate_recall_coco.py ./config/coco/pcme_coco.yaml \
    --dataset_root  \
    --model_path model_last.pth \
    # --model__cache_dir /vector_cache # if you use my docker image
# Compute plausible match R-Precision (PMRP) metric
python extract_rankings_coco.py ./config/coco/pcme_coco.yaml \
    --dataset_root  \
    --model_path model_last.pth \
    --dump_to  \
    # --model__cache_dir /vector_cache # if you use my docker image

python evaluate_pmrp_coco.py --ranking_file 
Method I2T PMRP I2T [email protected] T2I PMRP T2I [email protected] Model file
PCME 45.0 68.8 46.0 54.6 link
PVSE K=1 40.3 66.7 41.8 53.5 -
PVSE K=2 42.8 69.2 43.6 55.2 -
VSRN 41.2 76.2 42.4 62.8 -
VSRN + AOQ 44.7 77.5 45.6 63.5 -

CUB Caption

python evaluate_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root  \
    --caption_root  \
    --model_path model_last.pth \
    # --model__cache_dir /vector_cache # if you use my docker image

NOTE: If you just download file from reedscot/cvpr2016, then caption_root will be cvpr2016_cub/text_c10

If you want to test other probabilistic distances, such as Wasserstein distance or KL-divergence, try the following command:

python evaluate_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root  \
    --caption_root  \
    --model_path model_last.pth \
    --model__eval_method  \
    # --model__cache_dir /vector_cache # if you use my docker image

You can choose distance_method in ['elk', 'l2', 'min', 'max', 'wasserstein', 'kl', 'reverse_kl', 'js', 'bhattacharyya', 'matmul', 'matching_prob']

How to train

NOTE: we train each model with mixed-precision training (O2) on a single V100.
Since, the current code does not support multi-gpu training, if you use different hardware, the batchsize should be reduced.
Please note that, hence, the results couldn't be reproduced if you use smaller hardware than V100.

COCO Caption

python train_coco.py ./config/coco/pcme_coco.yaml --dataset_root  \
    # --model__cache_dir /vector_cache # if you use my docker image

It takes about 46 hours in a single V100 with mixed precision training.

CUB Caption

We use CUB Caption dataset (Reed, et al. 2016) as a new cross-modal retrieval benchmark. Here, instead of matching the sparse paired image-caption pairs, we treat all image-caption pairs in the same class as positive. Since our split is based on the zero-shot learning benchmark (Xian, et al. 2017), we leave out 50 classes from 200 bird classes for the evaluation.

  • Reed, Scott, et al. "Learning deep representations of fine-grained visual descriptions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
  • Xian, Yongqin, Bernt Schiele, and Zeynep Akata. "Zero-shot learning-the good, the bad and the ugly." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.

hyperparameter search

We additionally use cross-validation splits by (Xian, et el. 2017), namely using 100 classes for training and 50 classes for validation.

python train_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root  \
    --caption_root  \
    --dataset_name cub_trainval1 \
    # --model__cache_dir /vector_cache # if you use my docker image

Similarly, you can use cub_trainval2 and cub_trainval3 as well.

training with full training classes

python train_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root  \
    --caption_root  \
    # --model__cache_dir /vector_cache # if you use my docker image

It takes about 4 hours in a single V100 with mixed precision training.

How to cite

@inproceedings{chun2021pcme,
    title={Probabilistic Embeddings for Cross-Modal Retrieval},
    author={Chun, Sanghyuk and Oh, Seong Joon and De Rezende, Rafael Sampaio and Kalantidis, Yannis and Larlus, Diane},
    year={2021},
    booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
}

License

MIT License

Copyright (c) 2021-present NAVER Corp.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
Owner
NAVER AI
Official account of NAVER AI, Korea No.1 Industrial AI Research Group
NAVER AI
For IBM Quantum Challenge 2021 (May 20 - 26)

IBM Quantum Challenge 2021 Introduction Commemorating the 40-year anniversary of the Physics of Computation conference, and 5-year anniversary of IBM

Qiskit Community 140 Jan 01, 2023
Python3 / PyTorch implementation of the following paper: Fine-grained Semantics-aware Representation Enhancement for Self-supervisedMonocular Depth Estimation. ICCV 2021 (oral)

FSRE-Depth This is a Python3 / PyTorch implementation of FSRE-Depth, as described in the following paper: Fine-grained Semantics-aware Representation

77 Dec 28, 2022
A Flow-based Generative Network for Speech Synthesis

WaveGlow: a Flow-based Generative Network for Speech Synthesis Ryan Prenger, Rafael Valle, and Bryan Catanzaro In our recent paper, we propose WaveGlo

NVIDIA Corporation 2k Dec 26, 2022
PyTorch code for training MM-DistillNet for multimodal knowledge distillation

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge MM-DistillNet is a

51 Dec 20, 2022
Element selection for functional materials discovery by integrated machine learning of atomic contributions to properties

Element selection for functional materials discovery by integrated machine learning of atomic contributions to properties 8.11.2021 Andrij Vasylenko I

Leverhulme Research Centre for Functional Materials Design 4 Dec 20, 2022
Constructing Neural Network-Based Models for Simulating Dynamical Systems

Constructing Neural Network-Based Models for Simulating Dynamical Systems Note this repo is work in progress prior to reviewing This is a companion re

Christian Møldrup Legaard 21 Nov 25, 2022
Learning Generative Models of Textured 3D Meshes from Real-World Images, ICCV 2021

Learning Generative Models of Textured 3D Meshes from Real-World Images This is the reference implementation of "Learning Generative Models of Texture

Dario Pavllo 115 Jan 07, 2023
pixelNeRF: Neural Radiance Fields from One or Few Images

pixelNeRF: Neural Radiance Fields from One or Few Images Alex Yu, Vickie Ye, Matthew Tancik, Angjoo Kanazawa UC Berkeley arXiv: http://arxiv.org/abs/2

Alex Yu 1k Jan 04, 2023
A Broader Picture of Random-walk Based Graph Embedding

Random-walk Embedding Framework This repository is a reference implementation of the random-walk embedding framework as described in the paper: A Broa

Zexi Huang 23 Dec 13, 2022
Transformer in Computer Vision

Transformer-in-Vision A paper list of some recent Transformer-based CV works. If you find some ignored papers, please open issues or pull requests. **

506 Dec 26, 2022
Regression Metrics Calculation Made easy for tensorflow2 and scikit-learn

Regression Metrics Installation To install the package from the PyPi repository you can execute the following command: pip install regressionmetrics I

Ashish Patel 11 Dec 16, 2022
PyTorch implementation of TSception V2 using DEAP dataset

TSception This is the PyTorch implementation of TSception V2 using DEAP dataset in our paper: Yi Ding, Neethu Robinson, Su Zhang, Qiuhao Zeng, Cuntai

Yi Ding 27 Dec 15, 2022
[ECCV 2020] Reimplementation of 3DDFAv2, including face mesh, head pose, landmarks, and more.

Stable Head Pose Estimation and Landmark Regression via 3D Dense Face Reconstruction Reimplementation of (ECCV 2020) Towards Fast, Accurate and Stable

Remilia Scarlet 221 Dec 30, 2022
discovering subdomains, hidden paths, extracting unique links

python-website-crawler discovering subdomains, hidden paths, extracting unique links pip install -r requirements.txt discover subdomain: You can give

merve 4 Sep 05, 2022
The PyTorch implementation of DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision.

DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision The PyTorch implementation of DiscoBox: Weakly Supe

Shiyi Lan 1 Oct 23, 2021
R-Drop: Regularized Dropout for Neural Networks

R-Drop: Regularized Dropout for Neural Networks R-drop is a simple yet very effective regularization method built upon dropout, by minimizing the bidi

756 Dec 27, 2022
Accelerated NLP pipelines for fast inference on CPU and GPU. Built with Transformers, Optimum and ONNX Runtime.

Optimum Transformers Accelerated NLP pipelines for fast inference 🚀 on CPU and GPU. Built with 🤗 Transformers, Optimum and ONNX runtime. Installatio

Aleksey Korshuk 115 Dec 16, 2022
Deep Learning Head Pose Estimation using PyTorch.

Hopenet is an accurate and easy to use head pose estimation network. Models have been trained on the 300W-LP dataset and have been tested on real data with good qualitative performance.

Nataniel Ruiz 1.3k Dec 26, 2022
Generative Adversarial Text to Image Synthesis

Text To Image Synthesis This is a tensorflow implementation of synthesizing images. The images are synthesized using the GAN-CLS Algorithm from the pa

Hao 575 Jan 08, 2023
Mixed Neural Likelihood Estimation for models of decision-making

Mixed neural likelihood estimation for models of decision-making Mixed neural likelihood estimation (MNLE) enables Bayesian parameter inference for mo

mackelab 9 Dec 22, 2022