This repo provides code for QB-Norm (Cross Modal Retrieval with Querybank Normalisation)

Related tags

Deep Learningqb-norm
Overview

This repo provides code for QB-Norm (Cross Modal Retrieval with Querybank Normalisation)

Usage example

python dynamic_inverted_softmax.py --sims_train_test_path msrvtt/tt-ce-train-captions-test-videos-seed0.pkl --sims_test_path msrvtt/tt-ce-test-captions-test-videos-seed0.pkl --test_query_masks_path msrvtt/tt-ce-test-query_masks.pkl

To test QB-Norm on your own data you need to:

  1. Extract the similarity matrix between the caption from the training split and the videos from the testing split path/to/sims/train/test
  2. Extract testing split similarity matrix (similarities between testing captions and testing video) path/to/sims/test
  3. Run QB-Norm
python dynamic_inverted_softmax.py --sims_train_test_path path/to/sims/train/test --sims_test_path path/to/sims/test

Data

The similarity matrices for each method were extracted using the official repositories as follows: CE+, TT-CE+, CLIP2Video, CLIP4Clip (for CLIP4Clip we used the official repo to train from scratch new models since they do not provide pre-trained weights), CLIP, MMT, Audio-Retrieval.

You can download the extracted similarity matrices for training and testing here: MSRVTT, MSVD, DiDeMo, LSMDC.

Text-Video retrieval results

QB-Norm Results on MSRVTT Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
CE+ Full t2v 14.4(0.1) 37.4(0.1) 50.2(0.1) 10.0(0.0) 30.0(0.1)
CE+ (+QB-Norm) Full t2v 16.4(0.0) 40.3(0.1) 52.9(0.1) 9.0(0.0) 32.7(0.1)
TT-CE+ Full t2v 14.9(0.1) 38.3(0.1) 51.5(0.1) 10.0(0.0) 30.9(0.1)
TT-CE+ (+QB-Norm) Full t2v 17.3(0.0) 42.1(0.2) 54.9(0.1) 8.0(0.0) 34.2(0.1)

QB-Norm Results on MSVD Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
TT-CE+ Full t2v 25.4(0.3) 56.9(0.4) 71.3(0.2) 4.0(0.0) 46.9(0.3)
TT-CE+ (+QB-Norm) Full t2v 26.6(1.0) 58.6(1.3) 71.8(1.1) 4.0(0.0) 48.2(1.2)
CLIP2Video Full t2v 47.0 76.8 85.9 2.0 67.7
CLIP2Video (+QB-Norm) Full t2v 48.0 77.9 86.2 2.0 68.5

QB-Norm Results on DiDeMo Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
TT-CE+ Full t2v 21.6(0.7) 48.6(0.4) 62.9(0.6) 6.0(0.0) 40.4(0.4)
TT-CE+ (+QB-Norm) Full t2v 24.2(0.7) 50.8(0.7) 64.4(0.1) 5.3(0.5) 43.0(0.2)
CLIP4Clip Full t2v 43.0 70.5 80.0 2.0 62.4
CLIP4Clip (+QB-Norm) Full t2v 43.5 71.4 80.9 2.0 63.1

QB-Norm Results on LSMDC Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
TT-CE+ Full t2v 17.2(0.4) 36.5(0.6) 46.3(0.3) 13.7(0.5) 30.7(0.3)
TT-CE+ (+QB-Norm) Full t2v 17.8(0.4) 37.7(0.5) 47.6(0.6) 12.7(0.5) 31.7(0.3)
CLIP4Clip Full t2v 21.3 40.0 49.5 11.0 34.8
CLIP4Clip (+QB-Norm) Full t2v 22.4 40.1 49.5 11.0 35.4

QB-Norm Results on VaTeX Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
TT-CE+ Full t2v 53.2(0.2) 87.4(0.1) 93.3(0.0) 1.0(0.0) 75.7(0.1)
TT-CE+ (+QB-Norm) Full t2v 54.8(0.1) 88.2(0.1) 93.8(0.1) 1.0(0.0) 76.8(0.0)
CLIP2Video Full t2v 57.4 87.9 93.6 1.0 77.9
CLIP2Video (+QB-Norm) Full t2v 58.8 88.3 93.8 1.0 78.7

QB-Norm Results on QuerYD Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
CE+ Full t2v 13.2(2.0) 37.1(2.9) 50.5(1.9) 10.3(1.2) 29.1(2.2)
CE+ (+QB-Norm) Full t2v 14.1(1.8) 38.6(1.3) 51.1(1.6) 10.0(0.8) 30.2(1.7)
TT-CE+ Full t2v 14.4(0.5) 37.7(1.7) 50.9(1.6) 9.8(1.0) 30.3(0.9)
TT-CE+ (+QB-Norm) Full t2v 15.1(1.6) 38.3(2.4) 51.2(2.8) 10.3(1.7) 30.9(2.3)

Text-Image retrieval results

QB-Norm Results on MSCoCo Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
CLIP 5k t2i 30.3 56.1 67.1 4.0 48.5
CLIP (+QB-Norm) 5k t2i 34.8 59.9 70.4 3.0 52.8
MMT-Oscar 5k t2i 52.2 80.2 88.0 1.0 71.7
MMT-Oscar (+QB-Norm) 5k t2i 53.9 80.5 88.1 1.0 72.6

Text-Audio retrieval results

QB-Norm Results on AudioCaps Benchmark

Model Split Task [email protected] [email protected] [email protected] MdR Geom
AR-CE Full t2a 23.1(0.6) 55.1(0.7) 70.7(0.6) 4.7(0.5) 44.8(0.7)
AR-CE (+QB-Norm) Full t2a 23.9(0.2) 57.1(0.3) 71.6(0.4) 4.0(0.0) 46.0(0.3)

References

If you find this code useful or use the extracted similarity matrices, please consider citing:

@misc{bogolin2021cross,
      title={Cross Modal Retrieval with Querybank Normalisation}, 
      author={Simion-Vlad Bogolin and Ioana Croitoru and Hailin Jin and Yang Liu and Samuel Albanie},
      year={2021},
      eprint={2112.12777},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
NAACL2021 - COIL Contextualized Lexical Retriever

COIL Repo for our NAACL paper, COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List. The code covers learning

Luyu Gao 108 Dec 31, 2022
Self-Correcting Quantum Many-Body Control using Reinforcement Learning with Tensor Networks

Self-Correcting Quantum Many-Body Control using Reinforcement Learning with Tensor Networks This repository contains the code and data for the corresp

Friederike Metz 7 Apr 23, 2022
Official repository of "BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment"

BasicVSR_PlusPlus (CVPR 2022) [Paper] [Project Page] [Code] This is the official repository for BasicVSR++. Please feel free to raise issue related to

Kelvin C.K. Chan 227 Jan 01, 2023
buildseg is a building extraction plugin of QGIS based on PaddlePaddle.

buildseg buildseg is a building extraction plugin of QGIS based on PaddlePaddle. TODO Extract building on 512x512 remote sensing images. Extract build

Yizhou Chen 11 Sep 26, 2022
Syed Waqas Zamir 906 Dec 30, 2022
RL Algorithms with examples in Python / Pytorch / Unity ML agents

Reinforcement Learning Project This project was created to make it easier to get started with Reinforcement Learning. It now contains: An implementati

Rogier Wachters 3 Aug 19, 2022
Pretrained Pytorch face detection (MTCNN) and recognition (InceptionResnet) models

Face Recognition Using Pytorch Python 3.7 3.6 3.5 Status This is a repository for Inception Resnet (V1) models in pytorch, pretrained on VGGFace2 and

Tim Esler 3.3k Jan 04, 2023
Official Implementation for HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing

HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing Yuval Alaluf*, Omer Tov*, Ron Mokady, Rinon Gal, Amit H. Bermano *Denotes equ

885 Jan 06, 2023
DANet for Tabular data classification/ regression.

Deep Abstract Networks A pyTorch implementation for AAAI-2022 paper DANets: Deep Abstract Networks for Tabular Data Classification and Regression. Bri

Ronnie Rocket 55 Sep 14, 2022
Interactive web apps created using geemap and streamlit

geemap-apps Introduction This repo demostrates how to build a multi-page Earth Engine App using streamlit and geemap. You can deploy the app on variou

Qiusheng Wu 27 Dec 23, 2022
Code for the Active Speakers in Context Paper (CVPR2020)

Active Speakers in Context This repo contains the official code and models for the "Active Speakers in Context" CVPR 2020 paper. Before Training The c

43 Oct 14, 2022
Implementation for paper "Towards the Generalization of Contrastive Self-Supervised Learning"

Contrastive Self-Supervised Learning on CIFAR-10 Paper "Towards the Generalization of Contrastive Self-Supervised Learning", Weiran Huang, Mingyang Yi

Weiran Huang 13 Nov 30, 2022
hipCaffe: the HIP port of Caffe

Caffe Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Cent

ROCm Software Platform 126 Dec 05, 2022
Full-featured Decision Trees and Random Forests learner.

CID3 This is a full-featured Decision Trees and Random Forests learner. It can save trees or forests to disk for later use. It is possible to query tr

Alejandro Penate-Diaz 3 Aug 15, 2022
Contrastive Learning for Metagenomic Binning

CLMB A simple framework for CLMB - a novel deep Contrastive Learningfor Metagenomic Binning Created by Pengfei Zhang, senior of Department of Computer

1 Sep 14, 2022
Fast, modular reference implementation and easy training of Semantic Segmentation algorithms in PyTorch.

TorchSeg This project aims at providing a fast, modular reference implementation for semantic segmentation models using PyTorch. Highlights Modular De

ycszen 1.4k Jan 02, 2023
Attention-guided gan for synthesizing IR images

SI-AGAN Attention-guided gan for synthesizing IR images This repository contains the Tensorflow code for "Pedestrian Gender Recognition by Style Trans

1 Oct 25, 2021
DLWP: Deep Learning Weather Prediction

DLWP: Deep Learning Weather Prediction DLWP is a Python project containing data-

Kushal Shingote 3 Aug 14, 2022
SegNet-like Autoencoders in TensorFlow

SegNet SegNet is a TensorFlow implementation of the segmentation network proposed by Kendall et al., with cool features like strided deconvolution, a

Andrea Azzini 66 Nov 05, 2021
🌳 A Python-inspired implementation of the Optimum-Path Forest classifier.

OPFython: A Python-Inspired Optimum-Path Forest Classifier Welcome to OPFython. Note that this implementation relies purely on the standard LibOPF. Th

Gustavo Rosa 30 Jan 04, 2023