[2021 MultiMedia] CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval

Related tags

Deep LearningCONQUER
Overview

CONQUER: Contexutal Query-aware Ranking for Video Corpus Moment Retreival

PyTorch implementation of CONQUER: Contexutal Query-aware Ranking for Video Corpus Moment Retreival.

Task Definition

Given a natural language query, e.g., Addison is having a conversation with Bailey while checking on her baby, the problem of Video Corpus Moment Retrieval, is to locate a precise moment in a video retrieved from a large video corpus. And we are especially interested in the more pragmatic scenario, videos are additionally associated with the text descriptions such as subtitles or ASR (automatic speech transcript).

task_definition

Model Overiew

CONQUER:

  • Query-dependent Fusion (QDF)
  • Query-aware Feature Learning (QAL)
  • Moment localization (ML) head and optional video scoring (VS) head

model_overview

Getting started

Prerequisites

1 . Clone this repository

git clone https://github.com/houzhijian/CONQUER.git
cd CONQUER

2 . Prepare feature files and data

Download tvr_feature_release.tar.gz (21GB). After downloading the feature file, extract it to YOUR DATA STORAGE directory:

tar zxvf path/to/tvr_feature_release.tar.gz 

You should be able to see tvr_feature_release under YOUR DATA STORAGE directory.

It contains visual features (ResNet, SlowFast) obtained from HERO authors and text features (subtitle and query, from fine-tuned RoBERTa) obtained from XML authors. You can refer to the code to learn details on how the features are extracted: visual feature extraction, text feature extraction.

Then modify root_path inside config/tvr_data_config.json to your own root path for data storage.

3 . Install dependencies.

  • Python
  • PyTorch
  • Cuda
  • tensorboard
  • tqdm
  • lmdb
  • easydict
  • msgpack
  • msgpack_numpy

To install the dependencies use conda and pip, you need to have anaconda3 or miniconda3 installed first, then:

conda create --name conquer
conda activate conquer 
conda install python==3.7.9 numpy==1.19.2 pytorch==1.6.0 cudatoolkit=10.1 -c pytorch
conda install tensorboard==2.4.0 tqdm
pip install easydict lmdb msgpack msgpack_numpy

Training and Inference

NOTE: Currently only support train and inference using one gpu.

We give examples on how to perform training and inference for our CONQUER model.

1 . CONQUER training

bash scripts/TRAIN_SCRIPTS.sh EXP_ID CUDA_DEVICE_ID

TRAIN_SCRIPTS is a name string for training script. EXP_ID is a name string for current run. CUDA_DEVICE_ID is cuda device id.

Below are four examples of training CONQUER when

  • it adopts general similarity measure function without shared normalization training objective :
bash scripts/train_general.sh general 0 
  • it adopts general similarity measure function with three negative videos and extend pool size 1000:
bash scripts/train_sharednorm_general.sh general_extend1000_neg3 0 \
--use_extend_pool 1000 --neg_video_num 3 --bsz 16
  • it adopts disjoint similarity measure function with three negative videos and extend pool size 1000:
bash scripts/train_sharednorm_disjoint.sh disjoint_extend1000_neg3 0 \
--use_extend_pool 1000 --neg_video_num 3 --bsz 16
  • it adopts exclusive similarity measure function with three negative videos and extend pool size 1000:
bash scripts/train_sharednorm_exclusive_pretrain.sh exclusive_pretrain_extend1000_neg3 0 \
--use_extend_pool 1000 --neg_video_num 3 --bsz 16 --encoder_pretrain_ckpt_filepath YOUR_DATA_STORAGE_PATH/first_stage_trained_model/model.ckpt

NOTE: The training has randomness when we adopt shared normalization training objective, because we randomly sample negative videos via an adpative pool size. You will witness performance difference each time.

2 . CONQUER inference

After training, you can inference using the saved model on val or test_public set:

bash scripts/inference.sh MODEL_DIR_NAME CUDA_DEVICE_ID

MODEL_DIR_NAME is the name of the dir containing the saved model, e.g., tvr-general_extend1000_neg3-*. CUDA_DEVICE_ID is cuda device id.

By default, this code evaluates all the 3 tasks (VCMR, SVMR, VR), you can change this behavior by appending option, e.g. --tasks VCMR VR where only VCMR and VR are evaluated.

Below is one example of inference CONQUER which produce the best performance shown in paper.

2.1. Download the trained model tvr-conquer_general_paper_performance.tar.gz (173 MB). After downloading the trained model, extract it to the current directory:

tar zxvf tvr-conquer_general_paper_performance.tar.gz

You should be able to see results/tvr-conquer_general_paper_performance under the current directory.

2.2. Perform inference on validation split

bash scripts/inference.sh tvr-conquer_general_paper_performance 0 --nms_thd 0.7

We use non-maximum suppression (NMS) and set the threshold as 0.7, because NMS can contribute to a higher [email protected] and [email protected] score empirically.

Citation

If you find this code useful for your research, please cite our paper:

@inproceedings{hou2020conquer,
  title={CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval},
  author={Zhijian, Hou and  Chong-Wah, Ngo and Wing-Kwong Chan},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  year={2021}
}

Acknowledgement

This code borrowed components from the following projects: TVRetrieval, HERO, HuggingFace, MMT, MME. We thank the authors for open-sourcing these great projects!

Contact

zjhou3-c [at] my.cityu.edu.hk

Owner
Hou zhijian
A PH.D student
Hou zhijian
[ECE NTUA] 👁 Computer Vision - Lab Projects & Theoretical Problem Sets (2020-2021)

Computer Vision - NTUA (2020-2021) This repository hosts the lab projects and theoretical problem sets of the Computer Vision course held by ECE NTUA

Dimitris Dimos 6 Jul 21, 2022
classification task on dataset-CIFAR10,by using Tensorflow/keras

CIFAR10-Tensorflow classification task on dataset-CIFAR10,by using Tensorflow/keras 在这一个库中,我使用Tensorflow与keras框架搭建了几个卷积神经网络模型,针对CIFAR10数据集进行了训练与测试。分别使

3 Oct 17, 2021
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech

EdiTTS: Score-based Editing for Controllable Text-to-Speech Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech. Au

Neosapience 98 Dec 25, 2022
A curated list of Generative Deep Art projects, tools, artworks, and models

Generative Deep Art A curated list of Generative Deep Art projects, tools, artworks, and models Inbox Get started with making AI art in 2022 – deeplea

Filipe Calegario 251 Jan 03, 2023
Mining-the-Social-Web-3rd-Edition - The official online compendium for Mining the Social Web, 3rd Edition (O'Reilly, 2018)

Mining the Social Web, 3rd Edition The official code repository for Mining the Social Web, 3rd Edition (O'Reilly, 2019). The book is available from Am

Mikhail Klassen 838 Jan 01, 2023
BrainGNN - A deep learning model for data-driven discovery of functional connectivity

A deep learning model for data-driven discovery of functional connectivity https://doi.org/10.3390/a14030075 Usman Mahmood, Zengin Fu, Vince D. Calhou

Usman Mahmood 3 Aug 28, 2022
Sign Language is detected in realtime using video sequences. Our approach involves MediaPipe Holistic for keypoints extraction and LSTM Model for prediction.

RealTime Sign Language Detection using Action Recognition Approach Real-Time Sign Language is commonly predicted using models whose architecture consi

Rishikesh S 15 Aug 20, 2022
Self-Supervised Document-to-Document Similarity Ranking via Contextualized Language Models and Hierarchical Inference

Self-Supervised Document Similarity Ranking (SDR) via Contextualized Language Models and Hierarchical Inference This repo is the implementation for SD

Microsoft 36 Nov 28, 2022
PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)

PyTorch implementation of Conformer: Convolution-augmented Transformer for Speech Recognition. Transformer models are good at capturing content-based

Soohwan Kim 565 Jan 04, 2023
MIMIC Code Repository: Code shared by the research community for the MIMIC-III database

MIMIC Code Repository The MIMIC Code Repository is intended to be a central hub for sharing, refining, and reusing code used for analysis of the MIMIC

MIT Laboratory for Computational Physiology 1.8k Dec 26, 2022
High accurate tool for automatic faces detection with landmarks

faces_detanator High accurate tool for automatic faces detection with landmarks. The library is based on public detectors with high accuracy (TinaFace

Ihar 7 May 10, 2022
Most popular metrics used to evaluate object detection algorithms.

Most popular metrics used to evaluate object detection algorithms.

Rafael Padilla 4.4k Dec 25, 2022
📖 Deep Attentional Guided Image Filtering

📖 Deep Attentional Guided Image Filtering [Paper] Zhiwei Zhong, Xianming Liu, Junjun Jiang, Debin Zhao ,Xiangyang Ji Harbin Institute of Technology,

9 Dec 23, 2022
Training code and evaluation benchmarks for the "Self-Supervised Policy Adaptation during Deployment" paper.

Self-Supervised Policy Adaptation during Deployment PyTorch implementation of PAD and evaluation benchmarks from Self-Supervised Policy Adaptation dur

Nicklas Hansen 101 Nov 01, 2022
Dense Gaussian Processes for Few-Shot Segmentation

DGPNet - Dense Gaussian Processes for Few-Shot Segmentation Welcome to the public repository for DGPNet. The paper is available at arxiv: https://arxi

37 Jan 07, 2023
Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation

Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation Our paper is accepted by ICCV2021. Picture: Overview of the proposed Plug-an

Yunfei Liu 32 Dec 10, 2022
Matlab Python Heuristic Battery Opt - SMOP conversion and manual conversion

SMOP is Small Matlab and Octave to Python compiler. SMOP translates matlab to py

Tom Xu 1 Jan 12, 2022
This repo is to present various code demos on how to use our Graph4NLP library.

Deep Learning on Graphs for Natural Language Processing Demo The repository contains code examples for DLG4NLP tutorials at NAACL 2021, SIGIR 2021, KD

Graph4AI 143 Dec 23, 2022
PanopticBEV - Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images

Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images This r

63 Dec 16, 2022
[WACV21] Code for our paper: Samuel, Atzmon and Chechik, "From Generalized zero-shot learning to long-tail with class descriptors"

DRAGON: From Generalized zero-shot learning to long-tail with class descriptors Paper Project Website Video Overview DRAGON learns to correct the bias

Dvir Samuel 25 Dec 06, 2022