[2021 MultiMedia] CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval

Related tags

Deep LearningCONQUER
Overview

CONQUER: Contexutal Query-aware Ranking for Video Corpus Moment Retreival

PyTorch implementation of CONQUER: Contexutal Query-aware Ranking for Video Corpus Moment Retreival.

Task Definition

Given a natural language query, e.g., Addison is having a conversation with Bailey while checking on her baby, the problem of Video Corpus Moment Retrieval, is to locate a precise moment in a video retrieved from a large video corpus. And we are especially interested in the more pragmatic scenario, videos are additionally associated with the text descriptions such as subtitles or ASR (automatic speech transcript).

task_definition

Model Overiew

CONQUER:

  • Query-dependent Fusion (QDF)
  • Query-aware Feature Learning (QAL)
  • Moment localization (ML) head and optional video scoring (VS) head

model_overview

Getting started

Prerequisites

1 . Clone this repository

git clone https://github.com/houzhijian/CONQUER.git
cd CONQUER

2 . Prepare feature files and data

Download tvr_feature_release.tar.gz (21GB). After downloading the feature file, extract it to YOUR DATA STORAGE directory:

tar zxvf path/to/tvr_feature_release.tar.gz 

You should be able to see tvr_feature_release under YOUR DATA STORAGE directory.

It contains visual features (ResNet, SlowFast) obtained from HERO authors and text features (subtitle and query, from fine-tuned RoBERTa) obtained from XML authors. You can refer to the code to learn details on how the features are extracted: visual feature extraction, text feature extraction.

Then modify root_path inside config/tvr_data_config.json to your own root path for data storage.

3 . Install dependencies.

  • Python
  • PyTorch
  • Cuda
  • tensorboard
  • tqdm
  • lmdb
  • easydict
  • msgpack
  • msgpack_numpy

To install the dependencies use conda and pip, you need to have anaconda3 or miniconda3 installed first, then:

conda create --name conquer
conda activate conquer 
conda install python==3.7.9 numpy==1.19.2 pytorch==1.6.0 cudatoolkit=10.1 -c pytorch
conda install tensorboard==2.4.0 tqdm
pip install easydict lmdb msgpack msgpack_numpy

Training and Inference

NOTE: Currently only support train and inference using one gpu.

We give examples on how to perform training and inference for our CONQUER model.

1 . CONQUER training

bash scripts/TRAIN_SCRIPTS.sh EXP_ID CUDA_DEVICE_ID

TRAIN_SCRIPTS is a name string for training script. EXP_ID is a name string for current run. CUDA_DEVICE_ID is cuda device id.

Below are four examples of training CONQUER when

  • it adopts general similarity measure function without shared normalization training objective :
bash scripts/train_general.sh general 0 
  • it adopts general similarity measure function with three negative videos and extend pool size 1000:
bash scripts/train_sharednorm_general.sh general_extend1000_neg3 0 \
--use_extend_pool 1000 --neg_video_num 3 --bsz 16
  • it adopts disjoint similarity measure function with three negative videos and extend pool size 1000:
bash scripts/train_sharednorm_disjoint.sh disjoint_extend1000_neg3 0 \
--use_extend_pool 1000 --neg_video_num 3 --bsz 16
  • it adopts exclusive similarity measure function with three negative videos and extend pool size 1000:
bash scripts/train_sharednorm_exclusive_pretrain.sh exclusive_pretrain_extend1000_neg3 0 \
--use_extend_pool 1000 --neg_video_num 3 --bsz 16 --encoder_pretrain_ckpt_filepath YOUR_DATA_STORAGE_PATH/first_stage_trained_model/model.ckpt

NOTE: The training has randomness when we adopt shared normalization training objective, because we randomly sample negative videos via an adpative pool size. You will witness performance difference each time.

2 . CONQUER inference

After training, you can inference using the saved model on val or test_public set:

bash scripts/inference.sh MODEL_DIR_NAME CUDA_DEVICE_ID

MODEL_DIR_NAME is the name of the dir containing the saved model, e.g., tvr-general_extend1000_neg3-*. CUDA_DEVICE_ID is cuda device id.

By default, this code evaluates all the 3 tasks (VCMR, SVMR, VR), you can change this behavior by appending option, e.g. --tasks VCMR VR where only VCMR and VR are evaluated.

Below is one example of inference CONQUER which produce the best performance shown in paper.

2.1. Download the trained model tvr-conquer_general_paper_performance.tar.gz (173 MB). After downloading the trained model, extract it to the current directory:

tar zxvf tvr-conquer_general_paper_performance.tar.gz

You should be able to see results/tvr-conquer_general_paper_performance under the current directory.

2.2. Perform inference on validation split

bash scripts/inference.sh tvr-conquer_general_paper_performance 0 --nms_thd 0.7

We use non-maximum suppression (NMS) and set the threshold as 0.7, because NMS can contribute to a higher [email protected] and [email protected] score empirically.

Citation

If you find this code useful for your research, please cite our paper:

@inproceedings{hou2020conquer,
  title={CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval},
  author={Zhijian, Hou and  Chong-Wah, Ngo and Wing-Kwong Chan},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  year={2021}
}

Acknowledgement

This code borrowed components from the following projects: TVRetrieval, HERO, HuggingFace, MMT, MME. We thank the authors for open-sourcing these great projects!

Contact

zjhou3-c [at] my.cityu.edu.hk

Owner
Hou zhijian
A PH.D student
Hou zhijian
Deep Latent Force Models

Deep Latent Force Models This repository contains a PyTorch implementation of the deep latent force model (DLFM), presented in the paper, Compositiona

Tom McDonald 5 Oct 26, 2022
Inkscape extensions for figure resizing and editing

Academic-Inkscape: Extensions for figure resizing and editing This repository contains several Inkscape extensions designed for editing plots. Scale P

192 Dec 26, 2022
Self-describing JSON-RPC services made easy

ReflectRPC Self-describing JSON-RPC services made easy Contents What is ReflectRPC? Installation Features Datatypes Custom Datatypes Returning Errors

Andreas Heck 31 Jul 16, 2022
EASY - Ensemble Augmented-Shot Y-shaped Learning: State-Of-The-Art Few-Shot Classification with Simple Ingredients.

EASY - Ensemble Augmented-Shot Y-shaped Learning: State-Of-The-Art Few-Shot Classification with Simple Ingredients. This repository is the official im

Yassir BENDOU 57 Dec 26, 2022
Source code for the BMVC-2021 paper "SimReg: Regression as a Simple Yet Effective Tool for Self-supervised Knowledge Distillation".

SimReg: A Simple Regression Based Framework for Self-supervised Knowledge Distillation Source code for the paper "SimReg: Regression as a Simple Yet E

9 Oct 15, 2022
PoseViz – Multi-person, multi-camera 3D human pose visualization tool built using Mayavi.

PoseViz – 3D Human Pose Visualizer Multi-person, multi-camera 3D human pose visualization tool built using Mayavi. As used in MeTRAbs visualizations.

István Sárándi 79 Dec 30, 2022
RCDNet: A Model-driven Deep Neural Network for Single Image Rain Removal (CVPR2020)

RCDNet: A Model-driven Deep Neural Network for Single Image Rain Removal (CVPR2020) Hong Wang, Qi Xie, Qian Zhao, and Deyu Meng [PDF] [Supplementary M

Hong Wang 6 Sep 27, 2022
A diff tool for language models

LMdiff Qualitative comparison of large language models. Demo & Paper: http://lmdiff.net LMdiff is a MIT-IBM Watson AI Lab collaboration between: Hendr

Hendrik Strobelt 27 Dec 29, 2022
Prior-Guided Multi-View 3D Head Reconstruction

Prior-Guided Head MVS This repository includes some reconstruction results of our IEEE TMM 2021 paper, Prior-Guided Multi-View 3D Head Reconstruction.

11 Aug 17, 2022
This is the repository for our paper SimpleTrack: Understanding and Rethinking 3D Multi-object Tracking

SimpleTrack This is the repository for our paper SimpleTrack: Understanding and Rethinking 3D Multi-object Tracking. We are still working on writing t

TuSimple 189 Dec 26, 2022
TreeSubstitutionCipher - Encryption system based on trees and substitution

Tree Substitution Cipher Generation Algorithm: Generate random tree. Tree nodes

stepa 1 Jan 08, 2022
Repository for the NeurIPS 2021 paper: "Exploiting Domain-Specific Features to Enhance Domain Generalization".

meta-Domain Specific-Domain Invariant (mDSDI) Source code implementation for the paper: Manh-Ha Bui, Toan Tran, Anh Tuan Tran, Dinh Phung. "Exploiting

VinAI Research 12 Nov 25, 2022
EMNLP'2021: Simple Entity-centric Questions Challenge Dense Retrievers

EntityQuestions This repository contains the EntityQuestions dataset as well as code to evaluate retrieval results from the the paper Simple Entity-ce

Princeton Natural Language Processing 119 Sep 28, 2022
Natural Intelligence is still a pretty good idea.

Human Learn Machine Learning models should play by the rules, literally. Project Goal Back in the old days, it was common to write rule-based systems.

vincent d warmerdam 641 Dec 26, 2022
Unified learning approach for egocentric hand gesture recognition and fingertip detection

Unified Gesture Recognition and Fingertip Detection A unified convolutional neural network (CNN) algorithm for both hand gesture recognition and finge

Mohammad 227 Dec 25, 2022
This application explain how we can easily integrate Deepface framework with Python Django application

deepface_suite This application explain how we can easily integrate Deepface framework with Python Django application install redis cache install requ

Mohamed Naji Aboo 3 Apr 18, 2022
A PoC Corporation Relationship Knowledge Graph System on top of Nebula Graph.

Corp-Rel is a PoC of Corpartion Relationship Knowledge Graph System. It's built on top of the Open Source Graph Database: Nebula Graph with a dataset

Wey Gu 20 Dec 11, 2022
The implementation of our CIKM 2021 paper titled as: "Cross-Market Product Recommendation"

FOREC: A Cross-Market Recommendation System This repository provides the implementation of our CIKM 2021 paper titled as "Cross-Market Product Recomme

Hamed Bonab 16 Sep 12, 2022
Weakly Supervised Dense Event Captioning in Videos, i.e. generating multiple sentence descriptions for a video in a weakly-supervised manner.

WSDEC This is the official repo for our NeurIPS paper Weakly Supervised Dense Event Captioning in Videos. Description Repo directories ./: global conf

Melon(Xuguang Duan) 96 Nov 01, 2022
StarGAN2 for practice

StarGAN2 for practice This version of StarGAN2 (coined as 'Post-modern Style Transfer') is intended mostly for fellow artists, who rarely look at scie

vadim epstein 87 Sep 24, 2022