[2021 MultiMedia] CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval

CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval

PyTorch implementation of CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval.

Task Definition

Given a natural language query, e.g., "Addison is having a conversation with Bailey while checking on her baby", Video Corpus Moment Retrieval (VCMR) is the problem of locating a precise moment in a video retrieved from a large video corpus. We are especially interested in the more pragmatic scenario in which videos are additionally associated with text descriptions such as subtitles or automatic speech recognition (ASR) transcripts.

[Figure: task definition]

Model Overview

CONQUER consists of:

  • Query-dependent Fusion (QDF)
  • Query-aware Feature Learning (QAL)
  • Moment localization (ML) head and optional video scoring (VS) head

[Figure: model overview]

Getting started

Prerequisites

1. Clone this repository

git clone https://github.com/houzhijian/CONQUER.git
cd CONQUER

2. Prepare feature files and data

Download tvr_feature_release.tar.gz (21GB). After downloading the feature file, extract it to YOUR DATA STORAGE directory:

tar zxvf path/to/tvr_feature_release.tar.gz 

You should be able to see tvr_feature_release under YOUR DATA STORAGE directory.

It contains visual features (ResNet, SlowFast) obtained from HERO authors and text features (subtitle and query, from fine-tuned RoBERTa) obtained from XML authors. You can refer to the code to learn details on how the features are extracted: visual feature extraction, text feature extraction.

Then modify root_path inside config/tvr_data_config.json to your own root path for data storage.
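If you prefer to script this step, the following is a minimal sketch. It assumes that root_path is a top-level key of the JSON file, which this README does not confirm, and set_root_path is a hypothetical helper name that is not part of the repository.

```python
import json
from pathlib import Path


def set_root_path(config_file, data_root):
    """Rewrite the `root_path` entry of a JSON config file and return the new config.

    Assumption: `root_path` is a top-level key of the JSON document.
    """
    config_path = Path(config_file)
    config = json.loads(config_path.read_text())
    config["root_path"] = str(data_root)
    # Write the file back with all other keys left unchanged.
    config_path.write_text(json.dumps(config, indent=4) + "\n")
    return config


# Example call (the second argument is a placeholder for your own directory):
# set_root_path("config/tvr_data_config.json", "/path/to/YOUR_DATA_STORAGE")
```

Editing the file by hand works just as well; the helper is only convenient when the same checkout is moved between machines.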

3. Install dependencies

  • Python
  • PyTorch
  • CUDA
  • tensorboard
  • tqdm
  • lmdb
  • easydict
  • msgpack
  • msgpack_numpy

To install the dependencies with conda and pip, first make sure anaconda3 or miniconda3 is installed, then run:

conda create --name conquer
conda activate conquer 
conda install python==3.7.9 numpy==1.19.2 pytorch==1.6.0 cudatoolkit=10.1 -c pytorch
conda install tensorboard==2.4.0 tqdm
pip install easydict lmdb msgpack msgpack_numpy

Training and Inference

NOTE: Training and inference currently support a single GPU only.

We give examples of how to perform training and inference with our CONQUER model.

1. CONQUER training

bash scripts/TRAIN_SCRIPTS.sh EXP_ID CUDA_DEVICE_ID

TRAIN_SCRIPTS is the name of the training script. EXP_ID is a name for the current run. CUDA_DEVICE_ID is the CUDA device ID.

Below are four examples of training CONQUER when

  • it adopts the general similarity measure function without the shared normalization training objective:
bash scripts/train_general.sh general 0
  • it adopts the general similarity measure function with three negative videos and an extended pool size of 1000:
bash scripts/train_sharednorm_general.sh general_extend1000_neg3 0 \
--use_extend_pool 1000 --neg_video_num 3 --bsz 16
  • it adopts the disjoint similarity measure function with three negative videos and an extended pool size of 1000:
bash scripts/train_sharednorm_disjoint.sh disjoint_extend1000_neg3 0 \
--use_extend_pool 1000 --neg_video_num 3 --bsz 16
  • it adopts the exclusive similarity measure function with three negative videos and an extended pool size of 1000:
bash scripts/train_sharednorm_exclusive_pretrain.sh exclusive_pretrain_extend1000_neg3 0 \
--use_extend_pool 1000 --neg_video_num 3 --bsz 16 --encoder_pretrain_ckpt_filepath YOUR_DATA_STORAGE_PATH/first_stage_trained_model/model.ckpt
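The training commands above all follow the pattern script, EXP_ID, CUDA_DEVICE_ID, then optional flags. When launching several runs, it can help to assemble them programmatically. The sketch below only builds the argument list; build_train_command is a hypothetical helper that is not part of the repository, and the flags are passed through unchanged.

```python
import shlex


def build_train_command(script_name, exp_id, cuda_device_id, **options):
    """Return the argv list for
    `bash scripts/<script_name>.sh EXP_ID CUDA_DEVICE_ID [--flag value ...]`.
    """
    argv = ["bash", f"scripts/{script_name}.sh", str(exp_id), str(cuda_device_id)]
    # Each keyword argument becomes a `--flag value` pair, in the order given.
    for flag, value in options.items():
        argv += [f"--{flag}", str(value)]
    return argv


cmd = build_train_command(
    "train_sharednorm_general", "general_extend1000_neg3", 0,
    use_extend_pool=1000, neg_video_num=3, bsz=16,
)
# Quote each part so the line can be pasted into a shell.
command_line = " ".join(shlex.quote(part) for part in cmd)
print(command_line)
```

The resulting list can be handed to subprocess.run(cmd, check=True) from the repository root.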

NOTE: Training is non-deterministic when the shared normalization training objective is used, because negative videos are sampled randomly from an adaptive pool. Expect the performance to differ from run to run.

2. CONQUER inference

After training, you can run inference with the saved model on the val or test_public set:

bash scripts/inference.sh MODEL_DIR_NAME CUDA_DEVICE_ID

MODEL_DIR_NAME is the name of the directory containing the saved model, e.g., tvr-general_extend1000_neg3-*. CUDA_DEVICE_ID is the CUDA device ID.

By default, this code evaluates all three tasks (VCMR, SVMR, VR). You can change this behavior by appending an option, e.g., --tasks VCMR VR, in which case only VCMR and VR are evaluated.
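For readers unfamiliar with multi-valued options, here is a sketch of how a flag like --tasks is commonly declared with Python's argparse. It is an illustration under the assumption that the option accepts one or more of the three task names; the repository's actual argument parser may be written differently.

```python
import argparse

ALL_TASKS = ["VCMR", "SVMR", "VR"]

parser = argparse.ArgumentParser()
# nargs="+" collects one or more space-separated values into a list.
parser.add_argument("--tasks", nargs="+", choices=ALL_TASKS, default=ALL_TASKS)

# Equivalent to appending `--tasks VCMR VR` on the command line.
args = parser.parse_args(["--tasks", "VCMR", "VR"])
print(args.tasks)

# With no option given, all three tasks are selected.
default_args = parser.parse_args([])
print(default_args.tasks)
```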

Below is an example of running inference with the CONQUER model that produces the best performance reported in the paper.

2.1. Download the trained model tvr-conquer_general_paper_performance.tar.gz (173 MB). After downloading the trained model, extract it to the current directory:

tar zxvf tvr-conquer_general_paper_performance.tar.gz

You should be able to see results/tvr-conquer_general_paper_performance under the current directory.

2.2. Perform inference on validation split

bash scripts/inference.sh tvr-conquer_general_paper_performance 0 --nms_thd 0.7

We apply non-maximum suppression (NMS) with a threshold of 0.7, because NMS empirically improves some of the recall metrics.
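To illustrate what the threshold controls, here is a generic sketch of greedy temporal NMS over scored moments. It is not the repository's implementation. It assumes that all candidate moments come from the same video and are given as (start, end, score) tuples, and the function names are hypothetical.

```python
def temporal_iou(a, b):
    """Intersection over union of two (start, end, ...) intervals in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(moments, threshold=0.7):
    """Greedy NMS: visit moments by descending score and keep a moment only if
    its IoU with every already-kept moment does not exceed `threshold`.
    """
    kept = []
    for candidate in sorted(moments, key=lambda m: m[2], reverse=True):
        if all(temporal_iou(candidate, k) <= threshold for k in kept):
            kept.append(candidate)
    return kept


# (start, end, score); the second moment overlaps the first with IoU 0.9.
predictions = [(10.0, 20.0, 0.9), (11.0, 20.0, 0.8), (30.0, 40.0, 0.7)]
filtered = temporal_nms(predictions, threshold=0.7)
print(filtered)
```

A lower threshold removes more near-duplicate moments, which frees ranking positions for distinct candidates.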

Citation

If you find this code useful for your research, please cite our paper:

@inproceedings{hou2020conquer,
  title={CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval},
  author={Hou, Zhijian and Ngo, Chong-Wah and Chan, Wing-Kwong},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  year={2021}
}

Acknowledgement

This code borrows components from the following projects: TVRetrieval, HERO, HuggingFace, MMT, MME. We thank the authors for open-sourcing these great projects!

Contact

zjhou3-c [at] my.cityu.edu.hk
