Multi-query Video Retrieval

This repository contains the code for the paper:

@misc{wang2022multiquery,
      title={Multi-query Video Retrieval}, 
      author={Zeyu Wang and Yu Wu and Karthik Narasimhan and Olga Russakovsky},
      year={2022},
      eprint={2201.03639},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Data Preparation

Download the raw videos for MSR-VTT, MSVD and VATEX, and put them into the data/{dataset}/raw_videos folder.
Run the script data/extract_frames.sh to extract frames from raw videos.
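
For readers who want to see what this step amounts to, here is a minimal Python sketch of frame extraction. It is not the contents of data/extract_frames.sh. It assumes ffmpeg is installed and on the PATH, and the sampling rate `fps` and the function names `build_ffmpeg_command`, `plan_extraction` and `run_extraction` are hypothetical. The only details taken from this README are the folder names and the convention that each video gets a folder named after its file, with frames numbered from 0.jpg.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(video_path: Path, frames_root: Path, fps: int = 1) -> List[str]:
    """Build an ffmpeg command that writes frames as 0.jpg, 1.jpg, ...

    The output folder is named after the video file (e.g. video0.mp4/).
    The default fps is an assumption, not a value from the repository.
    """
    out_dir = frames_root / video_path.name
    return [
        "ffmpeg",
        "-i", str(video_path),
        "-vf", f"fps={fps}",       # sample a fixed number of frames per second
        "-start_number", "0",      # number the first frame 0 instead of 1
        str(out_dir / "%d.jpg"),
    ]


def plan_extraction(dataset_dir: Path, fps: int = 1) -> List[List[str]]:
    """Return one ffmpeg command per file found in dataset_dir/raw_videos."""
    raw_dir = dataset_dir / "raw_videos"
    frames_root = dataset_dir / "extracted_frames"
    videos = sorted(p for p in raw_dir.glob("*") if p.is_file())
    return [build_ffmpeg_command(v, frames_root, fps) for v in videos]


def run_extraction(dataset_dir: Path, fps: int = 1) -> None:
    """Create the output folders and run every planned command."""
    for cmd in plan_extraction(dataset_dir, fps):
        # ffmpeg does not create missing output folders, so create them first.
        Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

Calling `run_extraction(Path("data/msrvtt"))` would then process every file in data/msrvtt/raw_videos. Separating the planning step from the execution step makes it possible to inspect the commands before running them.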

The resulting data folder is structured like this:

├── data
    ├── msrvtt
        ├── msrvtt_train.json
        ├── msrvtt_test.json
        ├── msrvtt_test_varying_query_sample_1-20.json
        ├── raw_videos
            ├── video0.mp4
            ├── ...
        ├── extracted_frames
            ├── video0.mp4
                ├── 0.jpg
                ├── ...
            ├── ...
    ├── msvd
        ├── ...
    ├── vatex
        ├── ...
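
Before training, it can help to confirm that a dataset folder matches the layout above. The following is a small sketch of such a check, not part of the repository. The helper name `missing_paths` is hypothetical, and only the MSR-VTT annotation file names are listed because the MSVD and VATEX file names are elided in the tree.

```python
from pathlib import Path
from typing import List

# Annotation files shown for MSR-VTT in the tree above. The file names for
# MSVD and VATEX are not given in this README, so they are not guessed here.
MSRVTT_FILES = [
    "msrvtt_train.json",
    "msrvtt_test.json",
    "msrvtt_test_varying_query_sample_1-20.json",
]


def missing_paths(data_root: Path, dataset: str, files: List[str]) -> List[str]:
    """Return the expected entries that are absent under data_root/dataset."""
    dataset_dir = data_root / dataset
    missing = []
    # Annotation files must exist as regular files.
    for name in files:
        if not (dataset_dir / name).is_file():
            missing.append(name)
    # Video and frame folders must exist as directories.
    for folder in ("raw_videos", "extracted_frames"):
        if not (dataset_dir / folder).is_dir():
            missing.append(folder)
    return missing
```

For example, `missing_paths(Path("data"), "msrvtt", MSRVTT_FILES)` returns an empty list when everything is in place. For the other datasets, pass the annotation file names that apply, or an empty list to check only the two folders.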

For the Frozen model, download the pretrained checkpoint provided by the original authors and put it into the record/pretrained folder.

Training

Run command: python train.py -c configs/{config_path}

Evaluation

Run command: python evaluate.py -c configs/{config_path}
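
Since training and evaluation take the same `-c` argument, the two steps can be chained from one script. The sketch below assumes it is run from the repository root. The helper names `build_command` and `train_then_evaluate` are hypothetical, as is any config file name passed to them. Only the script names and the `-c configs/{config_path}` form come from this README.

```python
import subprocess
import sys
from typing import List


def build_command(script: str, config_path: str) -> List[str]:
    """Build the command line for train.py or evaluate.py with one config."""
    # sys.executable keeps the call inside the current Python environment.
    return [sys.executable, script, "-c", f"configs/{config_path}"]


def train_then_evaluate(config_path: str) -> None:
    """Run training, then evaluation, stopping if either step fails."""
    for script in ("train.py", "evaluate.py"):
        subprocess.run(build_command(script, config_path), check=True)
```

With `check=True`, a failed training run raises `subprocess.CalledProcessError`, so evaluation is never started on an incomplete run.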

Acknowledgements

The structure of this repository is based on https://github.com/victoresque/pytorch-template. Some of the code is adapted from https://github.com/m-bain/frozen-in-time and https://github.com/ArrowLuo/CLIP4Clip.

Owner

Princeton Visual AI Lab
