VLG-Net: Video-Language Graph Matching Networks for Video Grounding

Last update: Dec 04, 2022


Introduction

Official repository for VLG-Net: Video-Language Graph Matching Networks for Video Grounding. [ArXiv Preprint]

The paper was accepted at the first edition of the ICCV workshop AI for Creative Video Editing and Understanding (CVEU).

Installation

Clone the repository and move into its folder:

    git clone https://github.com/Soldelli/VLG-Net.git
    cd VLG-Net

Install the environment:

    conda env create -f environment.yml

If the installation fails, follow the instructions in doc/environment.md.

Data

Download the following resources and extract their contents into the destination folder listed in the table below.

Resource	Download Link	File Size	Destination Folder
Stanford CoreNLP 4.0.0	link	(~0.5GB)	`./datasets/`
TACoS	link	(~0.5GB)	`./datasets/`
ActivityNet-Captions	link	(~29GB)	`./datasets/`
DiDeMo	link	(~13GB)	`./datasets/`
GCNeXt warmup	link	(~0.1GB)	`./datasets/`
Pretrained Models	link	(~0.1GB)	`./models/`

The folder structure should be as follows:

.
├── configs
│
├── datasets
│   ├── activitynet1.3
│   │    ├── annotations
│   │    └── features
│   ├── didemo
│   │    ├── annotations
│   │    └── features
│   ├── tacos
│   │    ├── annotations
│   │    └── features
│   ├── gcnext_warmup
│   └── standford-corenlp-4.0.0
│
├── doc
│
├── lib
│   ├── config
│   ├── data
│   ├── engine
│   ├── modeling
│   ├── structures
│   └── utils
│
├── models
│   ├── activitynet
│   └── tacos
│
├── outputs
│
└── scripts
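Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch of a hypothetical helper (it is not part of the repository, and `missing_dirs` / `EXPECTED_DIRS` are illustrative names); the folder list is copied from the tree above and assumes you run it from the repository root.

```python
from pathlib import Path

# Sub-folders copied from the directory tree above; adjust if your layout differs.
EXPECTED_DIRS = [
    "datasets/activitynet1.3/annotations",
    "datasets/activitynet1.3/features",
    "datasets/didemo/annotations",
    "datasets/didemo/features",
    "datasets/tacos/annotations",
    "datasets/tacos/features",
    "datasets/gcnext_warmup",
    "datasets/standford-corenlp-4.0.0",
    "models/activitynet",
    "models/tacos",
]


def missing_dirs(root="."):
    """Return the expected sub-folders that do not exist under `root`."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing folders:")
        for d in missing:
            print("  " + d)
    else:
        print("Folder structure looks complete.")
```

An empty result only means the folders exist; it does not check that the extracted files inside them are complete.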

Training

Copy and paste the following commands into a terminal.

Activate the environment:

    conda activate vlg

For the ActivityNet-Captions dataset, run:

    python train_net.py --config-file configs/activitynet.yml OUTPUT_DIR outputs/activitynet

For the TACoS dataset, run:

    python train_net.py --config-file configs/tacos.yml OUTPUT_DIR outputs/tacos
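The trailing `OUTPUT_DIR outputs/...` pair looks like the yacs-style `KEY VALUE` override convention, where options given after the config file overwrite values loaded from the YAML. Assuming the repository follows that convention, the sketch below mimics it on a plain nested dict. It is an illustration only, not the repository's code; `merge_overrides` and every config key other than `OUTPUT_DIR` are hypothetical.

```python
import ast


def merge_overrides(cfg, opts):
    """Apply KEY VALUE pairs (e.g. ["OUTPUT_DIR", "outputs/tacos"]) to a nested dict.

    Dotted keys such as "SOLVER.MAX_EPOCH" address nested sections.
    """
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for p in parents:
            node = node[p]  # raises KeyError for an unknown section
        if leaf not in node:
            raise KeyError(f"unknown config key: {key}")
        try:
            # Parse numbers, booleans, lists, ... similar to what yacs does.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # not a Python literal (e.g. a path): keep the string
        node[leaf] = value
    return cfg


# Hypothetical defaults standing in for the contents of configs/tacos.yml.
cfg = {"OUTPUT_DIR": "", "SOLVER": {"MAX_EPOCH": 10, "LR": 1e-4}}
merge_overrides(cfg, ["OUTPUT_DIR", "outputs/tacos", "SOLVER.MAX_EPOCH", "5"])
print(cfg)
```

Routing values through `ast.literal_eval` keeps numeric overrides numeric, while strings that are not Python literals, such as output paths, are stored verbatim.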

Evaluation

For simplicity, we provide scripts that automatically run inference with the pretrained models. If you want to run inference on a different model, see the details inside the scripts.

Activate the environment:

    conda activate vlg

Then run one of the following scripts to launch the evaluation.

For the ActivityNet-Captions dataset, run:

    bash scripts/activitynet.sh

For the TACoS dataset, run:

    bash scripts/tacos.sh
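Video grounding results are commonly reported as R@n at a temporal IoU threshold m: the percentage of queries for which at least one of the top-n ranked moments overlaps the ground-truth moment with IoU of at least m. The sketch below computes that standard metric on toy data; it is a generic illustration, not the repository's evaluation code, and `temporal_iou` / `recall_at_n` are hypothetical names.

```python
def temporal_iou(a, b):
    """IoU of two [start, end] segments (e.g. in seconds)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])  # equals the union whenever they overlap
    return inter / union if union > 0 else 0.0


def recall_at_n(predictions, ground_truths, n, iou_threshold):
    """Percentage of queries whose top-n ranked moments contain a hit.

    predictions: one list of [start, end] moments per query, best first.
    ground_truths: one [start, end] moment per query.
    """
    hits = 0
    for preds, gt in zip(predictions, ground_truths):
        if any(temporal_iou(p, gt) >= iou_threshold for p in preds[:n]):
            hits += 1
    return 100.0 * hits / len(ground_truths)


# Toy example: two queries with three ranked candidate moments each.
preds = [
    [[0.0, 10.0], [5.0, 15.0], [20.0, 30.0]],
    [[40.0, 50.0], [12.0, 18.0], [0.0, 5.0]],
]
gts = [[1.0, 10.0], [10.0, 20.0]]
print(recall_at_n(preds, gts, n=1, iou_threshold=0.5))  # 50.0
print(recall_at_n(preds, gts, n=5, iou_threshold=0.5))  # 100.0
```

The second query only counts as a hit at R@5, because its correct moment is ranked second.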

Expected results:

After cleaning the code and fixing a couple of minor bugs, performance changed slightly with respect to the numbers reported in the paper. See the tables below.

ActivityNet	R@1 IoU=0.5	R@1 IoU=0.7	R@5 IoU=0.5	R@5 IoU=0.7
Paper	46.32	29.82	77.15	63.33
Current	46.32	29.79	77.19	63.36

TACoS	R@1 IoU=0.1	R@1 IoU=0.3	R@1 IoU=0.5	R@5 IoU=0.1	R@5 IoU=0.3	R@5 IoU=0.5
Paper	57.21	45.46	34.19	81.80	70.38	56.56
Current	57.16	45.56	34.14	81.48	70.13	56.34
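To put "slightly" in numbers, the snippet below computes the per-metric difference (Current minus Paper) for both tables, with the values copied verbatim from them. The largest absolute change is 0.32 points (TACoS), and no ActivityNet metric moves by more than 0.04 points.

```python
# Values copied from the two tables above (Paper vs. Current), in percent.
RESULTS = {
    "ActivityNet": {
        "paper": [46.32, 29.82, 77.15, 63.33],
        "current": [46.32, 29.79, 77.19, 63.36],
    },
    "TACoS": {
        "paper": [57.21, 45.46, 34.19, 81.80, 70.38, 56.56],
        "current": [57.16, 45.56, 34.14, 81.48, 70.13, 56.34],
    },
}


def deltas(rows):
    """Current minus Paper for each metric column, rounded to 2 decimals."""
    return [round(c - p, 2) for p, c in zip(rows["paper"], rows["current"])]


for dataset, rows in RESULTS.items():
    d = deltas(rows)
    print(f"{dataset}: {d} (max |delta| = {max(abs(x) for x in d)})")
```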

Citation

If any part of our paper or code is helpful to your work, please cite:

@inproceedings{soldan2021vlg,
  title={VLG-Net: Video-Language Graph Matching Network for Video Grounding},
  author={Soldan, Mattia and Xu, Mengmeng and Qu, Sisi and Tegner, Jesper and Ghanem, Bernard},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={3224--3234},
  year={2021}
}


Owner

Mattia Soldan

[SIGMETRICS 2022] One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search

Official PyTorch implementation of paper: Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation (ICCV 2021 Oral Presentation)

Lip Reading - Cross Audio-Visual Recognition using 3D Convolutional Neural Networks

A repo with study material, exercises, examples, etc for Devnet SPAUTO

Keras Implementation of The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation by (Simon Jégou, Michal Drozdzal, David Vazquez, Adriana Romero, Yoshua Bengio)

This repository contains PyTorch code for Robust Vision Transformers.

Implementing DropPath/StochasticDepth in PyTorch

A supplementary code for Editable Neural Networks, an ICLR 2020 submission.

CM building dataset Timisoara

Deploying a YOLOv5 object detection model on the web, with a Flask backend and a Vue frontend

Pytorch implementation of FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

Code of the paper "Shaping Visual Representations with Attributes for Few-Shot Learning (ASL)".

PyTorch implementation of "A Simple Baseline for Low-Budget Active Learning".

A deep learning library that makes face recognition efficient and effective

Miscellaneous and lightweight network tools

People Interaction Graph

A flexible submap-based framework towards spatio-temporally consistent volumetric mapping and scene understanding.

Yas CRNN model training - Yet Another Genshin Impact Scanner

Code for "NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video", CVPR 2021 oral

Back to Basics: Efficient Network Compression via IMP