Code, Models and Datasets for OpenViDial Dataset

Overview

OpenViDial

This repo contains downloading instructions for the OpenViDial dataset in 《OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts》 along with the code to reproduce results in the paper (See Section Baselines).

Introduction

When humans converse, what a speaker will say next significantly depends on what he sees. OpenViDial is a largescale multi-module dialogue dataset for this purpose. The dialogue turns and visual contexts are extracted from movies and TV series, where each dialogue turn is paired with the corresponding visual context in which it takes place. OpenViDial contains a total number of 1.1 million dialogue turns, and thus 1.1 million visual contexts stored in images.

The following are two short conversations where visual contexts are crucial.

Detailed statistics for OpenViDial

Attribute value
Number of turns 1.1M
Number of images 1.1M
Vocab size before BPE 70K
Vocab size after BPE 30K
Average length of each episode 14
Average length of each turn 7.6

Download the Dataset

The main folder origin_dir contains training/valid/test sets, each of which is made up by the following files:

├──origin_dir
      └── train.dialogue.jsonl // each line is an episode of dialogue, which a list of IDs.    
      └── train.origin.txt // each line corresponds to a dialogue text utterence, with the ID being its line number (staring with 0).
      └── train_images // containing images (visual contexts) in which the text utterence take place, with ID being the image filename (0,1,2, etc)
            └── 0.jpg
            └── 1.jpg
            └── ...
      └── valid.* (i.e., valid.dialogue.jsonl, valid.origin.txt, valid_images)
      └── test.*  (i.e., test.dialogue.jsonl, test.origin.txt, test_images)

If you'd like to take a glance at the a sample of the dataset instead of downloading the full dataset, we provide a data sample here

Data download:

  1. Download [train|valid|test].origin.txt and [train|valid|test].dialogue.jsonl here
  2. Download test_images (~ 20G) here
  3. Download valid_images (~ 20G) here
  4. Download train_images: Since train_images is too big (~ 170G), we split it to 11 zip files (each of which is 17G). Download seperate files zip_train here. Then download and run cat.sh here to include all files in the same directory.
  5. Move all files to origin_dir.

Models

We proposed three models for this dataset. Please refer to the paper for details:

  • Model #1 - NoVisual: use only dialog texts without visual information
  • Model #2 - CoarseVisual: use texts and a pretrained ResNet50 on ImageNet to compute 1000-d feature from each picture
  • Model #3 - FineVisual: use texts and a pretrained Faster R-CNN on Genome to compute 2048-d * K objects features from each picture

Faster R-CNN is an object detection framework. The detection sample and attention over objects during text decoding is shown below.

Requirements

  • python >= 3.6
  • pip install -r requirements.txt

Preprocess directory structure

preprocessed_data_dir is a directory that contains all the preprocessed files (text, image feature mmap, offsets, etc.) generated from origin_data_dir and we use them in training models. The directory structure is shown below.

Note: every train* file or directory should have a 'valid' and a 'test' counterpart, we ignore them below for simplicity.

├──preprocessed_data_dir
      └── train.features.mmap  // numpy mmap array file of shape [num_sents, 1000], each row is a 1000-d ResNet-50 feature
      └── train.objects.mmap  // numpy mmap array file of shape [num_sents, 20, 2048],  faster-rcnn object feature file, each row contain 20 objects feature, which is 2048-d
      └── train.objects_mask.mmap  // numpy mmap array file of shape [num_sents, 20],  faster-rcnn mask file, each row contain 20 objects mask, 1 for valid, 0 for mask
      └── train.offsets.npy  // numpy array file of shape [num_episodes], each item is the offsets of one episode
      └── train.sent_num.npy // numpy array file of shape [num_episodes], each item is the sentence number of one episode

Preprocess text data

We use Moses Tokenizer to tokenize texts and generate offsets arrays: bash ./scripts/preprocess_video_data.sh and followed with byte-pair-encoding and fairseq-preprocess binarization: bash ./scripts/preprocess_text_data.sh

Note: You need to change DATA_DIR, ORIGIN_DIR, OUTPUT_DIR to your own path

Prepare pre-computed CNN features and Faster-RCNN features

Download CNN-pooling features(Used for Model #2 - CoarseVisual)

Preprocessed ResNet50 features (*.features.mmap) (~4G) can be downloaded from here and move them under preprocessed_data_dir/

Download Faster R-CNN features(Used for Model #3 - FineVisual)

Preprocessed Faster R-CNN objects features (*objects.mmap, *objects_mask.mmap) (~160G) can be downloaded from here then move them under preprocessed_data_dir/

Since file train.objects.mmap is too large(100G+), we splitted it to many small pieces like train.objects.mmap.split*, and you need another step to merge all those files together: cat * train.objects.mmap.split* >train.objects.mmap

(Optional) Extract features on your own

If you want to extract some feature on your own, or you'd like to know details of extracting visual features, see video_dialogue_model/extract_features/extract_features.md

Train and Evaluate Model #1 - NoVisual

bash scripts/reproduce_baselines/text_only.sh will train and evaluate NoVisual, Remember to change MODEL_DIR and DATA_DIR for your setup

Train and Evaluate Model #2 - CoarseVisual

bash scripts/reproduce_baselines/text_and_img_feature.sh will train and evaluate CoarseVisual. Remember to change MODEL_DIR and DATA_DIR for your setup

Train and Evaluate Model #3 - FineVisual

bash scripts/reproduce_baselines/text_and_img_objects.sh will train and evaluate FineVisual, Remember to change MODEL_DIR and DATA_DIR for your setup

Other Statistics

  • get length/diversity/stopwords% statistics of system output: train/stats.py

Model benchmark

Model BLEU-1 BLEU-2 BLEU-4 Stopword% Dis-1 Dis-2 Dis-3 Dis-4
1-NV 14.01 3.98 1.07 58.1% 0.0091 0.0355 0.0682 0.1018
2-CV 14.58 4.35 1.14 54.2% 0.0108 0.0448 0.0915 0.1465
3-FV 15.61 4.71 1.22 52.9% 0.0118 0.0502 0.1082 0.1778
CMP 414/765 course repository for Spring 2022 semester

CMP414/765: Artificial Intelligence Spring2021 This is the GitHub repository for course CMP 414/765: Artificial Intelligence taught at The City Univer

ch00226855 4 May 16, 2022
A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution (CVPR2022)

A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution (CVPR2022) https://arxiv.org/abs/2203.09388 Jianqi Ma, Zheto

MA Jianqi, shiki 104 Jan 05, 2023
This project provides the code and datasets for 'CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection', CVPR 2019.

Code-and-Dataset-for-CapSal This project provides the code and datasets for 'CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detec

lu zhang 48 Aug 19, 2022
PyTorch implementation of the ACL, 2021 paper Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks.

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks This repo contains the PyTorch implementation of the ACL, 2021 pa

Rabeeh Karimi Mahabadi 98 Dec 28, 2022
Official PyTorch Implementation of Convolutional Hough Matching Networks, CVPR 2021 (oral)

Convolutional Hough Matching Networks This is the implementation of the paper "Convolutional Hough Matching Network" by J. Min and M. Cho. Implemented

Juhong Min 70 Nov 22, 2022
Split your patch similarly to `git add -p` but supporting multiple buckets

split-patch.py This is git add -p on steroids for patches. Given a my.patch you can run ./split-patch.py my.patch You can choose in which bucket to p

102 Oct 06, 2022
PyTorch implementation of Octave Convolution with pre-trained Oct-ResNet and Oct-MobileNet models

octconv.pytorch PyTorch implementation of Octave Convolution in Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octa

Duo Li 273 Dec 18, 2022
Neural networks applied in recognizing guitar chords using python, AutoML.NET with C# and .NET Core

Chord Recognition Demo application The demo application is written in C# with .NETCore. As of July 9, 2020, the only version available is for windows

Andres Mauricio Rondon Patiño 24 Oct 22, 2022
Pipeline code for Sequential-GAM(Genome Architecture Mapping).

Sequential-GAM Pipeline code for Sequential-GAM(Genome Architecture Mapping). mapping whole_preprocess.sh include the whole processing of mapping. usa

3 Nov 03, 2022
A PyTorch Implementation of the Luna: Linear Unified Nested Attention

Unofficial PyTorch implementation of Luna: Linear Unified Nested Attention The quadratic computational and memory complexities of the Transformer’s at

Soohwan Kim 32 Nov 07, 2022
WSDM‘2022: Knowledge Enhanced Sports Game Summarization

Knowledge Enhanced Sports Game Summarization Cooming Soon! :) Data will be released after approval process. Code will be published once the author of

Jiaan Wang 14 Jul 13, 2022
This is a pytorch implementation of the NeurIPS paper GAN Memory with No Forgetting.

GAN Memory for Lifelong learning This is a pytorch implementation of the NeurIPS paper GAN Memory with No Forgetting. Please consider citing our paper

Miaoyun Zhao 43 Dec 27, 2022
Imaginaire - NVIDIA's Deep Imagination Team's PyTorch Library

Imaginaire Docs | License | Installation | Model Zoo Imaginaire is a pytorch library that contains optimized implementation of several image and video

NVIDIA Research Projects 3.6k Dec 29, 2022
3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks

3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks Introduction This repository contains the code and models for the follo

124 Jan 06, 2023
Constructing Neural Network-Based Models for Simulating Dynamical Systems

Constructing Neural Network-Based Models for Simulating Dynamical Systems Note this repo is work in progress prior to reviewing This is a companion re

Christian Møldrup Legaard 21 Nov 25, 2022
Codes for paper "Towards Diverse Paragraph Captioning for Untrimmed Videos". CVPR 2021

Towards Diverse Paragraph Captioning for Untrimmed Videos This repository contains PyTorch implementation of our paper Towards Diverse Paragraph Capti

Yuqing Song 61 Oct 11, 2022
SMORE: Knowledge Graph Completion and Multi-hop Reasoning in Massive Knowledge Graphs

SMORE: Knowledge Graph Completion and Multi-hop Reasoning in Massive Knowledge Graphs SMORE is a a versatile framework that scales multi-hop query emb

Google Research 135 Dec 27, 2022
MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts and Training Conflicts (ICLR 2022)

MetaShift: A Dataset of Datasets for Evaluating Distribution Shifts and Training Conflicts This repo provides the PyTorch source code of our paper: Me

88 Jan 04, 2023
PaSST: Efficient Training of Audio Transformers with Patchout

PaSST: Efficient Training of Audio Transformers with Patchout This is the implementation for Efficient Training of Audio Transformers with Patchout Pa

165 Dec 26, 2022
Consensus score for tripadvisor

ContripScore ContripScore is essentially a score that combines an Internet platform rating and a consensus rating from sentiment analysis (For instanc

Pepe 1 Jan 13, 2022