Code, Models and Datasets for OpenViDial Dataset

Overview

OpenViDial

This repo contains downloading instructions for the OpenViDial dataset in 《OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts》 along with the code to reproduce results in the paper (See Section Baselines).

Introduction

When humans converse, what a speaker will say next significantly depends on what he sees. OpenViDial is a largescale multi-module dialogue dataset for this purpose. The dialogue turns and visual contexts are extracted from movies and TV series, where each dialogue turn is paired with the corresponding visual context in which it takes place. OpenViDial contains a total number of 1.1 million dialogue turns, and thus 1.1 million visual contexts stored in images.

The following are two short conversations where visual contexts are crucial.

Detailed statistics for OpenViDial

Attribute value
Number of turns 1.1M
Number of images 1.1M
Vocab size before BPE 70K
Vocab size after BPE 30K
Average length of each episode 14
Average length of each turn 7.6

Download the Dataset

The main folder origin_dir contains training/valid/test sets, each of which is made up by the following files:

├──origin_dir
      └── train.dialogue.jsonl // each line is an episode of dialogue, which a list of IDs.    
      └── train.origin.txt // each line corresponds to a dialogue text utterence, with the ID being its line number (staring with 0).
      └── train_images // containing images (visual contexts) in which the text utterence take place, with ID being the image filename (0,1,2, etc)
            └── 0.jpg
            └── 1.jpg
            └── ...
      └── valid.* (i.e., valid.dialogue.jsonl, valid.origin.txt, valid_images)
      └── test.*  (i.e., test.dialogue.jsonl, test.origin.txt, test_images)

If you'd like to take a glance at the a sample of the dataset instead of downloading the full dataset, we provide a data sample here

Data download:

  1. Download [train|valid|test].origin.txt and [train|valid|test].dialogue.jsonl here
  2. Download test_images (~ 20G) here
  3. Download valid_images (~ 20G) here
  4. Download train_images: Since train_images is too big (~ 170G), we split it to 11 zip files (each of which is 17G). Download seperate files zip_train here. Then download and run cat.sh here to include all files in the same directory.
  5. Move all files to origin_dir.

Models

We proposed three models for this dataset. Please refer to the paper for details:

  • Model #1 - NoVisual: use only dialog texts without visual information
  • Model #2 - CoarseVisual: use texts and a pretrained ResNet50 on ImageNet to compute 1000-d feature from each picture
  • Model #3 - FineVisual: use texts and a pretrained Faster R-CNN on Genome to compute 2048-d * K objects features from each picture

Faster R-CNN is an object detection framework. The detection sample and attention over objects during text decoding is shown below.

Requirements

  • python >= 3.6
  • pip install -r requirements.txt

Preprocess directory structure

preprocessed_data_dir is a directory that contains all the preprocessed files (text, image feature mmap, offsets, etc.) generated from origin_data_dir and we use them in training models. The directory structure is shown below.

Note: every train* file or directory should have a 'valid' and a 'test' counterpart, we ignore them below for simplicity.

├──preprocessed_data_dir
      └── train.features.mmap  // numpy mmap array file of shape [num_sents, 1000], each row is a 1000-d ResNet-50 feature
      └── train.objects.mmap  // numpy mmap array file of shape [num_sents, 20, 2048],  faster-rcnn object feature file, each row contain 20 objects feature, which is 2048-d
      └── train.objects_mask.mmap  // numpy mmap array file of shape [num_sents, 20],  faster-rcnn mask file, each row contain 20 objects mask, 1 for valid, 0 for mask
      └── train.offsets.npy  // numpy array file of shape [num_episodes], each item is the offsets of one episode
      └── train.sent_num.npy // numpy array file of shape [num_episodes], each item is the sentence number of one episode

Preprocess text data

We use Moses Tokenizer to tokenize texts and generate offsets arrays: bash ./scripts/preprocess_video_data.sh and followed with byte-pair-encoding and fairseq-preprocess binarization: bash ./scripts/preprocess_text_data.sh

Note: You need to change DATA_DIR, ORIGIN_DIR, OUTPUT_DIR to your own path

Prepare pre-computed CNN features and Faster-RCNN features

Download CNN-pooling features(Used for Model #2 - CoarseVisual)

Preprocessed ResNet50 features (*.features.mmap) (~4G) can be downloaded from here and move them under preprocessed_data_dir/

Download Faster R-CNN features(Used for Model #3 - FineVisual)

Preprocessed Faster R-CNN objects features (*objects.mmap, *objects_mask.mmap) (~160G) can be downloaded from here then move them under preprocessed_data_dir/

Since file train.objects.mmap is too large(100G+), we splitted it to many small pieces like train.objects.mmap.split*, and you need another step to merge all those files together: cat * train.objects.mmap.split* >train.objects.mmap

(Optional) Extract features on your own

If you want to extract some feature on your own, or you'd like to know details of extracting visual features, see video_dialogue_model/extract_features/extract_features.md

Train and Evaluate Model #1 - NoVisual

bash scripts/reproduce_baselines/text_only.sh will train and evaluate NoVisual, Remember to change MODEL_DIR and DATA_DIR for your setup

Train and Evaluate Model #2 - CoarseVisual

bash scripts/reproduce_baselines/text_and_img_feature.sh will train and evaluate CoarseVisual. Remember to change MODEL_DIR and DATA_DIR for your setup

Train and Evaluate Model #3 - FineVisual

bash scripts/reproduce_baselines/text_and_img_objects.sh will train and evaluate FineVisual, Remember to change MODEL_DIR and DATA_DIR for your setup

Other Statistics

  • get length/diversity/stopwords% statistics of system output: train/stats.py

Model benchmark

Model BLEU-1 BLEU-2 BLEU-4 Stopword% Dis-1 Dis-2 Dis-3 Dis-4
1-NV 14.01 3.98 1.07 58.1% 0.0091 0.0355 0.0682 0.1018
2-CV 14.58 4.35 1.14 54.2% 0.0108 0.0448 0.0915 0.1465
3-FV 15.61 4.71 1.22 52.9% 0.0118 0.0502 0.1082 0.1778
NasirKhusraw - The TSP solved using genetic algorithm and show TSP path overlaid on a map of the Iran provinces & their capitals.

Nasir Khusraw : Travelling Salesman Problem The TSP solved using genetic algorithm. This project show TSP path overlaid on a map of the Iran provinces

J Brave 2 Sep 01, 2022
Differentiable Abundance Matching With Python

shamnet Differentiable Stellar Population Synthesis Installation You can install shamnet with pip. Installation dependencies are numpy, jax, corrfunc,

5 Dec 17, 2021
Content shared at DS-OX Meetup

Streamlit-Projects Streamlit projects available in this repo: An introduction to Streamlit presented at DS-OX (Feb 26, 2020) meetup Streamlit 101 - Ja

Arvindra 69 Dec 23, 2022
Face detection using deep learning.

Face Detection Docker Solution Using Faster R-CNN Dockerface is a deep learning face detector. It deploys a trained Faster R-CNN network on Caffe thro

Nataniel Ruiz 181 Dec 19, 2022
Markov Attention Models

Introduction This repo contains code for reproducing the results in the paper Graphical Models with Attention for Context-Specific Independence and an

Vicarious 0 Dec 09, 2021
Optimizing Deeper Transformers on Small Datasets

DT-Fixup Optimizing Deeper Transformers on Small Datasets Paper published in ACL 2021: arXiv Detailed instructions to replicate our results in the pap

16 Nov 14, 2022
Implementation of "JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting"

JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting Pytorch implementation for the paper "JOKR: Joint Keypoint Repres

45 Dec 25, 2022
As-ViT: Auto-scaling Vision Transformers without Training

As-ViT: Auto-scaling Vision Transformers without Training [PDF] Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wang, Denny Zhou In ICLR 2

VITA 68 Sep 05, 2022
Python implementation of 3D facial mesh exaggeration using the techniques described in the paper: Computational Caricaturization of Surfaces.

Python implementation of 3D facial mesh exaggeration using the techniques described in the paper: Computational Caricaturization of Surfaces.

Wonjong Jang 8 Nov 01, 2022
Investigating automatic navigation towards standard US views integrating MARL with the virtual US environment developed in CT2US simulation

AutomaticUSnavigation Investigating automatic navigation towards standard US views integrating MARL with the virtual US environment developed in CT2US

Cesare Magnetti 6 Dec 05, 2022
Few-Shot-Intent-Detection includes popular challenging intent detection datasets with/without OOS queries and state-of-the-art baselines and results.

Few-Shot-Intent-Detection Few-Shot-Intent-Detection is a repository designed for few-shot intent detection with/without Out-of-Scope (OOS) intents. It

Jian-Guo Zhang 73 Dec 26, 2022
A Tensorflow implementation of the Text Conditioned Auxiliary Classifier Generative Adversarial Network for Generating Images from text descriptions

A Tensorflow implementation of the Text Conditioned Auxiliary Classifier Generative Adversarial Network for Generating Images from text descriptions

Ayushman Dash 93 Aug 04, 2022
Code and Data for the paper: Molecular Contrastive Learning with Chemical Element Knowledge Graph [AAAI 2022]

Knowledge-enhanced Contrastive Learning (KCL) Molecular Contrastive Learning with Chemical Element Knowledge Graph [ AAAI 2022 ]. We construct a Chemi

Fangyin 58 Dec 26, 2022
Lecture materials for Cornell CS5785 Applied Machine Learning (Fall 2021)

Applied Machine Learning (Cornell CS5785, Fall 2021) This repo contains executable course notes and slides for the Applied ML course at Cornell and Co

Volodymyr Kuleshov 103 Dec 31, 2022
Everything's Talkin': Pareidolia Face Reenactment (CVPR2021)

Everything's Talkin': Pareidolia Face Reenactment (CVPR2021) Linsen Song, Wayne Wu, Chaoyou Fu, Chen Qian, Chen Change Loy, and Ran He [Paper], [Video

71 Dec 21, 2022
TuckER: Tensor Factorization for Knowledge Graph Completion

TuckER: Tensor Factorization for Knowledge Graph Completion This codebase contains PyTorch implementation of the paper: TuckER: Tensor Factorization f

Ivana Balazevic 296 Dec 06, 2022
A heterogeneous entity-augmented academic language model based on Open Academic Graph (OAG)

Library | Paper | Slack We released two versions of OAG-BERT in CogDL package. OAG-BERT is a heterogeneous entity-augmented academic language model wh

THUDM 58 Dec 17, 2022
TagLab: an image segmentation tool oriented to marine data analysis

TagLab: an image segmentation tool oriented to marine data analysis TagLab was created to support the activity of annotation and extraction of statist

Visual Computing Lab - ISTI - CNR 49 Dec 29, 2022
Large-scale open domain KNOwledge grounded conVERsation system based on PaddlePaddle

Knover Knover is a toolkit for knowledge grounded dialogue generation based on PaddlePaddle. Knover allows researchers and developers to carry out eff

607 Dec 31, 2022
SCU OlympicsRunning Baseline

Competition 1v1 running Environment check details in Jidi Competition RLChina2021智能体竞赛 做出的修改: 奖励重塑:修改了环境,重新设置了奖励的分配,使得奖励组成不只有零和博弈,还有探索环境的奖励。 算法微调:修改了官

ZiSeoi Wong 2 Nov 23, 2021