The SVO-Probes Dataset for Verb Understanding

This repository contains the SVO-Probes benchmark, designed to probe for Subject, Verb, and Object understanding in image–language models. For a given sentence, the benchmark provides a positive image and a negative image. The negative image differs from the positive one with respect to either the subject, the verb, or the object. Given a sentence, we test whether a model can correctly classify both the positive and the negative image.
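As a rough illustration of the test described above, the sketch below scores a model from its per-sentence outputs. It assumes the model returns a match score in [0, 1] for each <sentence, image> pair and that 0.5 is the decision threshold; the function name `pair_accuracy`, the threshold, and the three reported numbers are illustrative assumptions, not the evaluation code used in the paper.

```python
from typing import Dict, Iterable, Tuple


def pair_accuracy(
    scores: Iterable[Tuple[float, float]], threshold: float = 0.5
) -> Dict[str, float]:
    """Summarize accuracy from (positive_score, negative_score) tuples.

    Each tuple holds a hypothetical model's match score for the positive
    image and for the negative image of the same sentence.
    """
    scores = list(scores)
    if not scores:
        raise ValueError("no scores given")
    total = len(scores)
    # A positive image is classified correctly when it scores at or above
    # the threshold; a negative image when it scores below it.
    pos_correct = sum(p >= threshold for p, _ in scores)
    neg_correct = sum(n < threshold for _, n in scores)
    both_correct = sum(p >= threshold and n < threshold for p, n in scores)
    return {
        "pos_acc": pos_correct / total,
        "neg_acc": neg_correct / total,
        "both_acc": both_correct / total,
    }


# Made-up scores for four sentences, for illustration only.
example = [(0.9, 0.1), (0.8, 0.7), (0.3, 0.2), (0.6, 0.4)]
print(pair_accuracy(example))
```

Reporting positives and negatives separately makes it visible when a model simply accepts most images: such a model would look strong on positives and weak on negatives.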

For a detailed description of our benchmark, please see the paper Probing Image–Language Transformers for Verb Understanding. Please cite this paper if you use the SVO-Probes benchmark in your work.

Files

svo_probes.csv: our raw data. Each row consists of two pairs that share a sentence: <sentence, positive-image> and <sentence, negative-image>. Each image is identified by a URL and a unique id: pos_image_id (pos_url) for the positive image and neg_image_id (neg_url) for the negative image. Each image is also associated with the subject-verb-object triplets (pos_triplet or neg_triplet) that can be seen in it. The subj_neg, verb_neg, and obj_neg columns specify the type of the negative: for example, subj_neg is True if the negative example is a subject negative.
image_urls.txt: a list of the image URLs used in our benchmark.
A Colab to analyze pre-trained models on SVO-Probes.
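
A minimal sketch of reading the file with Python's standard csv module and counting rows by negative type. The column names subj_neg, verb_neg, obj_neg, pos_triplet, neg_triplet, pos_image_id, and neg_image_id come from the description above; the `sentence` column name, the textual True/False encoding, the triplet formatting, and the sample rows are assumptions made for illustration and should be checked against the actual file.

```python
import csv
import io
from collections import Counter

NEG_COLUMNS = ("subj_neg", "verb_neg", "obj_neg")


def negative_type(row):
    """Return 'subj', 'verb', or 'obj' for the first flag column set to True."""
    for col in NEG_COLUMNS:
        # Assumption: flags are stored as the text "True" / "False".
        if str(row[col]).strip().lower() == "true":
            return col.split("_")[0]
    return None  # no flag set


def count_negative_types(csv_file):
    """Count rows per negative type from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(negative_type(row) for row in reader)


# Tiny in-memory stand-in for svo_probes.csv (made-up rows, not real data).
SAMPLE = """sentence,pos_triplet,neg_triplet,pos_image_id,neg_image_id,subj_neg,verb_neg,obj_neg
A dog runs on the beach,"dog,run,beach","cat,run,beach",1,2,True,False,False
A girl sits on a bench,"girl,sit,bench","girl,stand,bench",3,4,False,True,False
A man rides a horse,"man,ride,horse","man,feed,horse",5,6,False,True,False
"""

print(count_negative_types(io.StringIO(SAMPLE)))

# With the real file, the same function could be used as:
# with open("svo_probes.csv", newline="", encoding="utf-8") as f:
#     print(count_negative_types(f))
```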

Disclaimer

This is not an official Google product. The SVO-Probes benchmark was created solely for research purposes and is not intended to be used in products. The images in our benchmark were retrieved from Google Image Search; we expect them to reflect distributional properties and biases similar to those of the results returned by the Google Image Search API. Furthermore, our dataset is designed to have a vocabulary similar to that of the Conceptual Captions dataset, so we expect our <Subject, Verb, Object> triplets to reflect the biases in Conceptual Captions.

License

The data is made available under the terms of the Creative Commons Attribution 4.0 International Public License (CC BY 4.0). You can find details at: https://creativecommons.org/licenses/by/4.0/legalcode

If you have concerns or comments about the benchmark, please contact [email protected] and [email protected].

Owner

DeepMind
