Data for "Driving the Herd: Search Engines as Content Influencers" paper

Overview

herding_data

Data for "Driving the Herd: Search Engines as Content Influencers" paper

Dataset description

The collection contains 2250 documents, 30 initial relevant documents (round 0) - located in initial_documents.trectext file. 2100 documents (rounds 1-5) created by competitors. 120 documents are the example documents that were manually promoted in the herding method experiments.

This dataset is divided w.r.t. the different experiments for content effect, described in the paper.

Format: trectext. DOCNO Format: ROUND- - -

Relevance Judgments (qrels):

All documents in the collection were judged for relevance. Annotators were presented with both the title and the description of each TREC topic and were asked to classify a document as relevant if it satisfies the information need stated in the description.

A document judged relevant by less than three annotators was labeled as non-relevant (0). Documents judged relevant by at least three, four or five annotators were labeled as marginally relevant (1), fairly relevant (2) and highly relevant (3), respectively. For each experiment the relevance judgment file has ".rel" suffix.

Quality judgements:

All documents in the collection where judged for quality by five annotators. Annotators were presented with the text of the document and were asked to classify the docuemnt as: (1) Valid, (2) Keyword-stuffed, (3) Spam.

A document is deemed as keyword-stuffed if it contained excessive repetition of words which seemed unnatural or artificially introduced.

A document is considered as spam if its content could not possibly satisfy any information need.

If a document is not spam or keywordstuffed, it is considered as valid. Documents judged valid by at least three, four or five annotators were labeled as marginally high-quality (1), fairly high-quality (2) and highly high-quality (3), respectively. For each experiment the quality judgment file has ".ks" suffix.

Queries

We used 30 of ClueWeb09 queries which can be downloded here: http://trec.nist.gov/data/webmain.html.

Example documents

In the herding method experiment for each query and effect an exapmle document, manifesting the desired content effect, was manually promoted to 1'st place. For each effect the example documents are located at "herding__example_documents.trectext" file. The format of document names is: DOCNO Format: ROUND-00- -EXAMPLEDOC

Subtopic effect experiment

This content effect was tested both in terms of herding and biasing approaches. For each query 2 different subtopics were tested. The subtopics were taken from ClueWeb09 subtopics list. The mapping between qid and the subtopic number which was promoted (and the actual information need manifested by the subtopic) is located at _subtopics_map.txt files (in each relevant directory separetly).

We include relevance judgemnts for each document (competing for a rankings w.r.t a query) w.r.t. to both subtopics promoted for the query. Please note that each document was tested w.r.t. a single subtopic (can be induced by the mapping file) during the experiment. The judgments are for both subtopics for analysis porpuses only. Relevance judgments w.r.t. subtopics name is " _relevance_to_subptopic.rel".

The qrels format is: " ".

Directories

Herding

Document_length_effect

The data contained in this directory is related to the documents created in the document length effect experiment (herding method).

Non_relevance_effect

The data contained in this directory is related to the documents created in the non-relevance effect experiment (herding method).

Query_terms_effect

The data contained in this directory is related to the documents created in the query terms effect experiment (herding method).

Subtopic_effect

The data contained in this directory is related to the documents created in the subtopic effect experiment (herding method).

Biasing

Subtopic_effect

The data contained in this directory is related to the documents created in the subtopic effect experiment (biasing method).

Control

The data contained in this directory is related to the documents created in the control group. That is, no expore of any kind of manipulation for this group.

Dummies

The data contained in this directory is related to the documents taken from Raifer et al '17 dataset. Dummies with docnos "DUMMY_{0,1}" where shared over all groups.

Control group and biasing groups where filled with DUMMY_2 dummies (in the docno) as well.

Repo for "Physion: Evaluating Physical Prediction from Vision in Humans and Machines" submission to NeurIPS 2021 (Datasets & Benchmarks track)

Physion: Evaluating Physical Prediction from Vision in Humans and Machines This repo contains code and data to reproduce the results in our paper, Phy

Cognitive Tools Lab 38 Jan 06, 2023
the official implementation of the paper "Isometric Multi-Shape Matching" (CVPR 2021)

Isometric Multi-Shape Matching (IsoMuSh) Paper-CVF | Paper-arXiv | Video | Code Citation If you find our work useful in your research, please consider

Maolin Gao 9 Jul 17, 2022
deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.

deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.

63 Oct 17, 2022
An implementation of an abstract algebra for music tones (pitches).

nbdev template Use this template to more easily create your nbdev project. If you are using an older version of this template, and want to upgrade to

Open Music Kit 0 Oct 10, 2022
A proof of concept ai-powered Recaptcha v2 solver

Recaptcha Fullauto I've decided to open source my old Recaptcha v2 solver. My latest version will be opened sourced this summer. I am hoping this proj

Nate 60 Dec 20, 2022
NeurIPS'21 Tractable Density Estimation on Learned Manifolds with Conformal Embedding Flows

NeurIPS'21 Tractable Density Estimation on Learned Manifolds with Conformal Embedding Flows This repo contains the code for the paper Tractable Densit

Layer6 Labs 4 Dec 12, 2022
Adaptive Attention Span for Reinforcement Learning

Adaptive Transformers in RL Official implementation of Adaptive Transformers in RL In this work we replicate several results from Stabilizing Transfor

100 Nov 15, 2022
Trajectory Extraction of road users via Traffic Camera

Traffic Monitoring Citation The associated paper for this project will be published here as soon as possible. When using this software, please cite th

Julian Strosahl 14 Dec 17, 2022
Implementation of our paper "DMT: Dynamic Mutual Training for Semi-Supervised Learning"

DMT: Dynamic Mutual Training for Semi-Supervised Learning This repository contains the code for our paper DMT: Dynamic Mutual Training for Semi-Superv

Zhengyang Feng 120 Dec 30, 2022
Unofficial implementation of the Involution operation from CVPR 2021

involution_pytorch Unofficial PyTorch implementation of "Involution: Inverting the Inherence of Convolution for Visual Recognition" by Li et al. prese

Rishabh Anand 46 Dec 07, 2022
基于AlphaPose的TensorRT加速

1. Requirements CUDA 11.1 TensorRT 7.2.2 Python 3.8.5 Cython PyTorch 1.8.1 torchvision 0.9.1 numpy 1.17.4 (numpy版本过高会出报错 this issue ) python-package s

52 Dec 06, 2022
A Pytorch implementation of MoveNet from Google. Include training code and pre-train model.

Movenet.Pytorch Intro MoveNet is an ultra fast and accurate model that detects 17 keypoints of a body. This is A Pytorch implementation of MoveNet fro

Mr.Fire 241 Dec 26, 2022
PyTorch wrapper for Taichi data-oriented class

Stannum PyTorch wrapper for Taichi data-oriented class PRs are welcomed, please see TODOs. Usage from stannum import Tin import torch data_oriented =

86 Dec 23, 2022
Implementation of the "PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences" paper.

PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences Introduction Point cloud sequences are irregular and unordered in the spatial dimen

Hehe Fan 63 Dec 09, 2022
Label Hallucination for Few-Shot Classification

Label Hallucination for Few-Shot Classification This repo covers the implementation of the following paper: Label Hallucination for Few-Shot Classific

Yiren Jian 13 Nov 13, 2022
UIUCTF 2021 Public Challenge Repository

UIUCTF-2021-Public UIUCTF 2021 Public Challenge Repository Notes: every challenge folder contains a challenge.yml file in the format for ctfcli, CTFd'

SIGPwny 15 Nov 03, 2022
⚖️🔁🔮🕵️‍♂️🦹🖼️ Code for *Measuring the Contribution of Multiple Model Representations in Detecting Adversarial Instances* paper.

Measuring the Contribution of Multiple Model Representations in Detecting Adversarial Instances This repository contains the code for Measuring the Co

Daniel Steinberg 0 Nov 06, 2022
Simulation code and tutorial for BBHnet training data

Simulation Dataset for BBHnet NOTE: OLD README, UPDATE IN PROGRESS We generate simulation dataset to train BBHnet, our deep learning framework for det

0 May 31, 2022
A large-scale face dataset for face parsing, recognition, generation and editing.

CelebAMask-HQ [Paper] [Demo] CelebAMask-HQ is a large-scale face image dataset that has 30,000 high-resolution face images selected from the CelebA da

switchnorm 1.7k Dec 26, 2022
PyDeepFakeDet is an integrated and scalable tool for Deepfake detection.

PyDeepFakeDet An integrated and scalable library for Deepfake detection research. Introduction PyDeepFakeDet is an integrated and scalable Deepfake de

Junke, Wang 49 Dec 11, 2022