System-oriented IR evaluations are limited to rather abstract understandings of real user behavior

Overview

Validating Simulations of User Query Variants

This repository contains the scripts of the experiments and evaluations, simulated queries, as well as the figures of:

Timo Breuer, Norbert Fuhr, and Philipp Schaer. 2022. Validating Simulations of User Query Variants. In Proceedings of the 44th European Conference on IR Research, ECIR 2022.

System-oriented IR evaluations are limited to rather abstract understandings of real user behavior. As a solution, simulating user interactions provides a cost-efficient way to support system-oriented experiments with more realistic directives when no interaction logs are available. While there are several user models for simulated clicks or result list interactions, very few attempts have been made towards query simulations, and it has not been investigated if these can reproduce properties of real queries. In this work, we validate simulated user query variants with the help of TREC test collections in reference to real user queries that were made for the corresponding topics. Besides, we introduce a simple yet effective method that gives better reproductions of real queries than the established methods. Our evaluation framework validates the simulations regarding the retrieval performance, reproducibility of topic score distributions, shared task utility, effort and effect, and query term similarity when compared with real user query variants. While the retrieval effectiveness and statistical properties of the topic score distributions as well as economic aspects are close to that of real queries, it is still challenging to simulate exact term matches and later query reformulations.

Directory overview

Directory Description
config/ Contains configuration files for the query simulations, experiments, and evaluations.
data/ Contains (intermediate) output data of the simulations and experiments as well as the figures of the paper.
eval/ Contains scripts of the experiments and evaluations.
sim/ Contains scripts of the query simulations.

Setup

  1. Install Anserini and index Core17 (The New York Times Annotated Corpus) according to the regression guide:
anserini/target/appassembler/bin/IndexCollection \
    -collection NewYorkTimesCollection \
    -input /path/to/core17/ \
    -index anserini/indexes/lucene-index.core17 \
    -generator DefaultLuceneDocumentGenerator \
    -threads 4 \
    -storePositions \
    -storeDocvectors \
    -storeRaw \
    -storeContents \
    > anserini/logs/log.core17 &
  1. Install the required Python packages:
pip install -r requirements.txt

Query simulation

In order to prepare the language models and simulate the queries, the scripts have to executed in the order shown in the following table. All of the outputs can be found in the data/ directory. For the sake of better code readability the names of the query reformulation strategies have been mapped: S1S1; S2S2; S2'S3; S3S4; S3'S5; S4S6; S4'S7; S4''S8. The names of the scripts and output files comply with this name mapping.

Script Description Output files
sim/make_background.py Make the background language model form all index terms of Core17. The background model is required for Controlled Query Generation (CQG) by Jordan et al. data/lm/background.csv
sim/make_cqg.py Make the CQG language models with different parameters of lambda from 0.0 to 1.0. data/lm/cqg.json
sim/simulate_queries_s12345.py Simulate TTS and KIS queries with strategies S1 to S3' data/queries/s12345.csv
sim/simulate_queries_s678.py Simulate TTS and KIS queries with strategies S4 to S4'' data/queries/s678.csv

Experimental evaluation and results

In order to reproduce the experiments of the study, the scripts have to executed in the order shown in the following table.

Script Description Output files Reproduction of ...
eval/arp.py, eval/arp_first.py, eval/arp_max.py Retrieval performance: Evaluate the Average Retrieval Performance (ARP). data/experimental_results/arp.csv, data/experimental_results/arp_first.csv, data/experimental_results/arp_max.csv Tab. A.1
eval/rmse_s12345.py, eval/rmse_s678.py Retrieval performance: Evaluate the Root-Mean-Square-Error (RMSE). data/experimental_results/rmse_map.csv, data/experimental_results/rmse_ndcg.csv, data/experimental_results/rmse_p1000.csv, data/experimental_results/rmse_uqv_vs_s12345_kis_ndcg.csv, data/experimental_results/rmse_uqv_vs_s12345_tts_ndcg.csv, data/figures/rmse_map.pdf, data/figures/rmse_ndcg.pdf, data/figures/rmse_p1000.pdf, data/figures/rmse_uqv_vs_s12345_kis_ndcg.pdf, data/figures/rmse_uqv_vs_s12345_tts_ndcg.pdf Fig. A.1, Fig. 1
eval/t-test.py Retrieval performance: Evaluate the p-values of paired t-tests. data/experimental_results/ttest.csv, data/figures/ttest.pdf Fig. A.2
eval/system_orderings.py Shared task utility: Evaluate Kendall's tau between relative system orderings. data/experimental_results/system_orderings.csv, data/figures/system_orderings.pdf Fig. 2 (left)
eval/sdcg.py Effort and effect: Evaluate the Session Discounted Cumulative Gain (sDCG). data/experimental_results/sdcg_3queries.csv, data/experimental_results/sdcg_5queries.csv, data/experimental_results/sdcg_10queries.csv, data/figures/sdcg_3queries.pdf, data/figures/sdcg_5queries.pdf, data/figures/sdcg_10queries.pdf Fig. 3 (top)
eval/economic.py Effort and effect: Evaluate tradeoffs between number of queries and browsing depth by isoquants. data/experimental_results/economic0.3.csv, data/experimental_results/economic0.4.csv, data/experimental_results/economic0.5.csv, data/figures/economic0.3.pdf, data/figures/economic0.4.pdf, data/figures/economic0.5.pdf Fig. 3 (bottom)
eval/jaccard_similarity.py Query term similarity: Evaluate query term similarities. data/experimental_results/jacc.csv, data/figures/jacc.pdf Fig. 2 (right)
Owner
IR Group at Technische Hochschule Köln
IR Group at Technische Hochschule Köln
Entity-Based Knowledge Conflicts in Question Answering.

Entity-Based Knowledge Conflicts in Question Answering Run Instructions | Paper | Citation | License This repository provides the Substitution Framewo

Apple 35 Oct 19, 2022
Training PSPNet in Tensorflow. Reproduce the performance from the paper.

Training Reproduce of PSPNet. (Updated 2021/04/09. Authors of PSPNet have provided a Pytorch implementation for PSPNet and their new work with support

Li Xuhong 126 Jul 13, 2022
This is the official implementation of VaxNeRF (Voxel-Accelearated NeRF).

VaxNeRF Paper | Google Colab This is the official implementation of VaxNeRF (Voxel-Accelearated NeRF). This codebase is implemented using JAX, buildin

naruya 132 Nov 21, 2022
A Neural Net Training Interface on TensorFlow, with focus on speed + flexibility

Tensorpack is a neural network training interface based on TensorFlow. Features: It's Yet Another TF high-level API, with speed, and flexibility built

Tensorpack 6.2k Jan 01, 2023
NeROIC: Neural Object Capture and Rendering from Online Image Collections

NeROIC: Neural Object Capture and Rendering from Online Image Collections This repository is for the source code for the paper NeROIC: Neural Object C

Snap Research 647 Dec 27, 2022
[CVPR 2021] Unsupervised 3D Shape Completion through GAN Inversion

ShapeInversion Paper Junzhe Zhang, Xinyi Chen, Zhongang Cai, Liang Pan, Haiyu Zhao, Shuai Yi, Chai Kiat Yeo, Bo Dai, Chen Change Loy "Unsupervised 3D

100 Dec 22, 2022
Skipgram Negative Sampling in PyTorch

PyTorch SGNS Word2Vec's SkipGramNegativeSampling in Python. Yet another but quite general negative sampling loss implemented in PyTorch. It can be use

Jamie J. Seol 287 Dec 14, 2022
The GitHub repository for the paper: “Time Series is a Special Sequence: Forecasting with Sample Convolution and Interaction“.

SCINet This is the original PyTorch implementation of the following work: Time Series is a Special Sequence: Forecasting with Sample Convolution and I

386 Jan 01, 2023
Additional functionality for use with fastai’s medical imaging module

fmi Adding additional functionality to fastai's medical imaging module To learn more about medical imaging using Fastai you can view my blog Install g

14 Oct 31, 2022
This is a deep learning-based method to segment deep brain structures and a brain mask from T1 weighted MRI.

DBSegment This tool generates 30 deep brain structures segmentation, as well as a brain mask from T1-Weighted MRI. The whole procedure should take ~1

Luxembourg Neuroimaging (Platform OpNeuroImg) 2 Oct 25, 2022
A Dataset for Direct Quotation Extraction and Attribution in News Articles.

DirectQuote - A Dataset for Direct Quotation Extraction and Attribution in News Articles DirectQuote is a corpus containing 19,760 paragraphs and 10,3

THUNLP-MT 9 Sep 23, 2022
Draw like Bob Ross using the power of Neural Networks (With PyTorch)!

Draw like Bob Ross using the power of Neural Networks! (+ Pytorch) Learning Process Visualization Getting started Install dependecies Requires python3

Kendrick Tan 116 Mar 07, 2022
A curated list of references for MLOps

A curated list of references for MLOps

Larysa Visengeriyeva 9.3k Jan 07, 2023
General Assembly Capstone: NBA Game Predictor

Project 6: Predicting NBA Games Problem Statement Can I predict the results of NBA games from the back-half of a season from the opening half of the s

Adam Muhammad Klesc 1 Jan 14, 2022
O2O-Afford: Annotation-Free Large-Scale Object-Object Affordance Learning (CoRL 2021)

O2O-Afford: Annotation-Free Large-Scale Object-Object Affordance Learning Object-object Interaction Affordance Learning. For a given object-object int

Kaichun Mo 26 Nov 04, 2022
Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network

Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network This repository is the official implementation of Speech Separati

Kai Li (李凯) 116 Nov 09, 2022
Bio-OFC gym implementation and Gym-Fly environment

Bio-OFC gym implementation and Gym-Fly environment This repository includes the gym compatible implementation of the Bio-OFC algorithm from the paper

Siavash Golkar 1 Nov 16, 2021
This repo contains the official code and pre-trained models for the Dynamic Vision Transformer (DVT).

Dynamic-Vision-Transformer (Pytorch) This repo contains the official code and pre-trained models for the Dynamic Vision Transformer (DVT). Not All Ima

210 Dec 18, 2022
Official PyTorch implementation of GDWCT (CVPR 2019, oral)

This repository provides the official code of GDWCT, and it is written in PyTorch. Paper Image-to-Image Translation via Group-wise Deep Whitening-and-

WonwoongCho 135 Dec 02, 2022
Loopy belief propagation for factor graphs on discrete variables, in JAX!

PGMax implements general factor graphs for discrete probabilistic graphical models (PGMs), and hardware-accelerated differentiable loopy belief propagation (LBP) in JAX.

Vicarious 62 Dec 23, 2022