Validating Simulations of User Query Variants

This repository contains the scripts of the experiments and evaluations, simulated queries, as well as the figures of:

Timo Breuer, Norbert Fuhr, and Philipp Schaer. 2022. Validating Simulations of User Query Variants. In Proceedings of the 44th European Conference on IR Research, ECIR 2022.

System-oriented IR evaluations are limited to rather abstract understandings of real user behavior. As a solution, simulating user interactions provides a cost-efficient way to support system-oriented experiments with more realistic directives when no interaction logs are available. While there are several user models for simulated clicks or result list interactions, very few attempts have been made towards query simulations, and it has not been investigated if these can reproduce properties of real queries. In this work, we validate simulated user query variants with the help of TREC test collections in reference to real user queries that were made for the corresponding topics. Besides, we introduce a simple yet effective method that gives better reproductions of real queries than the established methods. Our evaluation framework validates the simulations regarding the retrieval performance, reproducibility of topic score distributions, shared task utility, effort and effect, and query term similarity when compared with real user query variants. While the retrieval effectiveness and statistical properties of the topic score distributions as well as economic aspects are close to that of real queries, it is still challenging to simulate exact term matches and later query reformulations.

Directory overview

| Directory | Description |
| --- | --- |
| config/ | Contains configuration files for the query simulations, experiments, and evaluations. |
| data/ | Contains (intermediate) output data of the simulations and experiments as well as the figures of the paper. |
| eval/ | Contains scripts of the experiments and evaluations. |
| sim/ | Contains scripts of the query simulations. |

Setup

  1. Install Anserini and index Core17 (The New York Times Annotated Corpus) according to the regression guide:

         anserini/target/appassembler/bin/IndexCollection \
             -collection NewYorkTimesCollection \
             -input /path/to/core17/ \
             -index anserini/indexes/lucene-index.core17 \
             -generator DefaultLuceneDocumentGenerator \
             -threads 4 \
             -storePositions \
             -storeDocvectors \
             -storeRaw \
             -storeContents \
             > anserini/logs/log.core17 &

  2. Install the required Python packages:

         pip install -r requirements.txt

Query simulation

To prepare the language models and simulate the queries, the scripts have to be executed in the order shown in the following table. All outputs are written to the data/ directory. For better code readability, the names of the query reformulation strategies have been mapped as follows: S1 → S1; S2 → S2; S2' → S3; S3 → S4; S3' → S5; S4 → S6; S4' → S7; S4'' → S8. The names of the scripts and output files follow this mapping.

| Script | Description | Output files |
| --- | --- | --- |
| sim/make_background.py | Make the background language model from all index terms of Core17. The background model is required for Controlled Query Generation (CQG) by Jordan et al. | data/lm/background.csv |
| sim/make_cqg.py | Make the CQG language models with different values of lambda from 0.0 to 1.0. | data/lm/cqg.json |
| sim/simulate_queries_s12345.py | Simulate TTS and KIS queries with strategies S1 to S3'. | data/queries/s12345.csv |
| sim/simulate_queries_s678.py | Simulate TTS and KIS queries with strategies S4 to S4''. | data/queries/s678.csv |
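
The role of the lambda parameter mentioned for sim/make_cqg.py can be illustrated with a minimal sketch. It assumes that CQG linearly interpolates a topic language model with the background model and that lambda weights the background part. The function names, the toy term counts, and the direction of the weighting are illustrative assumptions and are not taken from the repository's scripts.

```python
def normalize(counts):
    """Turn raw term counts into a probability distribution."""
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def mix_language_models(topic_lm, background_lm, lam):
    """Linear interpolation: (1 - lam) * P_topic(t) + lam * P_background(t).

    Assumption: lam = 0.0 uses only the topic model,
    lam = 1.0 uses only the background model.
    """
    vocabulary = set(topic_lm) | set(background_lm)
    return {
        term: (1.0 - lam) * topic_lm.get(term, 0.0)
        + lam * background_lm.get(term, 0.0)
        for term in vocabulary
    }


def top_terms(language_model, k):
    """Return the k most probable terms (ties broken alphabetically)."""
    ranked = sorted(language_model.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Toy counts, purely hypothetical.
topic_lm = normalize({"election": 6, "fraud": 3, "the": 1})
background_lm = normalize({"the": 8, "of": 1, "election": 1})

# One mixed model per lambda value in 0.0, 0.1, ..., 1.0.
cqg_models = {
    round(step / 10, 1): mix_language_models(topic_lm, background_lm, step / 10)
    for step in range(11)
}
```

Under these assumptions, small lambda values favor topical terms, while large values pull the distribution towards frequent collection terms; `top_terms(cqg_models[0.0], 2)` and `top_terms(cqg_models[1.0], 2)` show the two extremes for the toy data.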

Experimental evaluation and results

To reproduce the experiments of the study, the scripts have to be executed in the order shown in the following table.

| Script | Description | Output files | Reproduction of ... |
| --- | --- | --- | --- |
| eval/arp.py, eval/arp_first.py, eval/arp_max.py | Retrieval performance: Evaluate the Average Retrieval Performance (ARP). | data/experimental_results/arp.csv, data/experimental_results/arp_first.csv, data/experimental_results/arp_max.csv | Tab. A.1 |
| eval/rmse_s12345.py, eval/rmse_s678.py | Retrieval performance: Evaluate the Root Mean Square Error (RMSE). | data/experimental_results/rmse_map.csv, data/experimental_results/rmse_ndcg.csv, data/experimental_results/rmse_p1000.csv, data/experimental_results/rmse_uqv_vs_s12345_kis_ndcg.csv, data/experimental_results/rmse_uqv_vs_s12345_tts_ndcg.csv, data/figures/rmse_map.pdf, data/figures/rmse_ndcg.pdf, data/figures/rmse_p1000.pdf, data/figures/rmse_uqv_vs_s12345_kis_ndcg.pdf, data/figures/rmse_uqv_vs_s12345_tts_ndcg.pdf | Fig. A.1, Fig. 1 |
| eval/t-test.py | Retrieval performance: Evaluate the p-values of paired t-tests. | data/experimental_results/ttest.csv, data/figures/ttest.pdf | Fig. A.2 |
| eval/system_orderings.py | Shared task utility: Evaluate Kendall's tau between relative system orderings. | data/experimental_results/system_orderings.csv, data/figures/system_orderings.pdf | Fig. 2 (left) |
| eval/sdcg.py | Effort and effect: Evaluate the Session Discounted Cumulative Gain (sDCG). | data/experimental_results/sdcg_3queries.csv, data/experimental_results/sdcg_5queries.csv, data/experimental_results/sdcg_10queries.csv, data/figures/sdcg_3queries.pdf, data/figures/sdcg_5queries.pdf, data/figures/sdcg_10queries.pdf | Fig. 3 (top) |
| eval/economic.py | Effort and effect: Evaluate tradeoffs between number of queries and browsing depth by isoquants. | data/experimental_results/economic0.3.csv, data/experimental_results/economic0.4.csv, data/experimental_results/economic0.5.csv, data/figures/economic0.3.pdf, data/figures/economic0.4.pdf, data/figures/economic0.5.pdf | Fig. 3 (bottom) |
| eval/jaccard_similarity.py | Query term similarity: Evaluate query term similarities. | data/experimental_results/jacc.csv, data/figures/jacc.pdf | Fig. 2 (right) |
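
Two of the measures listed in the table are simple enough to sketch in a few lines. The following is a minimal illustration of an RMSE between per-topic scores and a Jaccard similarity between query term sets. The function names, the whitespace tokenization, the topic identifiers, and the toy scores are assumptions for illustration; the sketch does not reproduce eval/rmse_s12345.py or eval/jaccard_similarity.py.

```python
import math


def rmse(reference_scores, simulated_scores):
    """Root mean square error over the topics present in both dictionaries."""
    topics = sorted(set(reference_scores) & set(simulated_scores))
    if not topics:
        raise ValueError("no shared topics to compare")
    squared_errors = [
        (reference_scores[topic] - simulated_scores[topic]) ** 2
        for topic in topics
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def jaccard(query_a, query_b):
    """Jaccard similarity between the term sets of two queries.

    Assumption: terms are obtained by lowercasing and whitespace splitting.
    """
    terms_a = set(query_a.lower().split())
    terms_b = set(query_b.lower().split())
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


# Hypothetical per-topic nDCG scores of real and simulated query variants.
real_ndcg = {"topic_a": 0.50, "topic_b": 0.20}
simulated_ndcg = {"topic_a": 0.40, "topic_b": 0.40}

error = rmse(real_ndcg, simulated_ndcg)
similarity = jaccard("radio waves brain cancer", "brain cancer radio")
```

A lower RMSE means the simulated queries reproduce the per-topic scores of the real queries more closely, and a Jaccard value closer to 1.0 means the two queries share more of their terms.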

Owner

IR Group at Technische Hochschule Köln