Bert Axioms

This repository contains the code for the paper "Diagnosing BERT with Retrieval Heuristics".

Required Data

To run this code, first download the dataset from the TREC 2019 Deep Learning Track guidelines. Specify the path to the downloaded data in the config file.

You also need a working installation of the Indri Toolkit for indexing and retrieval.

Parameters

Several hyperparameters need to be set (such as the Indri path, the number of candidates to retrieve, and the random seed). These are set in a YAML config file at scripts/config-defaults.yaml. The parameters are handled by wandb, but can easily be adapted to any YAML reader (take a look at PyYAML).
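As a rough illustration of reading the config without wandb, the sketch below loads the YAML file with PyYAML and flattens it into a plain dictionary. It assumes the file follows the usual wandb defaults layout, where each key holds a nested `value` field (and optionally a `desc`). The function names and the example keys (`indri_path`, `random_seed`) are hypothetical and not taken from this repository.

```python
from pathlib import Path
from typing import Any


def flatten_wandb_defaults(raw: dict[str, Any]) -> dict[str, Any]:
    """Turn {key: {"desc": ..., "value": ...}} entries into {key: value}."""
    flat = {}
    for key, entry in raw.items():
        # wandb-style entries wrap the setting in a "value" field;
        # any other entry is kept unchanged.
        if isinstance(entry, dict) and "value" in entry:
            flat[key] = entry["value"]
        else:
            flat[key] = entry
    return flat


def load_config(path: str = "scripts/config-defaults.yaml") -> dict[str, Any]:
    """Read the YAML file with PyYAML and return a flat dict of settings."""
    import yaml  # PyYAML; imported here so the helper above works without it

    raw = yaml.safe_load(Path(path).read_text(encoding="utf-8")) or {}
    return flatten_wandb_defaults(raw)


if __name__ == "__main__":
    # Hypothetical keys, for illustration only; check the real file for the actual names.
    example = {
        "indri_path": {"desc": "Indri install location", "value": "/opt/indri"},
        "random_seed": {"value": 42},
    }
    print(flatten_wandb_defaults(example))
```

Calling `load_config()` from the repository root would then give a dictionary that can be passed around in place of the wandb config object, provided the scripts only read values from it.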

Observations

Note that, for LNC2, we use external C++ code to interact with Indri. This lets us add the duplicated documents to the index without compromising scores. This code should be compiled with Indri's Makefile.app, which should be as easy as editing Makefile.app from Indri and running make -f Makefile.app. (Check https://lemur.sourceforge.io/indri/ for more details.)

Removing documents from the Indri index does not guarantee that the index statistics change immediately. This can cause slight differences compared with the more "correct" approach of re-creating the index from scratch for every duplicated document.

Expected Results

The results from this repository may not exactly replicate the ones that appear in the paper. This is due to a few performance improvements made after the paper submission. These, however, do not change the final scores and conclusions. Mostly, you may see an increase in alpha-nDCG for all methods, and an increase in QL performance across the board.

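| Model | `nDCG_cut` | `TFCI` | `TFCII` | `MTDC` | `LNC1` | `LNC2` | `TP` | `STMC1` | `STMC2` | `STMC3` |
|---|---|---|---|---|---|---|---|---|---|---|
| QL | 0.3633 | 0.9936 | 0.7008 | 0.8759 | 0.5021 | 1.000 | 0.3852 | 0.4855 | 0.7047 | 0.7011 |
| DistilBERT | 0.4537 | 0.6109 | 0.3945 | 0.5130 | 0.5006 | 0.0003 | 0.4105 | 0.5040 | 0.5120 | 0.5099 |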

Code for ECIR'20 paper Diagnosing BERT with Retrieval Heuristics

Owner

Arthur Câmara