Analysis code and LaTeX source of the manuscript describing the conditional permutation test of confounding bias in predictive modelling.

Last update: Nov 22, 2021

Overview

Git repository of the manuscript entitled

Statistical quantification of confounding bias in predictive modelling

by Tamas Spisak

The manuscript describes and validates the package mlconfound.

Read the docs: https://mlconfound.readthedocs.io

Abstract

The lack of non-parametric statistical tests for confounding bias significantly hampers the development of robust, valid and generalizable predictive models in many fields of research. Here I propose the partial and full confounder tests, which, for a given confounder variable, probe the null hypotheses of unconfounded and fully confounded models, respectively.

The tests provide a strict control for Type I errors and high statistical power, even for non-normally and non-linearly dependent predictions, often seen in machine learning. Applying the proposed tests on models trained on functional brain connectivity data from the Human Connectome Project and the Autism Brain Imaging Data Exchange dataset reveals confounders that were previously unreported or found to be hard to correct for with state-of-the-art confound mitigation approaches.

The tests (implemented in the package mlconfound) can aid the assessment and improvement of the generalizability and neurobiological validity of predictive models and, thereby, foster the development of clinically useful machine learning biomarkers.
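To make the idea of the partial confounder test more concrete, here is a minimal sketch of a much simpler, related procedure. It is not the conditional permutation test described in the manuscript and not the mlconfound API: it is a plain residual-permutation test of the partial correlation between the predictions and the confounder given the target, and it assumes linear dependencies throughout. The names `residualize` and `naive_partial_confound_test` are hypothetical and exist only for this illustration.

```python
import numpy as np


def residualize(a, b):
    """Return the residuals of a after a least-squares linear fit on b (with intercept)."""
    design = np.column_stack([np.ones_like(b), b])
    coef, *_ = np.linalg.lstsq(design, a, rcond=None)
    return a - design @ coef


def naive_partial_confound_test(y, yhat, c, n_perm=1000, seed=0):
    """Simplified stand-in for a partial confounder test (assumes linearity).

    Null hypothesis: the predictions yhat are independent of the confounder c
    once the target y is (linearly) accounted for.
    Returns the absolute partial correlation and a permutation p-value.
    """
    rng = np.random.default_rng(seed)

    # Remove the linear effect of the target from predictions and confounder.
    yhat_res = residualize(yhat, y)
    c_res = residualize(c, y)

    observed = abs(np.corrcoef(yhat_res, c_res)[0, 1])

    # Build the null distribution by shuffling the confounder residuals.
    null = np.empty(n_perm)
    for i in range(n_perm):
        null[i] = abs(np.corrcoef(yhat_res, rng.permutation(c_res))[0, 1])

    # The +1 correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value


# Toy data: the predictions depend on the confounder beyond the target.
rng = np.random.default_rng(42)
y = rng.normal(size=500)
c = 0.5 * y + rng.normal(size=500)
yhat = 0.5 * y + 0.5 * c + rng.normal(size=500)

stat, p = naive_partial_confound_test(y, yhat, c)
print(f"partial correlation = {stat:.3f}, p = {p:.4f}")
```

A small p-value here suggests that the predictions carry confounder-related variance not explained by the target. Unlike this sketch, the tests proposed in the manuscript are designed to remain valid for non-normally and non-linearly dependent predictions, so the package itself should be used for real analyses.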

This repository contains:

The LaTeX source of the manuscript describing the 'mlconfound' approach: see manuscript.tex and related files.
All source code required to reproduce the results in the manuscript: see the directories simulated and empirical.
All results. See the directories simulated/results and the analysis notebooks.
All figures. See the directory fig.

To reproduce the whole analysis:

./reproduce.sh

Citation

T. Spisak, Statistical quantification of confounding bias in predictive modelling, preprint on arXiv:2111.00814, 2021.

Licensing

Manuscript source and figures (contents of the root folder and the fig dir): CC BY
Source code (contents of the empirical and simulated folders): GPL3

Acknowledgements

The manuscript builds on an aesthetic and simple LaTeX style suitable for "preprint" publications such as arXiv and bio-arXiv, etc. It is based on the nips_2018.sty style.


Releases

revision-1.1.0 (Jul 7, 2022)

T. Spisak, Statistical quantification of confounding bias in predictive modelling, preprint on arXiv:2111.00814, 2021.
Manuscript attached. Related package: https://mlconfound.readthedocs.io
Full Changelog: https://github.com/pni-lab/mlconfound-manuscript/compare/preprint-1.0.1...revision-1.1.0

preprint-1.0.1 (Nov 1, 2021)

T. Spisak, Statistical quantification of confounding bias in predictive modelling, preprint on arXiv:2111.00814, 2021.
Manuscript attached: mlconfound-arxiv.pdf (3.35 MB). Related package: https://mlconfound.readthedocs.io
Full Changelog: https://github.com/pni-lab/mlconfound-manuscript/compare/submit1-1.0.0...preprint-1.0.1

submit1-1.0.0 (Oct 31, 2021)

Manuscript attached: mlconfound-submit.pdf (3.37 MB). Related package: https://mlconfound.readthedocs.io
Full Changelog: https://github.com/pni-lab/mlconfound-manuscript/compare/preprint-1.0.0...submit1-1.0.0

preprint-1.0.0 (Oct 30, 2021)

T. Spisak, Statistical quantification of confounding bias in predictive modelling, a preprint, 2021.
Manuscript attached: mlconfound-arxiv.pdf (3.35 MB). Related package: https://mlconfound.readthedocs.io

Owner

PNI - Predictive Neuroimaging Lab, University Hospital Essen, Germany

GitHub Repository https://mlconfound.readthedocs.io
