Behavioral Testing of Clinical NLP Models

This repository contains code for testing the behavior of clinical prediction models that take patient letters as input. For a detailed description of the testing framework, see our paper "What Do You See in this Patient? Behavioral Testing of Clinical NLP Models".

Usage

Install requirements: pip install -r requirements.txt

Run main.py, e.g. to test diagnosis prediction on gender, age and ethnicity:

python main.py \
    --test_set_path ./path_to_test_set \
    --model_path bvanaken/CORe-clinical-diagnosis-prediction \
    --task diagnosis \
    --shift_keys gender,age,ethnicity \
    --save_dir ./results \
    --gpu False

Parameter	Description
test_set_path	Path to original test set file
model_path	Path to a local model or a Hugging Face model hub checkpoint
task	Current options: diagnosis, mortality
shift_keys	Which patient characteristics to test. Current options: age, gender, ethnicity, weight, intersectional (gender + ethnicity)
save_dir	Directory to save results, default: "./results"
gpu	Whether to use a GPU during inference, default: False
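To make the idea behind shift_keys concrete, here is a minimal sketch of what a single "gender" shift could look like: the patient letter is rewritten with a different gender mention, and the model's prediction on the shifted letter is compared with its prediction on the original. This is an illustration only, not the repository's implementation. The word mapping, the function names shift_gender_to_male and prediction_change, and the predict callable are all hypothetical.

```python
import re
from typing import Callable

# Hypothetical word mapping, for illustration only. The repository's own
# shift logic may use different terms and handle more cases.
FEMALE_TO_MALE = {
    "female": "male",
    "woman": "man",
    "she": "he",
    "her": "his",  # simplification: "her" can also map to "him"
}

_PATTERN = re.compile(
    r"\b(" + "|".join(FEMALE_TO_MALE) + r")\b", re.IGNORECASE
)


def shift_gender_to_male(letter: str) -> str:
    """Replace female gender mentions with male ones, keeping capitalization."""

    def swap(match: "re.Match[str]") -> str:
        word = match.group(0)
        replacement = FEMALE_TO_MALE[word.lower()]
        return replacement.capitalize() if word[0].isupper() else replacement

    return _PATTERN.sub(swap, letter)


def prediction_change(
    predict: Callable[[str], float], original: str, shifted: str
) -> float:
    """Difference between the score on the shifted and the original letter."""
    return predict(shifted) - predict(original)


original = "She is a 54 year old female with chest pain."
shifted = shift_gender_to_male(original)
print(shifted)
```

A score difference that is large for a clinically irrelevant change would be the kind of behavior such a test is meant to surface.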

Using Non-Transformer models

The framework currently focuses on testing Transformer-based models, but it can be extended to any other prediction model. To do so, create a new class that implements the Predictor interface and add it to the TASK_MAP in main.py.
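The extension step might look roughly like the sketch below. The actual Predictor interface and the structure of TASK_MAP are defined in the repository and are not reproduced here. The abstract predict method, its signature, the KeywordPredictor class and the dictionary layout are assumptions made for illustration, so check main.py for the real method names before adapting this.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Predictor(ABC):
    """Stand-in for the repository's Predictor interface (assumed shape)."""

    @abstractmethod
    def predict(self, texts: List[str]) -> List[float]:
        """Return one score per input text."""


class KeywordPredictor(Predictor):
    """Hypothetical non-Transformer model: scores a text by keyword hits."""

    def __init__(self, keywords: List[str]) -> None:
        self.keywords = [k.lower() for k in keywords]

    def predict(self, texts: List[str]) -> List[float]:
        scores = []
        for text in texts:
            lowered = text.lower()
            hits = sum(1 for k in self.keywords if k in lowered)
            # Fraction of keywords found in the text, between 0 and 1.
            scores.append(hits / len(self.keywords))
        return scores


# Assumed layout: task name mapped to a predictor class. The real TASK_MAP
# in main.py may hold different values.
TASK_MAP: Dict[str, Type[Predictor]] = {
    "mortality": KeywordPredictor,
}

predictor = TASK_MAP["mortality"](["sepsis", "shock"])
print(predictor.predict(["Patient with sepsis.", "No findings."]))
```

Keeping the model behind a single prediction method means the shift tests do not need to know whether the underlying model is a Transformer or anything else.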

Cite

@inproceedings{vanAken2021,
  author    = {Betty van Aken and
               Sebastian Herrmann and
               Alexander Löser},
  title     = {What Do You See in this Patient? Behavioral Testing of Clinical NLP Models},
  booktitle = {Bridging the Gap: From Machine Learning Research to Clinical Practice, 
               Research2Clinics Workshop @ NeurIPS 2021},
  year      = {2021}
}
