Toward Model Interpretability in Medical NLP

LING380: Topics in Computational Linguistics, Final Project

James Cross ([email protected]) and Daniel Kim ([email protected]), December 2021

Code Organization

data: medical report data [LINK TO THAT REPO] used for model fine-tuning and analysis, a list of clinical stop words, and the accuracy and entropy metrics saved during evaluation

models: checkpoints of the best-performing BERT and BioBERT models after hyperparameter optimization

notebooks:

model_training.ipynb: code to train and fine-tune BERT and BioBERT

model_evaluation.ipynb: code to evaluate the models, visualize word importances, apply post-training clinical stopword masking, and run other analyses

scripts: the same functionality as the notebooks, provided as executable Python scripts and functions
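To make the stopword masking and entropy metric mentioned above more concrete, here is a minimal sketch in plain Python. It is not the project's implementation: the stop word set, the function names `mask_stopwords` and `predictive_entropy`, and the choice to substitute BERT's `[MASK]` token (rather than delete the word) are all assumptions made for illustration.

```python
import math

# Hypothetical stop words for illustration only; the project's actual
# clinical stop word list is stored in the data directory.
CLINICAL_STOPWORDS = {"patient", "history", "noted"}


def mask_stopwords(text, stopwords, mask_token="[MASK]"):
    """Replace whitespace-separated words found in `stopwords` with `mask_token`."""
    masked = []
    for word in text.split():
        # Compare case-insensitively and ignore surrounding punctuation.
        core = word.strip(".,;:!?").lower()
        masked.append(mask_token if core in stopwords else word)
    return " ".join(masked)


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a vector of class probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


print(mask_stopwords("Patient has a history of asthma.", CLINICAL_STOPWORDS))
# prints: [MASK] has a [MASK] of asthma.
print(round(predictive_entropy([0.5, 0.5]), 4))
# prints: 0.6931
```

Higher entropy means the classifier spreads probability across more classes, so comparing entropy before and after masking gives one rough view of how much the model relied on the masked words.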

Dependencies

All packages needed to run the code are available in the default Google Colab environment (see documentation for the full list), with two exceptions: Hugging Face Transformers (transformers), used to load the transformer models, and Captum (captum), which provides a variety of model interpretation tools.
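A quick way to check whether the two extra packages are present is sketched below. The helper name `missing_packages` is hypothetical, and the snippet assumes the packages are installed from PyPI under the names `transformers` and `captum`.

```python
import importlib.util

# The two packages the README lists as absent from the default Colab environment.
REQUIRED = ["transformers", "captum"]


def missing_packages(names):
    """Return the package names that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    # In a notebook cell, prefix the command with "!" to run it.
    print("Install with: pip install " + " ".join(missing))
else:
    print("All extra dependencies are available.")
```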

How to run code

There are two ways to run the code: on Google Colab or locally on your machine.

Option 1) Google Colab

Model training notebook: https://colab.research.google.com/drive/1uPIi-OVchs_8A-SNcQtLfwelr0ccsz19?usp=sharing

Model evaluation/analysis notebook: https://colab.research.google.com/drive/1Hfy58JvyPbx55lKKhQAzzrhJIbN_Io0j?usp=sharing

Option 2) Local Machine

Notebooks: run model_training.ipynb or model_evaluation.ipynb as is, updating the directory paths where needed.
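One way to keep path changes in a single place when running locally is sketched below. The variable names are hypothetical, and the snippet assumes it is run from the repository root, with the data and models directories laid out as described under Code Organization.

```python
from pathlib import Path

# Assumption: the current working directory is the repository root.
REPO_ROOT = Path(".").resolve()

# Directory names taken from the Code Organization section.
DATA_DIR = REPO_ROOT / "data"
MODELS_DIR = REPO_ROOT / "models"

for directory in (DATA_DIR, MODELS_DIR):
    status = "found" if directory.is_dir() else "missing"
    print(f"{directory}: {status}")
```

Pointing the notebooks at `DATA_DIR` and `MODELS_DIR` instead of hard-coded Colab paths means only `REPO_ROOT` needs to change between machines.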
