Natural Language Processing for Adverse Drug Reaction (ADR) Detection

This repo contains code from a project to identify ADRs in discharge summaries at Austin Health. The model is built with the HuggingFace Transformers library, starting from the pretrained DeBERTa model. Further masked language modelling (MLM) pre-training is performed on a large corpus of unannotated discharge summaries. Finally, the model is fine-tuned on a corpus of discharge summaries annotated using Prodigy. The model performs named entity recognition (NER), but final performance is measured at the document level, using the maximum token-level score as the document's score.
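
The document-level scoring described above can be sketched as follows. This is a minimal illustration, not the repo's evaluation code: the function names, the 0.5 threshold, and the choice to score an empty document as 0.0 are all assumptions made for the example.

```python
from typing import Sequence


def document_score(token_scores: Sequence[float]) -> float:
    """Return a document-level score as the maximum token-level score.

    `token_scores` is assumed to hold, for each token, the model's
    probability that the token is part of an ADR mention.
    """
    # Assumption: a document with no tokens carries no ADR evidence.
    return max(token_scores, default=0.0)


def document_label(token_scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Flag a document as containing an ADR (hypothetical 0.5 threshold)."""
    return document_score(token_scores) >= threshold


# One confident token is enough to flag the whole document.
scores = [0.02, 0.10, 0.87, 0.05]
print(document_score(scores), document_label(scores))
```

Taking the maximum means a single high-scoring token flags the whole summary, which suits a screening task where missing an ADR costs more than a false alarm.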

We used Weights and Biases for experiment tracking.

The pretrain script takes a folder of discharge summaries stored in CSV files, tokenizes them, and continues MLM training from deberta-base.

Fine-tuning can then be performed with the finetune script from the command line. The script assumes the data is either a JSONL file of annotated text exported from Prodigy (--datafile example.jsonl) or a saved HuggingFace Dataset. If you run the script once on a JSONL file of annotations, you can save the resulting Dataset to a folder (--save_data_dir "save_to_here") and reuse it in subsequent training runs (--datafile "save_to_here").
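
To check what a Prodigy export contains before passing it to the finetune script, a small reader like the one below may help. It is a sketch only: it assumes the usual Prodigy NER layout (one JSON object per line with "text", "spans", and "answer" keys), and `read_prodigy_jsonl` is a hypothetical helper, not a function from this repo.

```python
import json


def read_prodigy_jsonl(path):
    """Read a Prodigy JSONL export and keep only accepted examples.

    Assumes each line is a JSON object with a "text" field, an optional
    "spans" list of {"start", "end", "label"} dicts, and an optional
    "answer" field ("accept", "reject" or "ignore").
    """
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Assumption: records without an "answer" key count as accepted.
            if record.get("answer", "accept") != "accept":
                continue
            spans = [
                (span["start"], span["end"], span["label"])
                for span in record.get("spans", [])
            ]
            examples.append({"text": record["text"], "spans": spans})
    return examples
```

The span offsets are character positions in the text, so they still have to be aligned to tokens before NER training.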

Example usage:

python .\finetune.py --folds 5 --epochs 15 --lr 5e-5 --wandb_on --hub_off --project 'CLI Tests' --run_name cross-validation --datafile 'data'

Note: your exported annotations (JSONL file) may not be encoded in UTF-8, which will prevent this code from working. There are various ways to change the encoding, all easy to find online. On a Windows machine, for example, adapt the following PowerShell command:

Get-Content .\name_of_file.jsonl -Encoding Unicode | Set-Content -Encoding UTF8 .\name_of_new_file.jsonl
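
On other platforms, the same conversion can be done in Python. The sketch below assumes the export is UTF-16 (which is what PowerShell calls "Unicode"); `reencode_to_utf8` is a hypothetical helper and the file names are placeholders.

```python
from pathlib import Path


def reencode_to_utf8(src, dst, src_encoding="utf-16"):
    """Rewrite a text file as UTF-8.

    `src_encoding` defaults to UTF-16 as an assumption; change it if the
    export uses a different encoding.
    """
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding="utf-8")


# Example call with placeholder file names:
# reencode_to_utf8("name_of_file.jsonl", "name_of_new_file.jsonl")
```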

Owner

Medicines Optimisation Service - Austin Health
