A Japanese Medical Information Extraction Toolkit

Last update: Dec 12, 2022

Overview

JaMIE: a Japanese Medical Information Extraction toolkit

Joint Japanese Medical Problem, Modality and Relation Recognition

The Train/Test phases require all train, dev, and test files to be converted to CONLL-style. Please check data_converter.py.
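As a rough illustration of what "CONLL-style" usually means, the sketch below groups token lines into sentences. It assumes one token per line, tab-separated columns, and a blank line between sentences; the exact column layout JaMIE expects is not described here, so check data_converter.py for the real format. The function name `read_conll` and the labels in the sample are hypothetical.

```python
from typing import Iterable, List


def read_conll(lines: Iterable[str]) -> List[List[List[str]]]:
    """Group CoNLL-style lines into sentences of token rows.

    Assumed layout (not confirmed for JaMIE): one token per line,
    tab-separated columns, a blank line between sentences.
    """
    sentences, current = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split("\t"))
    if current:
        # Flush the last sentence when there is no trailing blank line.
        sentences.append(current)
    return sentences


# Hypothetical two-sentence sample; the tags are placeholders only.
sample = ["肺\tB-D\n", "癌\tI-D\n", "\n", "なし\tO\n"]
for sentence in read_conll(sample):
    print(sentence)
```

The same function accepts an open file handle, since a file object iterates over its lines.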

Installation (Python 3.8)

git clone https://github.com/racerandom/JaMIE.git
cd JaMIE

Required Python packages

pip install -r requirements.txt

Morphological analyzer required:

jumanpp
mecab (juman-dict)

Pretrained BERT required:

NICT-BERT (NICT_BERT-base_JapaneseWikipedia_32K_BPE)

Train:

CUDA_VISIBLE_DEVICES=$SEED python clinical_joint.py \
--pretrained_model $PRETRAINED_BERT \
--train_file $TRAIN_FILE \
--dev_file $DEV_FILE \
--dev_output $DEV_OUT \
--saved_model $MODEL_DIR_TO_SAVE \
--enc_lr 2e-5 \
--batch_size 4 \
--warmup_epoch 2 \
--num_epoch 20 \
--do_train \
--fp16  # apex required

The models trained on radiography interpretation reports of Lung Cancer (LC) and general medical reports of Idiopathic Pulmonary Fibrosis (IPF) will be made available: link1, link2.

Test:

CUDA_VISIBLE_DEVICES=$SEED python clinical_joint.py \
--saved_model $SAVED_MODEL \
--test_file $TEST_FILE \
--test_output $TEST_OUT \
--batch_size 4

Batch Converter from XML (or raw text) to CONLL for Train/Test

Convert XML files to CONLL files for Train/Test. You can also convert raw text to CONLL-style for Test.

# --cv_num 5: 5-fold cross-validation; 0 generates a single conll file
# --doc_level: generate document-level conll files ([SEP] denotes sentence boundaries) instead of sentence-level ones
# --segmenter mecab: currently, please use mecab with NICT-BERT
python data_converter.py \
--mode xml2conll \
--xml $XML_FILES_DIR \
--conll $OUTPUT_CONLL_DIR \
--cv_num 5 \
--doc_level \
--segmenter mecab \
--bert_dir $PRETRAINED_BERT
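To make the `--cv_num` and `--doc_level` options more concrete, here is a minimal sketch of the two ideas: dividing a list of documents into k train/test folds, and splitting a document-level token sequence at `[SEP]` markers. This is not taken from data_converter.py; the function names `k_fold_splits` and `split_on_sep`, the round-robin fold assignment, and the file names are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def k_fold_splits(items: Sequence[str], k: int) -> List[Tuple[List[str], List[str]]]:
    """Return k (train, test) pairs; fold i holds out every k-th item starting at index i."""
    if k < 2 or k > len(items):
        raise ValueError("k must be between 2 and len(items)")
    folds = [list(items[i::k]) for i in range(k)]
    splits = []
    for i in range(k):
        # Training data is everything outside the held-out fold.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, folds[i]))
    return splits


def split_on_sep(tokens: Sequence[str], sep: str = "[SEP]") -> List[List[str]]:
    """Split a document-level token sequence into sentences at each sep token."""
    sentences, current = [], []
    for token in tokens:
        if token == sep:
            if current:
                sentences.append(current)
            current = []
        else:
            current.append(token)
    if current:
        sentences.append(current)
    return sentences


# Hypothetical file names, for illustration only.
docs = [f"doc{i}.xml" for i in range(10)]
for train, test in k_fold_splits(docs, 5):
    print(len(train), len(test))

print(split_on_sep(["胸部", "異常", "[SEP]", "再検", "[SEP]"]))
```

With 10 documents and k = 5, each fold holds out 2 documents and trains on the remaining 8, and every document is held out exactly once.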

Batch Converter from predicted CONLL to XML

python data_converter.py \
--mode conll2xml \
--xml $XML_FILES_DIR \
--conll $OUTPUT_CONLL_DIR

Citation

If you use our code in your research, please cite our work:

@inproceedings{cheng2021jamie,
   title={JaMIE: A Pipeline Japanese Medical Information Extraction System},
   author={Fei Cheng and Shuntaro Yada and Ribeka Tanaka and Eiji Aramaki and Sadao Kurohashi},
   booktitle={arXiv},
   year={2021}
}
