Overview

CvT2DistilGPT2

Improving Chest X-Ray Report Generation by Leveraging Warm-Starting

  • This repository houses the implementation of CvT2DistilGPT2 from [1].
  • CvT2DistilGPT2 is an encoder-to-decoder model that was developed for chest X-ray report generation.
  • Checkpoints for CvT2DistilGPT2 on MIMIC-CXR and IU X-Ray are available.
  • This implementation can be adapted to any image captioning task by modifying the datamodule (see the sketch after this list).
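
A minimal sketch of such a replacement datamodule, assuming a PyTorch Lightning setup. The class names, paths, and the {"image_path", "caption"} annotation format are hypothetical illustrations, not the repository's actual API:

    import json

    import pytorch_lightning as pl
    from PIL import Image
    from torch.utils.data import DataLoader, Dataset
    from torchvision import transforms


    class CaptionDataset(Dataset):
        """Pairs each image with its reference caption."""

        def __init__(self, examples, image_dir, transform):
            self.examples = examples  # assumed: [{"image_path": ..., "caption": ...}, ...]
            self.image_dir = image_dir
            self.transform = transform

        def __len__(self):
            return len(self.examples)

        def __getitem__(self, idx):
            example = self.examples[idx]
            image = Image.open(f"{self.image_dir}/{example['image_path']}").convert("RGB")
            return self.transform(image), example["caption"]


    class CaptionDataModule(pl.LightningDataModule):
        """Hypothetical datamodule to swap in for the chest X-ray one."""

        def __init__(self, annotations_path, image_dir, batch_size=16, num_workers=5):
            super().__init__()
            self.annotations_path = annotations_path
            self.image_dir = image_dir
            self.batch_size = batch_size
            self.num_workers = num_workers
            self.transform = transforms.Compose(
                [transforms.Resize((384, 384)), transforms.ToTensor()]
            )

        def setup(self, stage=None):
            with open(self.annotations_path) as f:
                splits = json.load(f)  # assumed keys: "train", "val", "test"
            self.datasets = {
                split: CaptionDataset(splits[split], self.image_dir, self.transform)
                for split in splits
            }

        def train_dataloader(self):
            return DataLoader(self.datasets["train"], batch_size=self.batch_size,
                              shuffle=True, num_workers=self.num_workers)

        def val_dataloader(self):
            return DataLoader(self.datasets["val"], batch_size=self.batch_size,
                              num_workers=self.num_workers)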

Figure: CvT2DistilGPT2 for MIMIC-CXR. Q, K, and V are the queries, keys, and values, respectively, for multi-head attention. * indicates that the linear layers for Q, K, and V are replaced with the convolutional layers depicted below the multi-head attention module. [BOS] is the beginning-of-sentence special token. N_l is the number of layers per stage, where N_l=1, N_l=4, and N_l=16 for the first, second, and third stages, respectively. The head of DistilGPT2 is the same as that used for language modelling. Subwords produced by DistilGPT2 are separated by a vertical bar.
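
The convolutional projection marked by * can be illustrated with a short sketch. This is an illustrative reimplementation in the spirit of CvT (depthwise convolution, batch normalisation, then pointwise convolution), not the repository's exact module; the dimension, kernel size, and strides are assumptions:

    import torch
    import torch.nn as nn


    class ConvProjection(nn.Module):
        """Depthwise-separable convolution in place of a linear Q/K/V layer."""

        def __init__(self, dim, kernel_size=3, stride=1):
            super().__init__()
            self.proj = nn.Sequential(
                # Depthwise conv mixes local spatial context into each token.
                nn.Conv2d(dim, dim, kernel_size, stride=stride,
                          padding=kernel_size // 2, groups=dim, bias=False),
                nn.BatchNorm2d(dim),
                # Pointwise conv plays the role of the linear projection.
                nn.Conv2d(dim, dim, kernel_size=1),
            )

        def forward(self, x, height, width):
            # x: (batch, tokens, dim) -> 2D feature map for the convolution.
            b, n, c = x.shape
            x = x.transpose(1, 2).reshape(b, c, height, width)
            x = self.proj(x)
            # Back to a token sequence for multi-head attention.
            return x.flatten(2).transpose(1, 2)


    # Queries keep stride 1; keys and values may be strided to cut tokens.
    tokens = torch.randn(2, 14 * 14, 384)  # (batch, tokens, dim)
    q = ConvProjection(dim=384, stride=1)(tokens, height=14, width=14)
    k = ConvProjection(dim=384, stride=2)(tokens, height=14, width=14)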

Installation

The required packages are listed in requirements.txt. It is recommended that these be installed within a virtual environment:

python3 -m venv --system-site-packages venv
source venv/bin/activate
pip install --upgrade pip
pip install --upgrade -r requirements.txt --no-cache-dir
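
After installing, a quick sanity check can confirm the environment. This assumes, based on the model's components rather than the pinned requirements themselves, that torch, transformers, and pytorch_lightning are installed:

    # Hypothetical post-install check; the package list is an assumption.
    import pytorch_lightning as pl
    import torch
    import transformers

    print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
    print("transformers", transformers.__version__)
    print("pytorch_lightning", pl.__version__)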

Datasets

For MIMIC-CXR:

  1. Download MIMIC-CXR-JPG from:

    https://physionet.org/content/mimic-cxr-jpg/2.0.0/
    
  2. Place it in dataset/mimic_cxr_jpg such that the images are located at dataset/mimic_cxr_jpg/physionet.org/files/mimic-cxr-jpg/2.0.0/files.

  3. Download the Chen et al. labels for MIMIC-CXR from:

    https://drive.google.com/file/d/1DS6NYirOXQf8qYieSVMvqNwuOlgAbM_E/view?usp=sharing
    
  4. Place annotations.json in dataset/mimic_cxr_chen.

For IU X-Ray:

  1. Download the Chen et al. labels and the chest X-rays in PNG format for IU X-Ray from:
    https://drive.google.com/file/d/1c0BXEuDy8Cmm2jfN0YYGkQxFZd2ZIoLg/view
    
  2. Place the files in dataset/iu_x-ray_chen such that the labels are located at dataset/iu_x-ray_chen/annotations.json and the images at dataset/iu_x-ray_chen/images.

Note: the dataset directory for each task can be changed with the dataset_dir variable in the task's paths.yaml, e.g., task/mimic_cxr_jpg_chen/paths.yaml.
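
A small sketch to verify the default layout before running a job; the split keys printed here are an assumption about the annotations.json format:

    # Checks that the Chen et al. labels sit where the tasks expect them,
    # assuming the default dataset_dir values noted above.
    import json
    from pathlib import Path

    for path in (Path("dataset/mimic_cxr_chen/annotations.json"),
                 Path("dataset/iu_x-ray_chen/annotations.json")):
        if not path.exists():
            print(f"missing: {path}")
            continue
        with open(path) as f:
            splits = json.load(f)
        print(path, {split: len(examples) for split, examples in splits.items()})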

Checkpoints

The checkpoints for MIMIC-CXR and IU X-Ray can be found at https://doi.org/10.25919/hbqx-2p71 (the download link is located at the top right). Place the checkpoints in the experiment directory for each version of each task, e.g., experiment/mimic_cxr_jpg_chen/cvt_21_to_gpt2_scst/epoch=0-val_chen_cider=0.410965.ckpt.

Note: the experiment directory for each task can be changed with the exp_dir variable in the task's paths.yaml, e.g., task/mimic_cxr_jpg_chen/paths.yaml.
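
To inspect a downloaded checkpoint, a minimal sketch follows. It assumes only that the .ckpt files are standard PyTorch Lightning checkpoints, i.e., torch-serialised dictionaries with a state_dict entry:

    import torch

    ckpt_path = ("experiment/mimic_cxr_jpg_chen/cvt_21_to_gpt2_scst/"
                 "epoch=0-val_chen_cider=0.410965.ckpt")
    checkpoint = torch.load(ckpt_path, map_location="cpu")

    print("keys:", list(checkpoint))  # e.g. "state_dict", "epoch", ...
    state_dict = checkpoint["state_dict"]  # assumes a Lightning-style checkpoint
    print("parameter tensors:", len(state_dict))
    for name in list(state_dict)[:5]:
        print(name, tuple(state_dict[name].shape))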

Instructions

  • The model configurations for each task can be found in its config directory, e.g. task/mimic_cxr_jpg_chen/config.

  • A job for a model is described in the task's jobs.yaml file, e.g. task/mimic_cxr_jpg_chen/jobs.yaml.

  • To test the CvT2DistilGPT2 + SCST checkpoint, set task/mimic_cxr_jpg_chen/jobs.yaml to the following (the default):

    cvt_21_to_distilgpt2_scst:
        train: 0
        test: 1
        debug: 0
        num_nodes: 1
        num_gpus: 1
        num_workers: 5
    
  • To train CvT2DistilGPT2 with teacher forcing and then test, set task/mimic_cxr_jpg_chen/jobs.yaml to:

    cvt_21_to_distilgpt2:
        train: 1
        test: 1
        debug: 0
        num_nodes: 1
        num_gpus: 1
        num_workers: 5
    

    or with Slurm:

    cvt_21_to_distilgpt2:
        train: 1
        test: 1
        debug: 0
        num_nodes: 1
        num_gpus: 1
        num_workers: 5
        resumable: 1
        sbatch: 1
        time_limit: 1-00:00:00
    
  • To run the job:

    python3 main.py --task mimic_cxr_jpg_chen

Note: data from the job will be saved in the experiment directory.
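
For orientation, a sketch of how a jobs.yaml like those above could be read; main.py's actual dispatch logic is not shown in this README, so this is an assumption about its behaviour rather than the repository's code:

    # Illustrative only: iterate over the jobs declared in a task's jobs.yaml.
    import yaml

    with open("task/mimic_cxr_jpg_chen/jobs.yaml") as f:
        jobs = yaml.safe_load(f)

    for model_name, options in jobs.items():
        if options.get("train"):
            print(f"{model_name}: train on {options['num_nodes']} node(s), "
                  f"{options['num_gpus']} GPU(s), {options['num_workers']} workers")
        if options.get("test"):
            print(f"{model_name}: test")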

Reference

[1] Aaron Nicolson, Jason Dowling, and Bevan Koopman, Improving Chest X-Ray Report Generation by Leveraging Warm-Starting, under review (January 2022).

Owner
The Australian e-Health Research Centre