Clairvoyance: a Unified, End-to-End AutoML Pipeline for Medical Time Series

Overview

Clairvoyance: A Pipeline Toolkit for Medical Time Series


Authors: van der Schaar Lab

This repository contains implementations of Clairvoyance: A Pipeline Toolkit for Medical Time Series for the following applications.

  • Time-series prediction (one-shot and online)
  • Transfer learning
  • Individualized time-series treatment effects (ITE) estimation
  • Active sensing on time-series data
  • AutoML

All API files for those applications can be found in /api folder. All tutorials for those applications can be found in /tutorial folder.

Block diagram of Clairvoyance

Installation

There are currently two ways of installing the required dependencies: using Docker or using Conda.

Note on Requirements

  • Clairvoyance has been tested on Ubuntu 20.04, but should be broadly compatible with common Linux systems.
  • The Docker installation method is additionally compatible with Mac and Windows systems that support Docker.
  • Hardware requirements depends on the underlying ML models used, but a machine that can handle ML research tasks is recommended.
  • For faster computation, CUDA-capable Nvidia card is recommended (follow the CUDA-enabled installation steps below).

Docker installation

  1. Install Docker on your system: https://docs.docker.com/get-docker/.
  2. [Required for CUDA-enabled installation only] Install Nvidia container runtime: https://github.com/NVIDIA/nvidia-container-runtime/.
    • Assumes Nvidia drivers are correctly installed on your system.
  3. Get the latest Clairvoyance Docker image:
    $ docker pull clairvoyancedocker/clv:latest
  4. To run the Docker container as a terminal, execute the below from the Clairvoyance repository root:
    $ docker run -i -t --gpus all --network host -v $(pwd)/datasets/data:/home/clvusr/clairvoyance/datasets/data clairvoyancedocker/clv
    • Explanation of the docker run arguments:
      • -i -t: Run a terminal session.
      • --gpus all: [Required for CUDA-enabled installation only], passes your GPU(s) to the Docker container, otherwise skip this option.
      • --network host: Use your machine's network and forward ports. Could alternatively publish ports, e.g. -p 8888:8888.
      • -v $(pwd)/datasets/data:/home/clvusr/clairvoyance/datasets/data: Share directory/ies with the Docker container as volumes, e.g. data.
      • clairvoyancedocker/clv: Specifies Clairvoyance Docker image.
    • If using Windows:
      • Use PowerShell and first run the command $pwdwin = $(pwd).Path. Then use $pwdwin instead of $(pwd) in the docker run command.
    • If using Windows or Mac:
      • Due to how Docker networking works, replace --network host with -p 8888:8888.
  5. Run all following Clairvoyance API commands, jupyter notebooks etc. from within this Docker container.

Conda installation

Conda installation has been tested on Ubuntu 20.04 only.

  1. From the Clairvoyance repo root, execute:
    $ conda env create --name clvenv -f ./environment.yml
    $ conda activate clvenv
  2. Run all following Clairvoyance API commands, jupyter notebooks etc. in the clvenv environment.

Data

Clairvoyance expects your dataset files to be defined as follows:

  • Four CSV files (may be compressed), as illustrated below:
    static_test_data.csv
    static_train_data.csv
    temporal_test_data.csv
    temporal_train_data.csv
    
  • Static data file content format:
    id,my_feature,my_other_feature,my_third_feature_etc
    3wOSm2,11.00,4,-1.0
    82HJss,3.40,2,2.1
    iX3fiP,7.01,3,-0.4
    ...
    
  • Temporal data file content format:
    id,time,variable,value
    3wOSm2,0.0,my_first_temporal_feature,0.45
    3wOSm2,0.5,my_first_temporal_feature,0.47
    3wOSm2,1.2,my_first_temporal_feature,0.49
    3wOSm2,0.0,my_second_temporal_feature,10.0
    3wOSm2,0.1,my_second_temporal_feature,12.4
    3wOSm2,0.3,my_second_temporal_feature,9.3
    82HJss,0.0,my_first_temporal_feature,0.22
    82HJss,1.0,my_first_temporal_feature,0.44
    ...
    
  • The id column is required in the static data files. The id,time,variable,value columns are required in the temporal file. The IDs of samples must match between the static and temporal files.
  • Your data files are expected to be under:
    <clairvoyance_repo_root>/datasets/data/<your_dataset_name>/
    
  • See tutorials for how to define your dataset(s) in code.
  • Clairvoyance examples make reference to some existing datasets, e.g. mimic, ward. These are confidential datasets (or in case of MIMIC-III, it requires a training course and an access request) and are not provided here. Contact [email protected] for more details.

Extract data from MIMIC-III

To use MIMIC-III with Clairvoyance, you need to get access to MIMIC-III and follow the instructions for installing it in a Postgres database: https://mimic.physionet.org/tutorials/install-mimic-locally-ubuntu/

$ cd datasets/mimic_data_extraction && python extract_antibiotics_dataset.py

Usage

  • To run tutorials:
    • Launch jupyter lab: $ jupyter-lab.
      • If using Windows or Mac and following the Docker installation method, run jupyter-lab --ip="0.0.0.0".
    • Open jupyter lab in the browser by following the URL with the token.
    • Navigate to tutorial/ and run a tutorial of your choice.
  • To run Clairvoyance API from the command line, execute the appropriate command from within the Docker terminal (see example command below).

Example: Time-series prediction

To run the pipeline for training and evaluation on time-series prediction framework, simply run $ python -m api/main_api_prediction.py or take a look at the jupyter notebook tutorial/tutorial_prediction.ipynb.

Note that any model architecture can be used as the predictor model such as RNN, Temporal convolutions, and transformer. The condition for predictor model is to have fit and predict functions as its subfunctions.

  • Stages of the time-series prediction:

    • Import dataset
    • Preprocess data
    • Define the problem (feature, label, etc.)
    • Impute missing components
    • Select the relevant features
    • Train time-series predictive model
    • Estimate the uncertainty of the predictions
    • Interpret the predictions
    • Evaluate the time-series prediction performance on the testing set
    • Visualize the outputs (performance, predictions, uncertainties, and interpretations)
  • Command inputs:

    • data_name: mimic, ward, cf
    • normalization: minmax, standard, None
    • one_hot_encoding: input features that need to be one-hot encoded
    • problem: one-shot or online
    • max_seq_len: maximum sequence length after padding
    • label_name: the column name for the label(s)
    • treatment: the column name for treatments
    • static_imputation_model: mean, median, mice, missforest, knn, gain
    • temporal_imputation_model: mean, median, linear, quadratic, cubic, spline, mrnn, tgain
    • feature_selection_model: greedy-addition, greedy-deletion, recursive-addition, recursive-deletion, None
    • feature_number: selected feature number
    • model_name: rnn, gru, lstm, attention, tcn, transformer
    • h_dim: hidden dimensions
    • n_layer: layer number
    • n_head: head number (only for transformer model)
    • batch_size: number of samples in mini-batch
    • epochs: number of epochs
    • learning_rate: learning rate
    • static_mode: how to utilize static features (concatenate or None)
    • time_mode: how to utilize time information (concatenate or None)
    • task: classification or regression
    • uncertainty_model_name: uncertainty estimation model name (ensemble)
    • interpretation_model_name: interpretation model name (tinvase)
    • metric_name: auc, apr, mae, mse
  • Example command:

    $ cd api
    $ python main_api_prediction.py \
        --data_name cf --normalization minmax --one_hot_encoding admission_type \
        --problem one-shot --max_seq_len 24 --label_name death \
        --static_imputation_model median --temporal_imputation_model median \
        --model_name lstm --h_dim 100 --n_layer 2 --n_head 2 --batch_size 400 \
        --epochs 20 --learning_rate 0.001 \
        --static_mode concatenate --time_mode concatenate \
        --task classification --uncertainty_model_name ensemble \
        --interpretation_model_name tinvase --metric_name auc
  • Outputs:

    • Model prediction
    • Model performance
    • Prediction uncertainty
    • Prediction interpretation

Citation

To cite Clairvoyance in your publications, please use the following reference.

Daniel Jarrett, Jinsung Yoon, Ioana Bica, Zhaozhi Qian, Ari Ercole, and Mihaela van der Schaar (2021). Clairvoyance: A Pipeline Toolkit for Medical Time Series. In International Conference on Learning Representations. Available at: https://openreview.net/forum?id=xnC8YwKUE3k.

You can also use the following Bibtex entry.

@inproceedings{
  jarrett2021clairvoyance,
  title={Clairvoyance: A Pipeline Toolkit for Medical Time Series},
  author={Daniel Jarrett and Jinsung Yoon and Ioana Bica and Zhaozhi Qian and Ari Ercole and Mihaela van der Schaar},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=xnC8YwKUE3k}
}

To cite the Clairvoyance alpha blog post, please use:

van Der Schaar, M., Yoon, J., Qian, Z., Jarrett, D., & Bica, I. (2020). clairvoyance alpha: the first pipeline toolkit for medical time series. [Webpages]. https://doi.org/10.17863/CAM.70020

@misc{https://doi.org/10.17863/cam.70020,
  doi = {10.17863/CAM.70020},
  url = {https://www.repository.cam.ac.uk/handle/1810/322563},
  author = {Van Der Schaar,  Mihaela and Yoon,  Jinsung and Qian,  Zhaozhi and Jarrett,  Dan and Bica,  Ioana},
  title = {clairvoyance alpha: the first pipeline toolkit for medical time series},
  publisher = {Apollo - University of Cambridge Repository},
  year = {2020}
}
Owner
van_der_Schaar \LAB
We are creating cutting-edge machine learning methods and applying them to drive a revolution in healthcare.
van_der_Schaar \LAB
This is the code related to "Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation" (ICCV 2021).

Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation This is the code relat

39 Sep 23, 2022
Tutorial on scikit-learn and IPython for parallel machine learning

Parallel Machine Learning with scikit-learn and IPython Video recording of this tutorial given at PyCon in 2013. The tutorial material has been rearra

Olivier Grisel 1.6k Dec 26, 2022
Caffe: a fast open framework for deep learning.

Caffe Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR)/The Berke

Berkeley Vision and Learning Center 33k Dec 28, 2022
[3DV 2021] Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation This is the official implementation for the method described in Ch

Jiaxing Yan 27 Dec 30, 2022
PyTorch code for our paper "Image Super-Resolution with Non-Local Sparse Attention" (CVPR2021).

Image Super-Resolution with Non-Local Sparse Attention This repository is for NLSN introduced in the following paper "Image Super-Resolution with Non-

143 Dec 28, 2022
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers

TubeDETR: Spatio-Temporal Video Grounding with Transformers Website • STVG Demo • Paper This repository provides the code for our paper. This includes

Antoine Yang 108 Dec 27, 2022
Code for our ICASSP 2021 paper: SA-Net: Shuffle Attention for Deep Convolutional Neural Networks

SA-Net: Shuffle Attention for Deep Convolutional Neural Networks (paper) By Qing-Long Zhang and Yu-Bin Yang [State Key Laboratory for Novel Software T

Qing-Long Zhang 199 Jan 08, 2023
This demo showcase the use of onnxruntime-rs with a GPU on CUDA 11 to run Bert in a data pipeline with Rust.

Demo BERT ONNX pipeline written in rust This demo showcase the use of onnxruntime-rs with a GPU on CUDA 11 to run Bert in a data pipeline with Rust. R

Xavier Tao 14 Dec 17, 2022
Estimation of human density in a closed space using deep learning.

Siemens HOLLZOF challenge - Human Density Estimation Add project description here. Installing Dependencies: Install Python3 either system-wide, user-w

3 Aug 08, 2021
Official PyTorch implementation of "VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization" (CVPR 2021)

VITON-HD — Official PyTorch Implementation VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization Seunghwan Choi*1, Sunghyun Pa

Seunghwan Choi 250 Jan 06, 2023
Let's Git - Versionsverwaltung & Open Source Hausaufgabe

Let's Git - Versionsverwaltung & Open Source Hausaufgabe Herzlich Willkommen zu dieser Hausaufgabe für unseren MOOC: Let's Git! Wir hoffen, dass Du vi

1 Dec 13, 2021
An NVDA add-on to split screen reader and audio from other programs to different sound channels

An NVDA add-on to split screen reader and audio from other programs to different sound channels (add-on idea credit: Tony Malykh)

Joseph Lee 7 Dec 25, 2022
Pytorch implementation of "Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet"

Token Labeling: Training an 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet (arxiv) This is a Pytorch implementation of our te

蒋子航 383 Dec 27, 2022
BARTScore: Evaluating Generated Text as Text Generation

This is the Repo for the paper: BARTScore: Evaluating Generated Text as Text Generation Updates 2021.06.28 Release online evaluation Demo 2021.06.25 R

NeuLab 196 Dec 17, 2022
Synthetic LiDAR sequential point cloud dataset with point-wise annotations

SynLiDAR dataset: Learning From Synthetic LiDAR Sequential Point Cloud This is official repository of the SynLiDAR dataset. For technical details, ple

78 Dec 27, 2022
An Inverse Kinematics library aiming performance and modularity

IKPy Demo Live demos of what IKPy can do (click on the image below to see the video): Also, a presentation of IKPy: Presentation. Features With IKPy,

Pierre Manceron 481 Jan 02, 2023
Source code for CAST - Crisis Domain Adaptation Using Sequence-to-sequence Transformers (Accepted to ISCRAM 2021, CorePaper).

Source code for CAST: Crisis Domain Adaptation UsingSequence-to-sequenceTransformers (Paper, BibTeX, Accepted to ISCRAM 2021, CorePaper) Quick start D

Congcong Wang 0 Jul 14, 2021
Unified MultiWOZ evaluation scripts for the context-to-response task.

MultiWOZ Context-to-Response Evaluation Standardized and easy to use Inform, Success, BLEU ~ See the paper ~ Easy-to-use scripts for standardized eval

Tomáš Nekvinda 38 Dec 13, 2022
Official implementation of CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

CrossViT This repository is the official implementation of CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification. ArXiv If

International Business Machines 168 Dec 29, 2022
Official implementation for "QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation" (CVPR 2022)

QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation (CVPR2022) https://arxiv.org/abs/2203.08483 Unpaired image-to-image (I2I

Xueqi Hu 50 Dec 16, 2022