Clairvoyance: a Unified, End-to-End AutoML Pipeline for Medical Time Series

Overview

Clairvoyance: A Pipeline Toolkit for Medical Time Series


Authors: van der Schaar Lab

This repository contains implementations of Clairvoyance: A Pipeline Toolkit for Medical Time Series for the following applications.

  • Time-series prediction (one-shot and online)
  • Transfer learning
  • Individualized time-series treatment effects (ITE) estimation
  • Active sensing on time-series data
  • AutoML

All API files for those applications can be found in /api folder. All tutorials for those applications can be found in /tutorial folder.

Block diagram of Clairvoyance

Installation

There are currently two ways of installing the required dependencies: using Docker or using Conda.

Note on Requirements

  • Clairvoyance has been tested on Ubuntu 20.04, but should be broadly compatible with common Linux systems.
  • The Docker installation method is additionally compatible with Mac and Windows systems that support Docker.
  • Hardware requirements depends on the underlying ML models used, but a machine that can handle ML research tasks is recommended.
  • For faster computation, CUDA-capable Nvidia card is recommended (follow the CUDA-enabled installation steps below).

Docker installation

  1. Install Docker on your system: https://docs.docker.com/get-docker/.
  2. [Required for CUDA-enabled installation only] Install Nvidia container runtime: https://github.com/NVIDIA/nvidia-container-runtime/.
    • Assumes Nvidia drivers are correctly installed on your system.
  3. Get the latest Clairvoyance Docker image:
    $ docker pull clairvoyancedocker/clv:latest
  4. To run the Docker container as a terminal, execute the below from the Clairvoyance repository root:
    $ docker run -i -t --gpus all --network host -v $(pwd)/datasets/data:/home/clvusr/clairvoyance/datasets/data clairvoyancedocker/clv
    • Explanation of the docker run arguments:
      • -i -t: Run a terminal session.
      • --gpus all: [Required for CUDA-enabled installation only], passes your GPU(s) to the Docker container, otherwise skip this option.
      • --network host: Use your machine's network and forward ports. Could alternatively publish ports, e.g. -p 8888:8888.
      • -v $(pwd)/datasets/data:/home/clvusr/clairvoyance/datasets/data: Share directory/ies with the Docker container as volumes, e.g. data.
      • clairvoyancedocker/clv: Specifies Clairvoyance Docker image.
    • If using Windows:
      • Use PowerShell and first run the command $pwdwin = $(pwd).Path. Then use $pwdwin instead of $(pwd) in the docker run command.
    • If using Windows or Mac:
      • Due to how Docker networking works, replace --network host with -p 8888:8888.
  5. Run all following Clairvoyance API commands, jupyter notebooks etc. from within this Docker container.

Conda installation

Conda installation has been tested on Ubuntu 20.04 only.

  1. From the Clairvoyance repo root, execute:
    $ conda env create --name clvenv -f ./environment.yml
    $ conda activate clvenv
  2. Run all following Clairvoyance API commands, jupyter notebooks etc. in the clvenv environment.

Data

Clairvoyance expects your dataset files to be defined as follows:

  • Four CSV files (may be compressed), as illustrated below:
    static_test_data.csv
    static_train_data.csv
    temporal_test_data.csv
    temporal_train_data.csv
    
  • Static data file content format:
    id,my_feature,my_other_feature,my_third_feature_etc
    3wOSm2,11.00,4,-1.0
    82HJss,3.40,2,2.1
    iX3fiP,7.01,3,-0.4
    ...
    
  • Temporal data file content format:
    id,time,variable,value
    3wOSm2,0.0,my_first_temporal_feature,0.45
    3wOSm2,0.5,my_first_temporal_feature,0.47
    3wOSm2,1.2,my_first_temporal_feature,0.49
    3wOSm2,0.0,my_second_temporal_feature,10.0
    3wOSm2,0.1,my_second_temporal_feature,12.4
    3wOSm2,0.3,my_second_temporal_feature,9.3
    82HJss,0.0,my_first_temporal_feature,0.22
    82HJss,1.0,my_first_temporal_feature,0.44
    ...
    
  • The id column is required in the static data files. The id,time,variable,value columns are required in the temporal file. The IDs of samples must match between the static and temporal files.
  • Your data files are expected to be under:
    <clairvoyance_repo_root>/datasets/data/<your_dataset_name>/
    
  • See tutorials for how to define your dataset(s) in code.
  • Clairvoyance examples make reference to some existing datasets, e.g. mimic, ward. These are confidential datasets (or in case of MIMIC-III, it requires a training course and an access request) and are not provided here. Contact [email protected] for more details.

Extract data from MIMIC-III

To use MIMIC-III with Clairvoyance, you need to get access to MIMIC-III and follow the instructions for installing it in a Postgres database: https://mimic.physionet.org/tutorials/install-mimic-locally-ubuntu/

$ cd datasets/mimic_data_extraction && python extract_antibiotics_dataset.py

Usage

  • To run tutorials:
    • Launch jupyter lab: $ jupyter-lab.
      • If using Windows or Mac and following the Docker installation method, run jupyter-lab --ip="0.0.0.0".
    • Open jupyter lab in the browser by following the URL with the token.
    • Navigate to tutorial/ and run a tutorial of your choice.
  • To run Clairvoyance API from the command line, execute the appropriate command from within the Docker terminal (see example command below).

Example: Time-series prediction

To run the pipeline for training and evaluation on time-series prediction framework, simply run $ python -m api/main_api_prediction.py or take a look at the jupyter notebook tutorial/tutorial_prediction.ipynb.

Note that any model architecture can be used as the predictor model such as RNN, Temporal convolutions, and transformer. The condition for predictor model is to have fit and predict functions as its subfunctions.

  • Stages of the time-series prediction:

    • Import dataset
    • Preprocess data
    • Define the problem (feature, label, etc.)
    • Impute missing components
    • Select the relevant features
    • Train time-series predictive model
    • Estimate the uncertainty of the predictions
    • Interpret the predictions
    • Evaluate the time-series prediction performance on the testing set
    • Visualize the outputs (performance, predictions, uncertainties, and interpretations)
  • Command inputs:

    • data_name: mimic, ward, cf
    • normalization: minmax, standard, None
    • one_hot_encoding: input features that need to be one-hot encoded
    • problem: one-shot or online
    • max_seq_len: maximum sequence length after padding
    • label_name: the column name for the label(s)
    • treatment: the column name for treatments
    • static_imputation_model: mean, median, mice, missforest, knn, gain
    • temporal_imputation_model: mean, median, linear, quadratic, cubic, spline, mrnn, tgain
    • feature_selection_model: greedy-addition, greedy-deletion, recursive-addition, recursive-deletion, None
    • feature_number: selected feature number
    • model_name: rnn, gru, lstm, attention, tcn, transformer
    • h_dim: hidden dimensions
    • n_layer: layer number
    • n_head: head number (only for transformer model)
    • batch_size: number of samples in mini-batch
    • epochs: number of epochs
    • learning_rate: learning rate
    • static_mode: how to utilize static features (concatenate or None)
    • time_mode: how to utilize time information (concatenate or None)
    • task: classification or regression
    • uncertainty_model_name: uncertainty estimation model name (ensemble)
    • interpretation_model_name: interpretation model name (tinvase)
    • metric_name: auc, apr, mae, mse
  • Example command:

    $ cd api
    $ python main_api_prediction.py \
        --data_name cf --normalization minmax --one_hot_encoding admission_type \
        --problem one-shot --max_seq_len 24 --label_name death \
        --static_imputation_model median --temporal_imputation_model median \
        --model_name lstm --h_dim 100 --n_layer 2 --n_head 2 --batch_size 400 \
        --epochs 20 --learning_rate 0.001 \
        --static_mode concatenate --time_mode concatenate \
        --task classification --uncertainty_model_name ensemble \
        --interpretation_model_name tinvase --metric_name auc
  • Outputs:

    • Model prediction
    • Model performance
    • Prediction uncertainty
    • Prediction interpretation

Citation

To cite Clairvoyance in your publications, please use the following reference.

Daniel Jarrett, Jinsung Yoon, Ioana Bica, Zhaozhi Qian, Ari Ercole, and Mihaela van der Schaar (2021). Clairvoyance: A Pipeline Toolkit for Medical Time Series. In International Conference on Learning Representations. Available at: https://openreview.net/forum?id=xnC8YwKUE3k.

You can also use the following Bibtex entry.

@inproceedings{
  jarrett2021clairvoyance,
  title={Clairvoyance: A Pipeline Toolkit for Medical Time Series},
  author={Daniel Jarrett and Jinsung Yoon and Ioana Bica and Zhaozhi Qian and Ari Ercole and Mihaela van der Schaar},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=xnC8YwKUE3k}
}

To cite the Clairvoyance alpha blog post, please use:

van Der Schaar, M., Yoon, J., Qian, Z., Jarrett, D., & Bica, I. (2020). clairvoyance alpha: the first pipeline toolkit for medical time series. [Webpages]. https://doi.org/10.17863/CAM.70020

@misc{https://doi.org/10.17863/cam.70020,
  doi = {10.17863/CAM.70020},
  url = {https://www.repository.cam.ac.uk/handle/1810/322563},
  author = {Van Der Schaar,  Mihaela and Yoon,  Jinsung and Qian,  Zhaozhi and Jarrett,  Dan and Bica,  Ioana},
  title = {clairvoyance alpha: the first pipeline toolkit for medical time series},
  publisher = {Apollo - University of Cambridge Repository},
  year = {2020}
}
Owner
van_der_Schaar \LAB
We are creating cutting-edge machine learning methods and applying them to drive a revolution in healthcare.
van_der_Schaar \LAB
ADGAN - The Implementation of paper Controllable Person Image Synthesis with Attribute-Decomposed GAN

ADGAN - The Implementation of paper Controllable Person Image Synthesis with Attribute-Decomposed GAN CVPR 2020 (Oral); Pose and Appearance Attributes Transfer;

Men Yifang 400 Dec 29, 2022
A Distributional Approach To Controlled Text Generation

A Distributional Approach To Controlled Text Generation This is the repository code for the ICLR 2021 paper "A Distributional Approach to Controlled T

NAVER 102 Jan 07, 2023
In this tutorial, you will perform inference across 10 well-known pre-trained object detectors and fine-tune on a custom dataset. Design and train your own object detector.

Object Detection Object detection is a computer vision task for locating instances of predefined objects in images or videos. In this tutorial, you wi

Ibrahim Sobh 62 Dec 25, 2022
No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency

This repository contains the implementation for the paper: No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consiste

Alireza Golestaneh 75 Dec 30, 2022
CVPR 2021 - Official code repository for the paper: On Self-Contact and Human Pose.

selfcontact This repo is part of our project: On Self-Contact and Human Pose. [Project Page] [Paper] [MPI Project Page] It includes the main function

Lea Müller 68 Dec 06, 2022
A new framework, collaborative cascade prediction based on graph neural networks (CCasGNN) to jointly utilize the structural characteristics, sequence features, and user profiles.

CCasGNN A new framework, collaborative cascade prediction based on graph neural networks (CCasGNN) to jointly utilize the structural characteristics,

5 Apr 29, 2022
StorSeismic: An approach to pre-train a neural network to store seismic data features

StorSeismic: An approach to pre-train a neural network to store seismic data features This repository contains codes and resources to reproduce experi

Seismic Wave Analysis Group 11 Dec 05, 2022
Classic Papers for Beginners and Impact Scope for Authors.

There have been billions of academic papers around the world. However, maybe only 0.0...01% among them are valuable or are worth reading. Since our limited life has never been forever, TopPaper provi

Qiulin Zhang 228 Dec 18, 2022
Code to compute permutation and drop-column importances in Python scikit-learn models

Feature importances for scikit-learn machine learning models By Terence Parr and Kerem Turgutlu. See Explained.ai for more stuff. The scikit-learn Ran

Terence Parr 537 Dec 31, 2022
Video Autoencoder: self-supervised disentanglement of 3D structure and motion

Video Autoencoder: self-supervised disentanglement of 3D structure and motion This repository contains the code (in PyTorch) for the model introduced

157 Dec 22, 2022
Official Pytorch implementation for AAAI2021 paper (RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning)

RSPNet Official Pytorch implementation for AAAI2021 paper "RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning" [Suppleme

35 Jun 24, 2022
This is the pytorch code for the paper Curious Representation Learning for Embodied Intelligence.

Curious Representation Learning for Embodied Intelligence This is the pytorch code for the paper Curious Representation Learning for Embodied Intellig

19 Oct 19, 2022
Implementation for Homogeneous Unbalanced Regularized Optimal Transport

HUROT: An Homogeneous formulation of Unbalanced Regularized Optimal Transport. This repository provides code related to this preprint. This is an alph

Théo Lacombe 1 Feb 17, 2022
A simple algorithm for extracting tree height in sparse scene from point cloud data.

TREE HEIGHT EXTRACTION IN SPARSE SCENES BASED ON UAV REMOTE SENSING This is the offical python implementation of the paper "Tree Height Extraction in

6 Oct 28, 2022
CRLT: A Unified Contrastive Learning Toolkit for Unsupervised Text Representation Learning

CRLT: A Unified Contrastive Learning Toolkit for Unsupervised Text Representation Learning This repository contains the code and relevant instructions

XiaoMing 5 Aug 19, 2022
Beancount-mercury - Beancount importer for Mercury Startup Checking

beancount-mercury beancount-mercury provides an Importer for converting CSV expo

Michael Lynch 4 Oct 31, 2022
Affine / perspective transformation in Pose Estimation with Tensorflow 2

Pose Transformation Affine / Perspective transformation in Pose Estimation with Tensorflow 2 Introduction 이 repo는 pose estimation을 연구하고 개발하는 데 도움이 되기

Kim Junho 1 Dec 22, 2021
Vehicle speed detection with python

Vehicle-speed-detection In the project simulate the tracker.py first then simulate the SpeedDetector.py. Finally, a new window pops up and the output

3 Dec 15, 2022
Suite of 500 procedurally-generated NLP tasks to study language model adaptability

TaskBench500 The TaskBench500 dataset and code for generating tasks. Data The TaskBench dataset is available under wget http://web.mit.edu/bzl/www/Tas

Belinda Li 20 May 17, 2022
[TNNLS 2021] The official code for the paper "Learning Deep Context-Sensitive Decomposition for Low-Light Image Enhancement"

CSDNet-CSDGAN this is the code for the paper "Learning Deep Context-Sensitive Decomposition for Low-Light Image Enhancement" Environment Preparing pyt

Jiaao Zhang 17 Nov 05, 2022