Deep Survival Machines - Fully Parametric Survival Regression

Overview

Build Status     codecov     License: MIT     GitHub Repo stars

Package: dsm

Python package dsm provides an API to train the Deep Survival Machines and associated models for problems in survival analysis. The underlying model is implemented in pytorch.

For full documentation of the module, please see https://autonlab.github.io/DeepSurvivalMachines/

What is Survival Analysis?

Survival Analysis involves estimating when an event of interest, T would take place given some features or covariates X. In statistics and ML, these scenarios are modelled as regression to estimate the conditional survival distribution, P(T>t|X).
As compared to typical regression problems, Survival Analysis differs in two major ways:

  • The Event distribution, T has positive support i.e. T ∈ [0, ∞).
  • There is presence of censoring i.e. a large number of instances of data are lost to follow up.

Deep Survival Machines

Deep Survival Machines (DSM) is a fully parametric approach to model Time-to-Event outcomes in the presence of Censoring, first introduced in [1]. In the context of Healthcare ML and Biostatistics, this is known as 'Survival Analysis'. The key idea behind Deep Survival Machines is to model the underlying event outcome distribution as a mixure of some fixed ( K ) parametric distributions. The parameters of these mixture distributions as well as the mixing weights are modelled using Neural Networks.

Usage Example

from dsm import DeepSurvivalMachines
model = DeepSurvivalMachines()
model.fit()
model.predict_risk()

Recurrent Deep Survival Machines

Recurrent Deep Survival Machines (RDSM) builds on the original DSM model and allows for learning of representations of the input covariates using Recurrent Neural Networks like LSTMs, GRUs. Deep Recurrent Survival Machines is a natural fit to model problems where there are time dependendent covariates.

Deep Convolutional Survival Machines

Predictive maintenance and medical imaging sometimes requires to work with image streams. Deep Convolutional Survival Machines extends DSM and DRSM to learn representations of the input image data using convolutional layers. If working with streaming data, the learnt representations are then passed through an LSTM to model temporal dependencies before determining the underlying survival distributions.

⚠️ Not Implemented Yet!

Deep Cox Mixtures

The Cox Mixture involves the assumption that the survival function of the individual to be a mixture of K Cox Models. Conditioned on each subgroup Z=k; the PH assumptions are assumed to hold and the baseline hazard rates is determined non-parametrically using an spline-interpolated Breslow's estimator. For full details on Deep Cox Mixture, refer to the paper:

Deep Cox Mixtures for Survival Regression. Machine Learning in Health Conference (2021)

Installation

[email protected]:~$ git clone https://github.com/autonlab/DeepSurvivalMachines.git
[email protected]:~$ cd DeepSurvivalMachines
[email protected]:~$ pip install -r requirements.txt

Examples

  1. Deep Survival Machines on the SUPPORT Dataset
  2. Recurrent Deep Survival Machines on the PBC Dataset

References

Please cite the following papers if you are using the dsm package.

[1] Deep Survival Machines: Fully Parametric Survival Regression and Representation Learning for Censored Data with Competing Risks. IEEE Journal of Biomedical & Health Informatics (2021)

  @article{nagpal2021deep,
  title={Deep Survival Machines: Fully Parametric Survival Regression and\
  Representation Learning for Censored Data with Competing Risks},
  author={Nagpal, Chirag and Li, Xinyu and Dubrawski, Artur},
  journal={IEEE Journal of Biomedical and Health Informatics},
  year={2021}
  }

[2] Deep Parametric Time-to-Event Regression with Time-Varying Covariates. AAAI Spring Symposium (2021)

@InProceedings{pmlr-v146-nagpal21a,
  title = 	 {Deep Parametric Time-to-Event Regression with Time-Varying Covariates},
  author =       {Nagpal, Chirag and Jeanselme, Vincent and Dubrawski, Artur},
  booktitle = 	 {Proceedings of AAAI Spring Symposium on Survival Prediction - Algorithms, Challenges, and Applications 2021},
  series = 	 {Proceedings of Machine Learning Research},
  publisher =    {PMLR},
  }

[3] Deep Cox Mixtures for Survival Regression. Machine Learning for Healthcare (2021)

@InProceedings{nagpal2021dcm,
  title={Deep Cox Mixtures for Survival Regression},
  author={Nagpal, Chirag and Yadlowsky, Steve and Rostamzadeh, Negar and Heller, Katherine},
  booktitle={Proceedings of the 6th Machine Learning for Healthcare Conference},
  pages={674--708},
  year={2021},
  volume={149},
  series={Proceedings of Machine Learning Research},
  publisher={PMLR},
}

Compatibility

dsm requires python 3.5+ and pytorch 1.1+.

To evaluate performance using standard metrics dsm requires scikit-survival.

Contributing

dsm is on GitHub. Bug reports and pull requests are welcome.

License

MIT License

Copyright (c) 2020 Carnegie Mellon University, Auton Lab

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Owner
Carnegie Mellon University Auton Lab
Carnegie Mellon University Auton Lab
The easy way to combine mlflow, hydra and optuna into one machine learning pipeline.

mlflow_hydra_optuna_the_easy_way The easy way to combine mlflow, hydra and optuna into one machine learning pipeline. Objective TODO Usage 1. build do

shibuiwilliam 9 Sep 09, 2022
Book Recommender System Using Sci-kit learn N-neighbours

Model-Based-Recommender-Engine I created a book Recommender System using Sci-kit learn's N-neighbours algorithm for my model and the streamlit library

1 Jan 13, 2022
List of Data Science Cheatsheets to rule the world

Data Science Cheatsheets List of Data Science Cheatsheets to rule the world. Table of Contents Business Science Business Science Problem Framework Dat

Favio André Vázquez 11.7k Dec 30, 2022
Customers Segmentation with RFM Scores and K-means

Customer Segmentation with RFM Scores and K-means RFM Segmentation table: K-Means Clustering: Business Problem Rule-based customer segmentation machin

5 Aug 10, 2022
🤖 ⚡ scikit-learn tips

🤖 ⚡ scikit-learn tips New tips are posted on LinkedIn, Twitter, and Facebook. 👉 Sign up to receive 2 video tips by email every week! 👈 List of all

Kevin Markham 1.6k Jan 03, 2023
MLflow App Using React, Hooks, RabbitMQ, FastAPI Server, Celery, Microservices

Katana ML Skipper This is a simple and flexible ML workflow engine. It helps to orchestrate events across a set of microservices and create executable

Tom Xu 8 Nov 17, 2022
Predicting diabetes over a five year period using logistic regression and the Pima First-Nation dataset

Diabetes This script uses the Pima First Nations dataset to create a model to predict whether or not an individual will develop Diabetes Mellitus Type

1 Mar 28, 2022
SageMaker Python SDK is an open source library for training and deploying machine learning models on Amazon SageMaker.

SageMaker Python SDK SageMaker Python SDK is an open source library for training and deploying machine learning models on Amazon SageMaker. With the S

Amazon Web Services 1.8k Jan 01, 2023
This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch

This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch. It uses a simple TestEnvironment to test the algorithm

Martin Huber 59 Dec 09, 2022
A simple and lightweight genetic algorithm for optimization of any machine learning model

geneticml This package contains a simple and lightweight genetic algorithm for optimization of any machine learning model. Installation Use pip to ins

Allan Barcelos 8 Aug 10, 2022
Penguins species predictor app is used to classify penguins species created using python's scikit-learn, fastapi, numpy and joblib packages.

Penguins Classification App Penguins species predictor app is used to classify penguins species using their island, sex, bill length (mm), bill depth

Siva Prakash 3 Apr 05, 2022
A game theoretic approach to explain the output of any machine learning model.

SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allo

Scott Lundberg 18.2k Jan 02, 2023
#30DaysOfStreamlit is a 30-day social challenge for you to build and deploy Streamlit apps.

30 Days Of Streamlit 🎈 This is the official repo of #30DaysOfStreamlit — a 30-day social challenge for you to learn, build and deploy Streamlit apps.

Streamlit 53 Jan 02, 2023
Combines Bayesian analyses from many datasets.

PosteriorStacker Combines Bayesian analyses from many datasets. Introduction Method Tutorial Output plot and files Introduction Fitting a model to a d

Johannes Buchner 19 Feb 13, 2022
A library of extension and helper modules for Python's data analysis and machine learning libraries.

Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks. Sebastian Raschka 2014-2021 Links Doc

Sebastian Raschka 4.2k Dec 29, 2022
A Collection of Conference & School Notes in Machine Learning 🦄📝🎉

Machine Learning Conference & Summer School Notes. 🦄📝🎉

558 Dec 28, 2022
Scikit learn library models to account for data and concept drift.

liquid_scikit_learn Scikit learn library models to account for data and concept drift. This python library focuses on solving data drift and concept d

7 Nov 18, 2021
Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...

Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...

Thoughtworks 318 Jan 02, 2023
MegFlow - Efficient ML solutions for long-tailed demands.

Efficient ML solutions for long-tailed demands.

旷视天元 MegEngine 371 Dec 21, 2022
ml4h is a toolkit for machine learning on clinical data of all kinds including genetics, labs, imaging, clinical notes, and more

ml4h is a toolkit for machine learning on clinical data of all kinds including genetics, labs, imaging, clinical notes, and more

Broad Institute 65 Dec 20, 2022