MIMIC Code Repository: Code shared by the research community for the MIMIC-III database


The MIMIC Code Repository is intended to be a central hub for sharing, refining, and reusing code used for analysis of the MIMIC critical care database. To find out more about MIMIC, please see: https://mimic.mit.edu. Source code for the website is in the mimic-website GitHub repository.

You can read more about the code repository in the following open access paper: The MIMIC Code Repository: enabling reproducibility in critical care research.

Cloud access to datasets

The various MIMIC databases are available on Google Cloud Platform (GCP) and Amazon Web Services (AWS). To access the data on the cloud, simply add the relevant cloud identifier to your PhysioNet profile. Then request access to the dataset for the particular cloud platform via the PhysioNet project page. Further instructions are available on the MIMIC website.

Navigating this repository

This repository contains code for five databases on PhysioNet:

  • MIMIC-III - critical care data for patients admitted to ICUs at the BIDMC between 2001 - 2012
  • MIMIC-IV - hospital and critical care data for patients admitted to the ED or ICU between 2008 - 2019
  • MIMIC-IV-ED - emergency department data for individuals attending the ED between 2011 - 2019
  • MIMIC-IV Waveforms (TBD) - this dataset has yet to be published.
  • MIMIC-CXR - chest x-ray imaging and deidentified free-text radiology reports for patients admitted to the ED from 2012 - 2016

The repository contains one top-level folder of community-developed code for each dataset:

  • mimic-iii - build scripts for MIMIC-III, derived concepts which are available on the physionet-data.mimiciii_derived dataset on BigQuery, and tutorials.
  • mimic-iv - build scripts for MIMIC-IV, derived concepts which are available on the physionet-data.mimic_derived dataset on BigQuery, and tutorials.
  • mimic-iv-cxr - code for loading and analyzing both DICOM (mimic-iv-cxr/dcm) and text (mimic-iv-cxr/txt) data. To make clear that MIMIC-CXR can be linked with MIMIC-IV, we have named this folder mimic-iv-cxr; references to MIMIC-CXR and MIMIC-IV-CXR are interchangeable.
  • mimic-iv-ed - build scripts for MIMIC-IV-ED.
  • mimic-iv-waveforms - TBD

Each subfolder has a README with further detail regarding its content.
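
As a minimal sketch of how the derived datasets named above might be addressed, the helper below builds a BigQuery standard-SQL query string for a derived concept table. The dataset names are copied from the folder list above; the function name, the table name example_concept, and the LIMIT default are assumptions made for illustration and are not part of this repository. Running the query against BigQuery (for example with a BigQuery client library) is deliberately left out.

```python
# Dataset names are taken from the folder list in this README.
# Everything else (function name, table name) is hypothetical.
DERIVED_DATASETS = {
    "mimic-iii": "physionet-data.mimiciii_derived",
    "mimic-iv": "physionet-data.mimic_derived",
}


def derived_table_query(folder: str, table: str, limit: int = 10) -> str:
    """Build a standard-SQL query for a derived concept table."""
    if folder not in DERIVED_DATASETS:
        raise KeyError(f"no derived dataset listed for folder {folder!r}")
    if not table.isidentifier():
        # Reject anything that is not a plain identifier to avoid
        # pasting arbitrary text into the query.
        raise ValueError(f"unexpected table name: {table!r}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM `{DERIVED_DATASETS[folder]}.{table}` LIMIT {limit}"


if __name__ == "__main__":
    # "example_concept" is a placeholder, not a real table name.
    print(derived_table_query("mimic-iii", "example_concept", limit=5))
```

The resulting string can be pasted into the BigQuery console once access to the dataset has been granted through PhysioNet.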


MIMIC-III is available on AWS (and MIMIC-IV will be available in the future). Use the Launch Stack button below to deploy access to the MIMIC-III dataset into your AWS account. This gives you real-time access to the MIMIC-III data in your AWS account without having to download a copy of the dataset. It also deploys a Jupyter Notebook with access to the content of this GitHub repository in your AWS account. Before launching, please log in to the MIMIC PhysioNet website, enter your AWS account number, and request access to the MIMIC-III Clinical Database on AWS.

To start this deployment, click the Launch Stack button. On the first screen, the template link has already been specified, so just click next. On the second screen, provide a Stack name (letters and numbers) and click next. On the third screen, just click next. On the fourth screen, at the bottom, there is a box that says "I acknowledge that AWS CloudFormation might create IAM resources." Check that box, and then click Create. Once the Stack has finished deploying, look at the Outputs tab of the AWS CloudFormation console for links to your Jupyter Notebook instance.


Other useful tools

  • Bloatectomy (paper) - A Python-based package for removing duplicate text in clinical notes
  • Medication categories - Python script for extracting medications from free-text notes
  • MIMIC Extract (paper) - A Python-based package for transforming MIMIC-III data into a machine-learning-friendly format
  • FIDDLE (paper) - A Python-based package for a FlexIble Data-Driven pipeLinE (FIDDLE), transforming structured EHR data into a machine-learning-friendly format


If you use code or concepts available in this repository, we would be grateful if you would cite the MIMIC Code Repository paper:

  title={The MIMIC Code Repository: enabling reproducibility in critical care research},
  author={Johnson, Alistair E W and Stone, David J and Celi, Leo A and Pollard, Tom J},
  journal={Journal of the American Medical Informatics Association},
  publisher={Oxford University Press}


Our team has worked hard to create and share the MIMIC dataset. We encourage you to share the code that you use for data processing and analysis. Sharing code helps to make studies reproducible and promotes collaborative research. To contribute, please:

We encourage users to share the concepts they have extracted by writing code that generates a materialized view. Researchers around the world can then use these materialized views to speed up data extraction. For example, ventilation durations can be obtained by creating the ventdurations view defined in concepts/durations/ventilation_durations.sql.
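
To illustrate the idea of a shared, precomputed concept, here is a minimal sketch using Python's built-in sqlite3 module. It is not the repository's ventilation_durations.sql: the table names vent_episodes_demo and ventdurations_demo, their columns, and the sample rows are all hypothetical. SQLite has no materialized views, so the sketch emulates one with CREATE TABLE ... AS SELECT, which stores the query result so later queries do not recompute it.

```python
import sqlite3


def build_ventdurations(conn: sqlite3.Connection) -> None:
    """Store per-episode ventilation durations in a derived table.

    Emulates a materialized view: the result of the SELECT is written
    to ventdurations_demo (a hypothetical name) and can be reused.
    """
    conn.executescript(
        """
        DROP TABLE IF EXISTS ventdurations_demo;
        CREATE TABLE ventdurations_demo AS
        SELECT stay_id,
               starttime,
               endtime,
               -- julianday() returns days, so multiply by 24 for hours
               (julianday(endtime) - julianday(starttime)) * 24.0
                   AS duration_hours
        FROM vent_episodes_demo;
        """
    )


# Toy source data; the schema is an assumption, not a MIMIC table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vent_episodes_demo "
    "(stay_id INTEGER, starttime TEXT, endtime TEXT)"
)
conn.executemany(
    "INSERT INTO vent_episodes_demo VALUES (?, ?, ?)",
    [
        (1, "2100-01-01 08:00:00", "2100-01-01 20:00:00"),
        (1, "2100-01-02 06:00:00", "2100-01-02 09:00:00"),
        (2, "2100-03-05 00:00:00", "2100-03-06 00:00:00"),
    ],
)

build_ventdurations(conn)

# Downstream analyses query the stored result directly.
rows = conn.execute(
    "SELECT stay_id, ROUND(SUM(duration_hours), 1) "
    "FROM ventdurations_demo GROUP BY stay_id ORDER BY stay_id"
).fetchall()
print(rows)
```

On PostgreSQL, the same pattern would typically use a real materialized view, which can be refreshed when the underlying data changes.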


By committing your code to the MIMIC Code Repository you agree to release the code under the MIT License attached to the repository.

Coding style

Please refer to the style guide for guidelines on formatting your code for the repository.



    Hi! I am wondering if the SAPS-II scores for ICU visits are calculated in the MIMIC-IV dataset, and which module/table are they included in? If they are not currently included, will future versions of MIMIC-IV include mortality probability scores (e.g., SAPS, APACHE, SOFA) for ICU visits? Thank you so much for the help!

    opened by VoyagerWSH 4
  • v2.3.0 (Dec 15, 2022)

    This release was built using MIMIC-IV v2.1. The release of this version will update the mimiciv_derived tables to use the latest version of MIMIC-IV on BigQuery, which is currently v2.1.

    Change log


    • Notebook with figures/tables for MIMIC-IV by @alistairewj in https://github.com/MIT-LCP/mimic-code/pull/1364
      • This is used to generate statistics for the paper describing MIMIC-IV (to be published shortly)
    • GitHub actions refactor by @alistairewj in https://github.com/MIT-LCP/mimic-code/pull/1400
      • Runs a GH action to test concept scripts on demo data in postgresql/mysql
    • Updated row validation counts for MIMIC-IV by @nragusa in https://github.com/MIT-LCP/mimic-code/pull/1425
    • Fix bug in calculation of first day GCS by @alistairewj in https://github.com/MIT-LCP/mimic-code/pull/1447

    Concept mapping

    • Add rxnorm concept mapping by @a-chahin in https://github.com/MIT-LCP/mimic-code/pull/1312
    • Add outputevents concept mapping by @a-chahin in https://github.com/MIT-LCP/mimic-code/pull/1309
    • Update loinc table by @a-chahin in https://github.com/MIT-LCP/mimic-code/pull/1310
    • Add procedures concept mapping by @a-chahin in https://github.com/MIT-LCP/mimic-code/pull/1308
    • Add chartevents concept mapping by @a-chahin in https://github.com/MIT-LCP/mimic-code/pull/1307

    PostgreSQL improvements

    • Updated MIMIC-IV-ED psql build scripts to v2.0 by @alistairewj in https://github.com/MIT-LCP/mimic-code/pull/1340
      • PostgreSQL build scripts now work with MIMIC-IV v2.0 and v2.1
    • mimic-iv/concepts: fix postgres-make-concepts and minor updates by @schu in https://github.com/MIT-LCP/mimic-code/pull/1363
    • Include postgres MIMIC-III concepts by @alistairewj in https://github.com/MIT-LCP/mimic-code/pull/1448
      • Now the scripts which generate MIMIC-III concepts in PostgreSQL are version controlled, and tested to work.

    MySQL improvements

    • MIMIC-IV MySQL build script update by @alistairewj in https://github.com/MIT-LCP/mimic-code/pull/1341
      • MySQL build scripts now work with MIMIC-IV v2.0 and v2.1.

    SQLite improvements

    • mimic-iv/buildmimic/sqlite/import.py: replace strip() by @schu in https://github.com/MIT-LCP/mimic-code/pull/1360
    • mimic-iv/buildmimic/sqlite/README: mention sqlalchemy requirement by @schu in https://github.com/MIT-LCP/mimic-code/pull/1361
    • mimic-iv/buildmimic/sqlite/README: remove "edit step" by @schu in https://github.com/MIT-LCP/mimic-code/pull/1362

    New Contributors

    • @schu made their first contribution in https://github.com/MIT-LCP/mimic-code/pull/1360
    • @nragusa made their first contribution in https://github.com/MIT-LCP/mimic-code/pull/1425



  • v2.2.1 (Jul 11, 2022)

    This release updates the MIMIC Code repository to align with MIMIC-IV v2.0. It also contains many bug fixes.

    Change log:

    • This version (v2.2.1) fixes a bug in the workflow generating tables on BigQuery occurring in v2.2.0. The rest of the changes below are in comparison to v2.1.1.
    • Build MIMIC scripts
      • Updated PostgreSQL build scripts for MIMIC-IV v2.0 (#1328, thanks @alexmbennett2)
      • Added SQLite build of MIMIC-IV (thanks @armando-fandango) and updated for MIMIC-IV v2.0
      • Fixed MySQL build code (thanks @mdsung) and updated for MIMIC-IV v2.0
      • Updated DuckDB code to work with MIMIC-IV v2.0
    • Concept improvements
      • The generation of BigQuery tables by the GitHub action no longer prints rows to the standard output
      • Fixed incompatibility of convert_bigquery_to_postgres.sh on Mac OS X. The script should run on both Mac OS X and Ubuntu now.
      • Fixed imputation of cell counts (#1208, thanks @duanxiangjie)
      • Added an initial concept mapping of labs to LOINC (thanks @a-chahin). This mapping will continue to be improved in this repository.
      • Fixed matching of GCS value with prior value in the last 6 hours (#1248, thanks @prockenschaub)
      • Added mapping tables for standard concepts for waveform data (#1321 and #1322, thanks @a-chahin)

    Full Changelog: https://github.com/MIT-LCP/mimic-code/compare/v2.1.1...v2.2.1

  • v2.2.0 (Jul 11, 2022)

    This release updates the MIMIC Code repository to align with MIMIC-IV v2.0. It also contains many bug fixes.

    Change log:

    • Build MIMIC scripts
      • Updated PostgreSQL build scripts for MIMIC-IV v2.0 (#1328, thanks @alexmbennett2)
      • Added SQLite build of MIMIC-IV (thanks @armando-fandango) and updated for MIMIC-IV v2.0
      • Fixed MySQL build code (thanks @mdsung) and updated for MIMIC-IV v2.0
      • Updated DuckDB code to work with MIMIC-IV v2.0
    • Concept improvements
      • The generation of BigQuery tables by the GitHub action no longer prints rows to the standard output
      • Fixed incompatibility of convert_bigquery_to_postgres.sh on Mac OS X. The script should run on both Mac OS X and Ubuntu now.
      • Fixed imputation of cell counts (#1208, thanks @duanxiangjie)
      • Added an initial concept mapping of labs to LOINC (thanks @a-chahin). This mapping will continue to be improved in this repository.
      • Fixed matching of GCS value with prior value in the last 6 hours (#1248, thanks @prockenschaub)
      • Added mapping tables for standard concepts for waveform data (#1321 and #1322, thanks @a-chahin)

    Full Changelog: https://github.com/MIT-LCP/mimic-code/compare/v2.1.1...v2.2.0

  • v2.1.1 (Dec 15, 2021)

    This is a bug fix release to ensure concepts are created correctly.

    Change log:

    • Rather than redirect the GitHub action output to /dev/null, the make concept query now uses bq query --quiet. This makes it easier to see where the script fails in the case of an error.
    • Fix syntax bugs in the norepinephrine / norepinephrine_equivalent_dose / ventilation queries
    • Various query changes are carried forward to postgresql scripts (vasoactive, ntprobnp, ventilation)
    • Use bg specimen in the severity score queries rather than specimen_pred

    Full Changelog: https://github.com/MIT-LCP/mimic-code/compare/v2.1.0...v2.1.1

  • v2.1.0 (Dec 15, 2021)

    This release fixes blood gas and (postgres) vent/oxygen delivery queries, adds the ntprobnp column to the cardiac_marker concept, and improves aux scripts for generating concepts in MIMIC-III.

    Change log:

    • Allow extra options to be passed to psql calls with MIMIC-III by @juliangilbey in https://github.com/MIT-LCP/mimic-code/pull/1195
    • A single table aggregating vasoactive agents is now available as vasoactive_agent, see https://github.com/MIT-LCP/mimic-code/pull/1203
    • Include BNP in cardiac markers concept by @pedrogemal in https://github.com/MIT-LCP/mimic-code/pull/1204
    • Fixed first day blood gas queries to use the specimen data present in labevents rather than a probabilistic prediction of specimen that no longer exists, https://github.com/MIT-LCP/mimic-code/pull/1209
    • In the same PR, propagated previous vent/oxygen delivery changes to the postgres scripts and improved tests, https://github.com/MIT-LCP/mimic-code/pull/1209

    Full Changelog: https://github.com/MIT-LCP/mimic-code/compare/v2.0.0...v2.1.0

  • v2.0.0 (Dec 7, 2021)

    This is the first release with the new repository organization where all MIMIC related code is located here, including MIMIC-III, MIMIC-IV, MIMIC-IV-ED, and MIMIC-CXR. Many thanks to @briangow for so much effort in doing this reorganization!

    Change log:

    • A GitHub action workflow now regenerates BigQuery tables for MIMIC-IV when a release is published, ensuring BigQuery is synchronized with the latest release of the code.
    • Added MIMIC-IV and MIMIC-IV-ED build scripts.
    • Added MIMIC-IV and MIMIC-IV-ED concepts.
    • Added code for parsing MIMIC-CXR DICOMs (dcm) and deidentified free-text reports (txt) - this is the mimic-iv-cxr subfolder here (the mimic-iv prefix helps clarify this data can be used with MIMIC-IV - i.e. mimic-iv-cxr is synonymous with MIMIC-CXR).
    • Added version of MIMIC-IV concepts in the PostgreSQL dialect. These concepts are (mostly) automatically generated using a shell script from the BigQuery syntax.
    • Various bug fixes for MIMIC concepts.
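
The dialect conversion mentioned in the v2.0.0 notes can be pictured with a small sketch. The code below is not the repository's shell script and does not reproduce its rules. It shows a single, assumed rewrite rule in Python: turning a backtick-quoted BigQuery table reference with the physionet-data project prefix into a plain schema.table reference that PostgreSQL accepts. The function name and the table name example_concept are hypothetical.

```python
import re

# Matches `physionet-data.<dataset>.<table>` including the backticks.
# This single rule is an assumption for illustration only.
_TABLE_REF = re.compile(r"`physionet-data\.([A-Za-z0-9_]+)\.([A-Za-z0-9_]+)`")


def bigquery_to_postgres(sql: str) -> str:
    """Rewrite BigQuery table references as schema.table references."""
    return _TABLE_REF.sub(r"\1.\2", sql)


if __name__ == "__main__":
    bq_sql = "SELECT * FROM `physionet-data.mimiciv_derived.example_concept`"
    print(bigquery_to_postgres(bq_sql))
```

A real conversion would also need to handle differences in date functions, types, and other syntax, which is presumably why the notes describe the generated concepts as only "(mostly)" automatic.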
  • v1.4.2 (May 16, 2019)


    • Added an example R markdown notebook which uses BigQuery to connect to MIMIC
    • Filtered non-IV vancomycin administrations from the vancomycin dosing notebook
    • Documentation on a common failure case when building MIMIC
    • Added a contributed dplyr tutorial
    • Fixed logic in identifying central/arterial lines in metavision
    • Adjusted the calculation of UO in KDIGO to look backward; this will result in overestimation of UO and thus fewer AKI cases (before, the estimate was too low and AKI cases were potentially inflated)
    • Improved comments in various scripts
  • v1.4.1 (Sep 7, 2018)

    This is the latest release of the code repository. It contains a number of improvements in the build scripts and many more concepts. This build is for use with MIMIC-III v1.4.

  • v1.4 (Jul 2, 2017)

  • v1.3 (Sep 6, 2016)

  • v1.2 (Dec 4, 2015)

