Database Open Access
MIMIC-IV Clinical Database Demo
Alistair Johnson , Lucas Bulgarelli , Tom Pollard , Steven Horng , Leo Anthony Celi , Roger Mark
Published: Jan. 31, 2023. Version: 2.2
When using this resource, please cite:
Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2023). MIMIC-IV Clinical Database Demo (version 2.2). PhysioNet. https://doi.org/10.13026/dp1f-ex47.
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
Abstract
The Medical Information Mart for Intensive Care (MIMIC)-IV database comprises deidentified electronic health records for patients admitted to the Beth Israel Deaconess Medical Center. Access to MIMIC-IV is limited to credentialed users. Here, we have provided an openly available demo of MIMIC-IV containing a subset of 100 patients. The dataset includes similar content to MIMIC-IV, but excludes free-text clinical notes. The demo may be useful for running workshops and for assessing whether MIMIC-IV is appropriate for a study before making an access request.
Background
The increasing adoption of digital electronic health records has led to the existence of large datasets that could be used to carry out important research across many areas of medicine. Research progress has been limited, however, by the way that the datasets are curated and made available for research. The MIMIC datasets allow credentialed researchers around the world unprecedented access to real-world clinical data, helping to reduce the barriers to conducting important medical research. The public availability of the data allows studies to be reproduced and collaboratively improved in ways that would not otherwise be possible.
Methods
First, the set of individuals to include in the demo was chosen. Each person in MIMIC-IV is assigned a unique subject_id. As the subject_id is randomly generated, ordering by subject_id results in a random subset of individuals. We only considered individuals with an anchor_year_group value of 2011 - 2013 or 2014 - 2016 to ensure overlap with MIMIC-CXR v2.0.0. The first 100 subject_id values that satisfied the anchor_year_group criteria were selected for the demo dataset.
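The selection rule described above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function name `select_demo_subjects` is hypothetical, and it assumes each patient row is a mapping with `subject_id` and `anchor_year_group` keys, with the year groups stored as the strings shown in the text.

```python
from __future__ import annotations

from typing import Iterable, Mapping

# Assumed string encoding of the two anchor_year_group values named in the text.
ELIGIBLE_GROUPS = frozenset({"2011 - 2013", "2014 - 2016"})


def select_demo_subjects(
    patients: Iterable[Mapping[str, str]],
    n: int = 100,
    groups: frozenset = ELIGIBLE_GROUPS,
) -> list[int]:
    """Return the n smallest subject_id values whose anchor_year_group is eligible."""
    eligible = {
        int(row["subject_id"])
        for row in patients
        if row["anchor_year_group"] in groups
    }
    # Because subject_id is randomly generated, taking the smallest n
    # amounts to drawing a random subset of the eligible individuals.
    return sorted(eligible)[:n]
```

Rows read with `csv.DictReader` from a patients file could be passed in directly, since the function converts `subject_id` to an integer before sorting.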
All tables from MIMIC-IV were included in the demo dataset. Tables containing patient information, such as emar or labevents, were filtered using the list of selected subject_id values. Tables which do not contain patient-level information were included in their entirety (e.g. d_items or d_labitems). Note that all tables which do not contain patient-level information are prefixed with the characters 'd_'.
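As a rough sketch of the filtering rule above: tables whose names start with 'd_' pass through unchanged, and every other table keeps only the rows belonging to a selected subject. The helper names `is_dictionary_table` and `demo_rows` are hypothetical, and the sketch assumes every patient-level row exposes a `subject_id` key.

```python
from __future__ import annotations

from typing import Iterable, Iterator, Mapping


def is_dictionary_table(table_name: str) -> bool:
    """Tables without patient-level information carry the 'd_' prefix."""
    return table_name.startswith("d_")


def demo_rows(
    table_name: str,
    rows: Iterable[Mapping[str, str]],
    subject_ids: set[int],
) -> Iterator[Mapping[str, str]]:
    """Yield the rows of one table that belong in the demo subset."""
    if is_dictionary_table(table_name):
        # Dictionary tables are included in their entirety.
        yield from rows
        return
    for row in rows:
        # Patient-level tables are restricted to the selected subjects.
        if int(row["subject_id"]) in subject_ids:
            yield row
```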
Deidentification was performed following the same approach as the MIMIC-IV database. Protected health information (PHI) as listed in the HIPAA Safe Harbor provision was removed. Patient identifiers were replaced using a random cipher, resulting in deidentified integer identifiers for patients, hospitalizations, and ICU stays. Stringent rules were applied to structured columns based on the data type. Dates were shifted consistently using a random integer, removing seasonality, day of the week, and year information. Text fields were filtered by manually curated allow and block lists, as well as context-specific regular expressions. For example, columns containing dose values were filtered to only contain numeric values. If necessary, a free-text deidentification algorithm was applied to remove PHI from free text. Results of this algorithm were manually reviewed and verified to remove identified PHI.
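To illustrate what a consistent date shift can look like, the sketch below assigns one random day offset per subject and applies it to every timestamp for that subject. This is one plausible reading of "shifted consistently", not the actual MIMIC-IV procedure: the function names and the offset range `max_days` are assumptions made for the example.

```python
import random
from datetime import datetime, timedelta


def make_shifts(subject_ids, rng, max_days=36500):
    """Assign one random day offset per subject (max_days is an illustrative assumption)."""
    return {sid: timedelta(days=rng.randint(1, max_days)) for sid in subject_ids}


def shift_timestamp(subject_id, ts, shifts):
    """Apply the subject's single offset, so within-subject intervals are unchanged."""
    return ts + shifts[subject_id]


rng = random.Random(0)  # seeded only to make the example repeatable
shifts = make_shifts([1, 2], rng)
admit = datetime(2015, 3, 1, 8, 0)
discharge = datetime(2015, 3, 4, 20, 30)
# The shifted dates differ from the originals, but the length of stay is preserved.
stay = shift_timestamp(1, discharge, shifts) - shift_timestamp(1, admit, shifts)
```

Under this scheme the absolute dates carry no calendar meaning, while the elapsed time between two events for the same subject remains usable for analysis.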
Data Description
MIMIC-IV is a relational database consisting of 26 tables. For a detailed description of the database structure, see the MIMIC-IV Clinical Database page [1] or the MIMIC-IV online documentation [2]. The demo has the same schema and structure as the equivalent version of MIMIC-IV.
Data files are distributed in comma-separated values (CSV) format following the RFC 4180 standard [3]. The dataset is also made available on Google BigQuery. Instructions for accessing the dataset on BigQuery are provided in the online MIMIC-IV documentation, under the cloud page [2].
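Because RFC 4180 allows quoted fields that contain commas and doubled quotes, splitting lines on commas is not a safe way to read the files. The sketch below uses Python's standard csv module instead. The sample rows and the helper name `parse_csv` are invented for illustration and are not taken from the dataset.

```python
from __future__ import annotations

import csv
import io


def parse_csv(text: str) -> list[dict[str, str]]:
    """Parse RFC 4180 style CSV text into one dict per data row."""
    # newline="" lets the csv module handle line endings itself.
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Hypothetical sample: a quoted comma and an escaped (doubled) quote.
sample = 'itemid,label\r\n1,"Sodium, whole blood"\r\n2,"He said ""ok"""\r\n'
records = parse_csv(sample)
```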
An additional file is included: demo_subject_id.csv. This is a list of the subject_id values used to filter MIMIC-IV to the demo subset.
Usage Notes
The MIMIC-IV demo provides researchers with the opportunity to better understand MIMIC-IV data.
CSV files can be opened natively using any text editor or spreadsheet program. However, as some tables are large, it may be preferable to navigate the data via a relational database. We suggest either working with the data in Google BigQuery (see the "Files" section for access details) or creating an SQLite database using the CSV files. SQLite is a lightweight database format which stores all constituent tables in a single file, and SQLite databases interoperate well with a number of software tools.
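One way to build such an SQLite database with only the Python standard library is sketched below. It is a minimal example under stated assumptions: the function name `load_csvs_into_sqlite` is hypothetical, files are assumed to end in .csv or .csv.gz and may sit in subdirectories, each table is named after its file, and every value is stored as text with no type inference.

```python
from __future__ import annotations

import csv
import gzip
import sqlite3
from pathlib import Path


def _quote(name: str) -> str:
    """Quote an SQL identifier taken from a file name or CSV header."""
    return '"' + name.replace('"', '""') + '"'


def load_csvs_into_sqlite(csv_dir, db_path) -> list[str]:
    """Create one SQLite table per CSV file found under csv_dir."""
    loaded = []
    conn = sqlite3.connect(str(db_path))
    try:
        for path in sorted(Path(csv_dir).rglob("*.csv*")):
            if path.name.endswith(".csv.gz"):
                opener, table = gzip.open, path.name[: -len(".csv.gz")]
            elif path.name.endswith(".csv"):
                opener, table = open, path.name[: -len(".csv")]
            else:
                continue
            with opener(path, "rt", newline="", encoding="utf-8") as handle:
                reader = csv.reader(handle)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                columns = ", ".join(_quote(c) for c in header)
                marks = ", ".join("?" for _ in header)
                conn.execute(f"DROP TABLE IF EXISTS {_quote(table)}")
                conn.execute(f"CREATE TABLE {_quote(table)} ({columns})")
                # Remaining rows are inserted as text values.
                conn.executemany(
                    f"INSERT INTO {_quote(table)} VALUES ({marks})", reader
                )
            loaded.append(table)
        conn.commit()
    finally:
        conn.close()
    return loaded
```

Since all columns are loaded as text, numeric comparisons in later queries would need an explicit CAST, or the table definitions could be replaced with typed ones from the MIMIC-IV documentation.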
Code is made available for use with MIMIC-IV on the MIMIC-IV code repository [4]. Code provided includes derivation of clinical concepts, tutorials, and reproducible analyses.
Release Notes
Release notes for the demo follow the release notes for the MIMIC-IV database.
Ethics
This project was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). Requirement for individual patient consent was waived because the project did not impact clinical care and all protected health information was deidentified.
Acknowledgements
This research and development was supported by grants NIH-R01-EB017205, NIH-R01-EB001659, and NIH-R01-GM104987 from the National Institutes of Health. The authors would also like to thank Philips Healthcare and staff at the Beth Israel Deaconess Medical Center, Boston, for supporting database development, and Ken Pierce for providing ongoing support for the MIMIC research community.
Conflicts of Interest
The authors declare no competing financial interests.
References
1. Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2021). MIMIC-IV (version 1.0). PhysioNet. https://doi.org/10.13026/s6n6-xd98.
2. MIMIC Online Documentation. Accessed June 6th 2022. https://mimic.mit.edu/
3. Shafranovich Y. Common format and MIME type for comma-separated values (CSV) files. https://www.hjp.at/doc/rfc/rfc4180.html
4. Johnson AE, Stone DJ, Celi LA, Pollard TJ. The MIMIC Code Repository: enabling reproducibility in critical care research. Journal of the American Medical Informatics Association. 2018 Jan;25(1):32-9. https://github.com/MIT-LCP/mimic-code
Access
Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.
License (for files):
Open Data Commons Open Database License v1.0
Discovery
DOI (version 2.2):
https://doi.org/10.13026/dp1f-ex47
DOI (latest version):
https://doi.org/10.13026/ng9m-3n32
Topics:
critical care
electronic health record
mimic
Project Website:
https://mimic.mit.edu
Files
Total uncompressed size: 15.5 MB.
Access the files
- Download the ZIP file (15.4 MB)
- Access the files using the Google Cloud Storage Browser here. Login with a Google account is required.
- Access the data using the Google Cloud command line tools (please refer to the gsutil documentation for guidance):
  gsutil -m -u YOUR_PROJECT_ID cp -r gs://mimic-iv-demo-2.2.physionet.org DESTINATION
- Download the files using your terminal:
  wget -r -N -c -np https://physionet.org/files/mimic-iv-demo/2.2/
- Download the files using AWS command line tools:
  aws s3 sync --no-sign-request s3://physionet-open/mimic-iv-demo/2.2/ DESTINATION