Database Open Access
MIMIC-IV demo data in the OMOP Common Data Model
Michael Kallfelz , Anna Tsvetkova , Tom Pollard , Manlik Kwong , Gigi Lipori , Vojtech Huser , Jeffrey Osborn , Sicheng Hao , Andrew Williams
Published: June 21, 2021. Version: 0.9
MIMIC-IV demo available in the OMOP Common Data Model (June 28, 2021, 12:36 p.m.)
We are pleased to announce that a 100-patient demo of MIMIC-IV has been made available in the OMOP Common Data Model. The dataset is currently undergoing user testing and has known limitations (for example, the inputevents and outputevents tables are not yet incorporated). For more detail, please visit the project page on PhysioNet and the associated GitHub repository.
This work builds on previous efforts by Nicolas Paris, Adrien Parrot and colleagues on MIMIC-III. The project was supported in part by grants from the Bill and Melinda Gates Foundation and the National Library of Medicine (NLM), National Institutes of Health.
When using this resource, please cite:
Kallfelz, M., Tsvetkova, A., Pollard, T., Kwong, M., Lipori, G., Huser, V., Osborn, J., Hao, S., & Williams, A. (2021). MIMIC-IV demo data in the OMOP Common Data Model (version 0.9). PhysioNet. https://doi.org/10.13026/p1f5-7x35.
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
In this project, the MIMIC-IV demo database was used to create an Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) instance. The transformation was built using template scripts and concept mappings from a previous project that converted MIMIC-III data to the OMOP CDM, with the logic adjusted and the mappings extended to be compliant with MIMIC-IV. The desired outcome was to establish and improve ways of transforming observational data collected in an intensive care environment to the OMOP CDM.
The Observational Medical Outcomes Partnership (OMOP) was a public-private partnership established in the US to inform the appropriate use of observational healthcare databases for studying the effects of medical products. A core output of this project was the OMOP Common Data Model (CDM) which represents healthcare data from diverse sources in a consistent and standardized way. Such models allow portability of analysis and development of tools that facilitate research [1-3].
MIMIC is an electronic health record database that is widely used around the world in research and education [4,5]. We carried out work to convert the latest version of MIMIC, MIMIC-IV, into the Observational Medical Outcomes Partnership (OMOP) model.
This project outlines development of a demo version of the dataset, based on a 100-patient subset of the full MIMIC-IV dataset. The following three modules of the MIMIC-IV dataset were included:
- core: patient stay information (i.e. admissions and transfers)
- hosp: hospital level data for patients: labs, micro, and electronic medication administration
- icu: Event tables from the ICU, similar to those in MIMIC-III [6,7]
As a proof of concept, waveform files from the MIMIC waveform repository were processed to extract features for inclusion in the OMOP transform. Measurements were linked to the respective origin waveform file.
Motivations for this project were to:
- Facilitate greater use of MIMIC-IV data by making it accessible in the widely-used OMOP CDM format;
- Enable use of OHDSI tools on the MIMIC-IV data;
- Provide the OHDSI, PhysioNet, and other OMOP-using research communities with a dataset that can be used for demonstrations and education; and
- Investigate the feasibility of converting ICU waveform and “numerics” data to standardized concepts in the OMOP CDM, so that methods can be developed to conduct research that requires both waveform features and related clinical data.
The project used the Achilles data characterization tool to provide plausibility and comprehensiveness reports on the transform, as well as insights into adherence of the data to the OMOP model conventions.
We used SQL (wrapped in Python) to convert data in MIMIC format into the OMOP CDM. Documented ETL code is available at the project repository on GitHub. The OMOP model philosophy is to use standard concepts (a set of accepted target terminologies) to represent the data. For example, drug exposure data are captured using RxNorm concepts. We used OMOP vocabulary tables to map concepts. To capture source concepts, the ETL conversion uses local concepts (assigning a concept_id in the 2 billion number range, an OMOP model convention) and then creates a mapping to the respective standard target concept. Mapping tables used as part of the ETL are available in the project repository (see "custom_mapping_csv").
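The local-concept convention described above can be sketched as follows. This is a minimal illustration and not the project's ETL: the function name `assign_local_concepts`, the sample source codes, and the target concept IDs are all hypothetical. Only the convention that local concept_ids sit in the 2 billion range comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

# OMOP convention: locally defined concepts use concept_id values above 2 billion.
LOCAL_CONCEPT_START = 2_000_000_000


@dataclass(frozen=True)
class MappedConcept:
    source_code: str                  # code as it appears in the source data
    source_concept_id: int            # local concept_id in the 2 billion range
    target_concept_id: Optional[int]  # standard concept_id, or None if unmapped


def assign_local_concepts(source_codes, standard_map):
    """Give each distinct source code a local concept_id and attach its standard target.

    `standard_map` is a hypothetical dict of source code -> standard concept_id.
    Codes without an entry keep target_concept_id=None so they can be reviewed later.
    """
    mapped = []
    for offset, code in enumerate(sorted(set(source_codes)), start=1):
        mapped.append(
            MappedConcept(
                source_code=code,
                source_concept_id=LOCAL_CONCEPT_START + offset,
                target_concept_id=standard_map.get(code),
            )
        )
    return mapped


if __name__ == "__main__":
    # Made-up codes and target IDs, for illustration only.
    codes = ["lab_A", "lab_B", "lab_A", "drug_X"]
    targets = {"lab_A": 111, "drug_X": 222}
    for row in assign_local_concepts(codes, targets):
        print(row)
```

Rows whose `target_concept_id` is `None` correspond to source concepts that would still need a computational suggestion or a manual mapping.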
We used mappings published in the repository of a prior MIMIC-III mapping project as a foundation and added many more mappings to achieve higher coverage. For many source codes, existing OMOP concepts from standardized vocabularies such as LOINC or ICD10 were used.
Mapping for missing relationships was done in several stages. First, suggested mappings were generated computationally ("suggestions"). For concepts with no suggestions, a manual mapping was created. Second, an expert with a medical background manually reviewed all mappings for accuracy. Third, the mappings were reviewed again by a larger team after execution of data quality checks.
We used several approaches to assess the quality of the conversion. We applied OHDSI data quality tools (Data Quality Dashboard, DQD) and data characterization tools (Achilles), both built in R. The project repository contains the JSON from the most recent Achilles and DQD run and documentation on which data quality errors remained. The majority of data quality issues identified by these tools relate to the MIMIC de-identification timeshift.
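To illustrate why shifted dates can trip plausibility checks: MIMIC de-identification moves dates into the future, so a check that expects event dates no later than today will flag them. The sketch below is not how DQD implements its checks. The function name `count_future_dates` and the sample rows are hypothetical, and the dates are assumed to be ISO-formatted strings.

```python
from datetime import date


def count_future_dates(rows, date_column, today=None):
    """Count rows whose ISO-formatted date in `date_column` falls after `today`."""
    today = today or date.today()
    flagged = 0
    for row in rows:
        value = row.get(date_column)
        if not value:
            continue  # skip missing dates
        # Keep only the YYYY-MM-DD part in case a time component follows.
        if date.fromisoformat(value[:10]) > today:
            flagged += 1
    return flagged


if __name__ == "__main__":
    # Made-up rows: one shifted date far in the future, one past date, one missing.
    sample = [
        {"visit_start_date": "2150-03-14"},
        {"visit_start_date": "2019-07-01"},
        {"visit_start_date": ""},
    ]
    print(count_future_dates(sample, "visit_start_date", today=date(2021, 6, 21)))
```

A count like this only shows how many records a "no future dates" rule would flag; whether a flagged record is a genuine error still has to be judged against the known timeshift.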
The project was carried out in fall 2020 and spring 2021, with updates to the ETL made by OHDSI community researchers building on prior work on the ETL for MIMIC-III to OMOP. Meeting minutes of the OHDSI interest group that worked on the project in 2020 are available on the Wiki page of the repository. The SQL ETL code is optimized to be executed in Google BigQuery.
Ongoing maintenance of the mapping and ETL is expected to be carried out by OHDSI researchers making pull requests to the GitHub repository. The code is made public primarily to enable such community collaboration. Updates may consist of improvements of the ETL, improvement to the concept mapping, responses to updates and changes in MIMIC data releases or changes to the OMOP Common Data Model (CDM). A readme file in the repository contains a point of contact who will review ETL additions submitted by the community via pull requests.
Data is organized into folders. The dataset is limited to patients included in the demo subset (100 patients).
1_omop_data_csv: Contains a set of CSV files, one for each OMOP table. The first row of each file contains column headers. Columns follow the OMOP model specification. See the GitHub repository for the scripts that generated the files in this folder. A detailed description of the OMOP model is available in textbook format in the OHDSI documentation (chapter 4). For OMOP vocabulary tables, only local concepts (>2B range) are provided. For the remaining vocabulary content, users should follow standard mechanisms for obtaining OMOP vocabulary tables (see the OHDSI documentation, chapter 5, section 5.1.2: ‘Access to the Standardized Vocabularies’). If a file has no rows (only column headers), no MIMIC data was transformed into that specific OMOP table.
2_achilles_json_data: JSON files generated by the Achilles data characterization tool. The JSON files can be used in conjunction with the Achilles Web application to browse aggregate data characterizations.
3_data_quality_dashboard_files: JSON files generated by the Data Quality Dashboard.
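Because a header-only CSV file means that no MIMIC data was transformed into that table, a quick scan of the CSV folder shows which OMOP tables are populated. The following is a minimal sketch. It assumes the archive has been extracted so that `1_omop_data_csv` sits in the current directory, and the helper names are hypothetical.

```python
import csv
from pathlib import Path


def count_data_rows(csv_path):
    """Return the number of rows after the header line in one CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the column headers
        return sum(1 for _ in reader)


def summarize_tables(folder):
    """Map each CSV file name in `folder` to its data-row count."""
    return {
        path.name: count_data_rows(path)
        for path in sorted(Path(folder).glob("*.csv"))
    }


if __name__ == "__main__":
    # Assumed location of the extracted CSV folder.
    summary = summarize_tables("1_omop_data_csv")
    for name, n_rows in summary.items():
        status = "empty (no MIMIC data transformed)" if n_rows == 0 else f"{n_rows} rows"
        print(f"{name}: {status}")
```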
CSV files can be downloaded and used directly to learn and explore the OMOP CDM. The demo dataset is small enough that the data can be kept in memory. However, most OHDSI analytical packages require the data to be in a database. For the latest documentation and issues (for the demo or full dataset), please refer to the project repository on GitHub. The ETL code is designed for both the demo and full datasets. The mappings in the ETL are not limited to concepts present in the demo data but cover the complete spectrum of the full (non-demo) data.
For help with implementing the MIMIC-IV OMOP CDM or to discuss use cases with the community, users can post on the OHDSI Community Forum. If ETL logic errors or incorrect custom mappings are found, issues can be raised through the project’s GitHub repository. Users are encouraged to contribute code improvements and concept mappings via pull requests. To request elevated privileges on the project repository, please raise an issue.
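One lightweight way to get the CSV files into a database for ad hoc SQL exploration is an in-memory SQLite database. The sketch below is an assumption-laden illustration: it names each table after its CSV file, stores every column as TEXT, and uses a hypothetical helper `load_csv_folder`. It is not a replacement for a properly typed CDM instance of the kind OHDSI packages expect.

```python
import csv
import sqlite3
from pathlib import Path


def load_csv_folder(folder, connection):
    """Create one SQLite table per CSV file, named after the file, all columns TEXT."""
    loaded = []
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, None)
            if not header:
                continue  # skip files with no header line
            table = path.stem
            columns = ", ".join(f'"{name}" TEXT' for name in header)
            placeholders = ", ".join("?" for _ in header)
            connection.execute(f'CREATE TABLE "{table}" ({columns})')
            # Rows whose field count differs from the header are skipped.
            connection.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})',
                (row for row in reader if len(row) == len(header)),
            )
        loaded.append(table)
    connection.commit()
    return loaded


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Assumed location of the extracted CSV folder.
    tables = load_csv_folder("1_omop_data_csv", conn)
    print(tables)
```

Once loaded, tables can be queried with ordinary SQL through `conn.execute(...)`; numeric comparisons need explicit casts because every column is stored as text.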
Tools and resources
One of the benefits of transforming MIMIC to the OMOP CDM is that the data becomes accessible to the OMOP tools and resources. OMOP’s main data quality tool is the Data Quality Dashboard (see the textbook, chapter 15 ‘Data Quality’). Numerous tools for data analysis (for data characterization, population level estimation or patient level prediction) exist (see the textbook, chapter 8 ‘OHDSI Analytics Tools’).
Currently, free-text notes are not populated in the OMOP CDM and several MIMIC concepts are not adequately mapped to standard concepts. The emar_detail tables from the “hosp” module have not been used for additional detail extraction. Drug exposure entries were populated from the pharmacy tables. The outputevents have not yet been incorporated in the conversion.
Version 0.9: This release represents an initial version of the conversion. We hope to address limitations identified by the community in future versions. Users should watch the project repository for updates. The project describes work on the MIMIC-IV demo (a 100-patient subset of MIMIC-IV).
We would like to thank all contributors to the ETL code. Specific thanks go to: Anna Tsvetkova, Tatyana Mironova, Dmytry Dymshyts, Gigi Lipori, Jeff Osborn.
We would like to thank all authors of the MIMIC-III ETL that was used as the starting point for the development of the MIMIC-IV mapping (Nicolas Paris, Adrien Parrot). We would also like to thank the PhysioNet team for their support, especially Tom Pollard and Alistair Johnson.
This work was supported in part by a grant from the Bill and Melinda Gates Foundation. VH's contribution to this work was carried out with support from the National Library of Medicine (NLM), National Institutes of Health.
Conflicts of Interest
The authors have no conflicts of interest to declare.
- Hripcsak G, Duke JD, Shah NH, et al. Observational Health Data Sciences and Informatics (OHDSI): Opportunities for Observational Researchers. Stud Health Technol Inform. 2015;216:574-8. PMC4815923. https://pubmed.ncbi.nlm.nih.gov/26262116/
- Overhage JM, Ryan PB, Reich CG, Hartzema AG, Stang PE. Validation of a common data model for active safety surveillance research. J Am Med Inform Assoc. 2012 Jan-Feb;19(1):54–60. Epub 2011 Oct 28
- Observational Medical Outcomes Partnership website: https://www.ohdsi.org [Accessed: 21 June 2021]
- Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2021). MIMIC-IV (version 1.0). PhysioNet. https://doi.org/10.13026/s6n6-xd98.
- Johnson, Alistair EW, David J. Stone, Leo A. Celi, and Tom J. Pollard. “The MIMIC Code Repository: enabling reproducibility in critical care research.” Journal of the American Medical Informatics Association (2017): ocx084.
- Johnson, A. E. W., Pollard, T. J., Shen, L., Lehman, L. H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L. A., & Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data, 3, 160035.
- Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). PhysioNet. https://doi.org/10.13026/C2XW26.
- Moody, B., Moody, G., Villarroel, M., Clifford, G., & Silva, I. (2020). MIMIC-III Waveform Database (version 1.0). PhysioNet. https://doi.org/10.13026/c2607m.
- OHDSI Achilles Data Characterization Tool: http://ohdsi.github.io/Achilles/ [Accessed: 21 June 2021]
- MIMIC to OMOP ETL on GitHub: https://github.com/OHDSI/MIMIC [Accessed: 21 June 2021]
- MIMIC-III OMOP Concept Mapping: https://github.com/MIT-LCP/mimic-omop/tree/master/extras/concept [Accessed: 21 June 2021]
- Nicolas Paris, Adrien Parrot. MIMIC in the OMOP Common Data Model. medRxiv 2020.08.14.20175141; doi: https://doi.org/10.1101/2020.08.14.20175141
- OHDSI Data Quality Dashboard: https://ohdsi.github.io/DataQualityDashboard/ [Accessed: 21 June 2021]
- OHDSI Documentation: https://ohdsi.github.io/TheBookOfOhdsi [Accessed: 21 June 2021]
- OHDSI Community Forum: https://forums.ohdsi.org [Accessed: 21 June 2021]
Anyone can access the files, as long as they conform to the terms of the specified license.
License (for files):
Open Data Commons Open Database License v1.0
omop common data model
Total uncompressed size: 73.0 MB.
Access the files
- Download the ZIP file (10.3 MB)
- Access the files using the Google Cloud Storage Browser. Login with a Google account is required.
- Access the data using the Google Cloud command line tools (please refer to the gsutil documentation for guidance):
gsutil -m -u YOUR_PROJECT_ID cp -r gs://mimic-iv-demo-omop-0.9.physionet.org DESTINATION
- Request access using Google BigQuery.
- Download the files using your terminal:
wget -r -N -c -np https://physionet.org/files/mimic-iv-demo-omop/0.9/
|File|Size|Modified|
|LICENSE.txt|25.2 KB|2021-06-21|
|SHA256SUMS.txt|101.7 KB|2021-06-21|
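After downloading, the files can be checked against SHA256SUMS.txt. The sketch below assumes that each line of that file holds a hex digest followed by a relative path (the usual `sha256sum` layout) and that `wget -r` mirrored the files into a host-named directory tree. Both are assumptions, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(sums_file, root):
    """Return relative paths that are missing or whose digest does not match."""
    failures = []
    for line in Path(sums_file).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, relative = line.split(maxsplit=1)
        relative = relative.strip().lstrip("*")  # sha256sum marks binary mode with '*'
        target = Path(root) / relative
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(relative)
    return failures


if __name__ == "__main__":
    # Assumed layout of the mirrored download; adjust to the actual location.
    root = Path("physionet.org/files/mimic-iv-demo-omop/0.9")
    sums = root / "SHA256SUMS.txt"
    if sums.is_file():
        bad = verify_checksums(sums, root)
        print("all files verified" if not bad else f"problems: {bad}")
    else:
        print(f"{sums} not found; adjust `root` to the download location")
```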