Database Open Access
MIMIC-IV Clinical Database Demo on FHIR
Alex Bennett, Joshua Wiedekopf, Hannes Ulrich, Alistair Johnson
Published: June 7, 2022. Version: 2.0
When using this resource, please cite:
Bennett, A., Wiedekopf, J., Ulrich, H., & Johnson, A. (2022). MIMIC-IV Clinical Database Demo on FHIR (version 2.0). PhysioNet. https://doi.org/10.13026/2f5g-dh02.
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
Abstract
Interoperability of healthcare data has become increasingly important given the growing deployment of data-driven algorithms in clinical settings. The Fast Healthcare Interoperability Resources (FHIR) standard has emerged as a promising mechanism to share healthcare data across vendors in real-time and batch settings. Real-world datasets available in FHIR would accelerate research and development of data-driven algorithms. Existing datasets in FHIR are primarily synthetic and cover a limited number of resources. To address this gap, we have reformatted the Medical Information Mart for Intensive Care (MIMIC)-IV Clinical Database Demo into FHIR. The MIMIC clinical databases have received wide adoption and the constituent data are understood by the community. As much as possible, we adhered to the base resources with minimal extensions. Alongside the dataset, we publish openly available code allowing researchers to quickly build upon our work. Translating MIMIC-IV into FHIR provides a benchmark dataset for institutions to experiment with FHIR-based tools, and we hope this resource supports adoption and use of FHIR.
Background
The FHIR standard provides a framework for structuring health data and supporting data exchange amongst disparate systems and vendors. At the G7 discussion on Open Standards and Interoperability, FHIR was noted to be gaining wide traction in five of the seven countries, with the remaining two looking to adopt FHIR in the near future [1]. In these countries, translation of legacy systems into FHIR will lead to significant research and development in FHIR. Large-scale development of FHIR is often accelerated through testing with real-world data. However, access to patient data is restricted because of security and privacy concerns.
The use of synthetic data is an intriguing option to bypass the security and privacy concerns of patient data. Synthetic data generation is an active research field, with one prominent FHIR-based example being Synthea [2]. Synthea is a tool to create synthetic electronic health records representing patients with the most common demographics, disease distributions, and disease progressions found in the US. The datasets generated by Synthea cover the primary use cases in testing environments, but were not designed to include artefactual data or information derived from atypical workflows. Real-world data includes these components, which are necessary to support development of robust applications that can handle inconsistent data and edge cases.
MIMIC-IV is a relational database corresponding to over 60,000 patients admitted to the Beth Israel Deaconess Medical Center in Boston, MA [3]. MIMIC-IV has gained traction in the community due to its transparent mechanism of data access, reasonably large sample size, and authentic capture of a real-world electronic health record database. MIMIC-IV has been utilized in over 3000 publications, exploring retrospective analyses and research application development.
Recent work has focused on translation of MIMIC-IV into standard data models. A preliminary version of MIMIC-IV has been created in the OMOP data model on PhysioNet [4]. Another group based in Germany transformed a substantial portion of MIMIC-III and MIMIC-IV into FHIR to assist in research and development in the German FHIR context [5-6]. MIMIC-IV-on-FHIR Demo, this PhysioNet project, aims to translate MIMIC-IV into FHIR, preserve the MIMIC-IV structure and information, and provide an easily accessible FHIR dataset for use in research and development.
Methods
MIMIC-IV-on-FHIR aims to capture MIMIC-IV as is in the FHIR format. FHIR stores healthcare information in resources. Thus, the MIMIC-IV data tables were mapped to equivalent FHIR resources. The mapping process involved five steps:
- Terminology Generation. Capturing the MIMIC-IV terminology in FHIR was needed to retain the rich context MIMIC provides. FHIR stores terminology in two resources: CodeSystems and ValueSets. CodeSystems are the source of codes, while ValueSets are use-case-specific combinations of codes drawn from one or more CodeSystems. Codes were pulled from the MIMIC-IV tables and then converted to CodeSystems and ValueSets. Once created, the ValueSets are bound to data elements in resources. The FHIR server uses these bindings to validate that proper codes are assigned to resource data elements.
- Implementation Guide Creation. To provide a reproducible FHIR format for MIMIC, an implementation guide was made. A FHIR implementation guide is a collection of FHIR profiles and terminology aiming to achieve a task within a specific domain. FHIR profiles are modifications of the base FHIR resources. The terminology generated in the first step was used to bind elements in the profiles. Effectively, this adds a layer of validation ensuring that only the codes in the terminology system can be assigned to an element. The MIMIC implementation guide includes 22 profiles, 64 terminology resources, and 2 extensions. Where possible, the US Core R4 profiles were used as the basis for the MIMIC profiles [7]. The MIMIC implementation guide lays the framework for the later steps of mapping and validation.
- Mapping. The goal for mapping was to have as complete a picture of MIMIC-IV in FHIR as possible. Each column in MIMIC-IV was investigated to identify potential mappings to FHIR resource elements. The final MIMIC-IV column to FHIR element mappings can be found in the MIMIC implementation guide. For MIMIC-IV columns without direct mappings into FHIR, extensions were made to house the information. Custom SQL scripts were used to facilitate the conversion from MIMIC-IV tables to a new schema, mimic_fhir. After mapping into FHIR, the resources are raw and must go through FHIR validation before they are ready for use and distribution.
- Validation. To ensure the mappings were consistent with the MIMIC implementation guide, FHIR validation was required. A FHIR server is both a store for FHIR resources and a service to validate, search, and export resources. A FHIR server was created with the MIMIC implementation guide applied to it, meaning any resources sent to the FHIR server would be validated against the guide. Validation can turn up issues in terminology binding, inter-resource referencing, or improper element mappings. In effect, FHIR validation acts as unit testing for the correctness of the MIMIC to FHIR mappings.
- Export. To make mimic-fhir accessible, the resources needed to be exported to a common format. The ndjson format is an ideal choice, as it is meant for delivering large structured data. Exporting the validated mimic-fhir resources to ndjson required two main steps. First, the bulk export functionality of FHIR servers was leveraged to export all the resources. Second, the exported resources were written to the ndjson format. At this stage, the exported ndjson files represent the full picture of MIMIC-IV-on-FHIR.
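As a rough illustration of the ndjson convention used in the export step (one JSON resource per line), the following Python sketch writes a few resources to an in-memory buffer and reads them back. The sample resources and the helper names write_ndjson and read_ndjson are made up for this example and are not part of the mimic-fhir code.

```python
import io
import json
from collections import Counter


def write_ndjson(resources, stream):
    """Write one compact JSON object per line (the ndjson convention)."""
    for resource in resources:
        stream.write(json.dumps(resource, separators=(",", ":")) + "\n")


def read_ndjson(stream):
    """Yield one parsed resource per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical, hand-written resources standing in for a bulk export.
sample = [
    {"resourceType": "Patient", "id": "p1"},
    {"resourceType": "Encounter", "id": "e1", "subject": {"reference": "Patient/p1"}},
    {"resourceType": "Encounter", "id": "e2", "subject": {"reference": "Patient/p1"}},
]

buffer = io.StringIO()
write_ndjson(sample, buffer)
buffer.seek(0)

# Count resources by type, as one might do to sanity-check an export.
counts = Counter(resource["resourceType"] for resource in read_ndjson(buffer))
print(dict(counts))
```

Because each line is an independent JSON document, files in this format can be streamed line by line without loading a whole export into memory.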
Data Description
MIMIC-IV-on-FHIR is a collection of MIMIC-IV data tables mapped to FHIR R4 resources. FHIR resources generated include Condition, Encounter, Medication, MedicationAdministration, MedicationDispense, MedicationRequest, Observation, Organization, Patient, and Procedure.
Table 1. Approximate mapping of MIMIC-IV tables with FHIR resources.
MIMIC Schema | MIMIC-IV Table | MIMIC-FHIR Profile | Notes |
---|---|---|---|
core | patients | MIMIC_Patient | Patient birthdate was estimated by subtracting anchor_age from transfers.intime. |
core | admissions | MIMIC_Encounter | |
core | transfers | MIMIC_Encounter, MIMIC_Location | Careunits are translated to Location resources. These are referenced in the main Encounter. |
hosp | diagnoses_icd, d_icd_diagnoses | MIMIC_Condition | |
hosp | procedures_icd, d_icd_procedures | MIMIC_Procedure | |
hosp | labevents, d_labitems | MIMIC_Observation_Labevents | LOINC was not used; the original itemid codes were kept, and further mapping will be performed in a future step. |
hosp | microbiologyevents | MIMIC_Observation_Micro_Test | Microbiology data had to be divided into three separate resources to be represented in FHIR. There is a parent-child relationship going from Test->Org->Susc. Specimen is collected and stored in a custom profile. |
hosp | prescriptions, poe | MIMIC_Medication_Request | Prescriptions was the primary source for medication requests, but poe was used if a request was made but no linking pharmacy_id was present. |
hosp | pharmacy | MIMIC_Medication_Dispense | |
hosp | emar, emar_detail | MIMIC_Medication_Administration | The medications referenced are primarily drug names and formulary drug codes, and do not link back to NDC/GSN. |
icu | icustays | MIMIC_Encounter_ICU | All icu profiles point to this profile, which references back to the original hospital admission encounter. |
icu | procedureevents | MIMIC_Procedure_ICU | |
icu | inputevents | MIMIC_Medication_Administration_ICU | References medication that was stored in icu.d_items. |
icu | chartevents | MIMIC_Observation_Chartevents | |
icu | datetimeevents | MIMIC_Observation_Datetimeevents | |
icu | outputevents | MIMIC_Observation_Outputevents | |
Patient and Organization Resources
Organization
The Organization resource records any institution or organization associated with healthcare services.
A single Organization resource was created for all of MIMIC-IV. The Beth Israel Deaconess Medical Center is the primary Organization that all patients reference back to.
Location
The Location resource records any physical location where services are delivered in relation to the hospital.
Each careunit found in transfers was translated into a Location resource. In total, there are 41 Location resources created from the careunits. These Location resources are then referenced in the main Encounter.location element.
Patient
The Patient resource records the demographic information for a patient associated with an organization.
The patients table, joined with the admissions and transfers tables, is mapped to the MIMIC_Patient profile. Several assumptions were made to assist in the mapping to the Patient resource. First, the birthdate was calculated, since only an anchor_age is given for patients. The transfers.intime column was used as the base for the birthdate calculation rather than admissions.admittime, since the admittime column is not always present. Second, the birthsex extension element is populated from gender, since MIMIC-IV makes no distinction between the two. Finally, the Patient name element is derived from the subject_id column in the form of “Patient_<subject_id>”.
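The Patient assumptions above can be sketched in Python as follows. This is only an illustration and not the project's SQL: which transfers.intime value is used, the placement of the name in the family element, the id scheme, and the gender mapping are all assumptions, and the input values are made up. The extension URL is the US Core birthsex extension.

```python
from datetime import date, datetime

# US Core birthsex extension URL; F, M and UNK are codes from its value set.
BIRTHSEX_URL = "http://hl7.org/fhir/us/core/StructureDefinition/us-core-birthsex"
# Assumed mapping from MIMIC-IV gender letters to FHIR administrative gender codes.
GENDER_MAP = {"F": "female", "M": "male"}


def estimate_birthdate(first_intime: datetime, anchor_age: int) -> date:
    """Shift a transfer time back by anchor_age whole years."""
    year = first_intime.year - anchor_age
    try:
        return first_intime.date().replace(year=year)
    except ValueError:
        # 29 February has no counterpart in a non-leap year.
        return date(year, 2, 28)


def build_patient(subject_id: int, gender: str, first_intime: datetime, anchor_age: int) -> dict:
    """Assemble a minimal Patient resource as a plain dict."""
    return {
        "resourceType": "Patient",
        "id": str(subject_id),  # hypothetical id scheme
        "name": [{"family": f"Patient_{subject_id}"}],  # assumed placement of the name
        "gender": GENDER_MAP.get(gender, "unknown"),
        "birthDate": estimate_birthdate(first_intime, anchor_age).isoformat(),
        "extension": [
            {"url": BIRTHSEX_URL, "valueCode": gender if gender in GENDER_MAP else "UNK"}
        ],
    }


# Illustrative values only.
patient = build_patient(10000032, "F", datetime(2180, 5, 6, 22, 23), 52)
print(patient["birthDate"])  # 2128-05-06
```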
Encounter
The Encounter resource records the full span of a hospital stay, including admission, stay, and discharge. Inpatient and outpatient encounters can be stored in the resource, but for use with MIMIC-IV only inpatient encounters are stored. The three Encounter profiles are based on the US Core Encounter profile.
The admissions table is mapped to the MIMIC_Encounter profile. MIMIC_Encounter contains custom terminology bindings for admission class, admission type, admission source and discharge disposition. The primary information mapped from admissions was the admission start and stop time along with the context for the admission. Additional information is supplied for the encounter from transfers in the form of Location resources. The Location resources track the movement of the patient throughout their Encounter.
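A minimal Python sketch of how transfers might populate Encounter.location is shown below. The Location ids, the timestamps, and the class coding are placeholders chosen for illustration; the MIMIC_Encounter profile binds its own admission terminology, which is not reproduced here.

```python
def build_encounter(hadm_id, subject_id, admittime, dischtime, transfers):
    """Assemble a minimal Encounter whose location list follows the transfers."""
    # ISO 8601 strings with the same UTC offset sort chronologically.
    ordered = sorted(transfers, key=lambda t: t["intime"])
    locations = [
        {
            "location": {"reference": f"Location/{t['careunit_id']}"},
            "period": {"start": t["intime"], "end": t["outtime"]},
        }
        for t in ordered
    ]
    return {
        "resourceType": "Encounter",
        "id": str(hadm_id),  # hypothetical id scheme
        "status": "finished",
        # Placeholder class coding; the MIMIC profile uses custom bindings.
        "class": {
            "system": "http://terminology.hl7.org/CodeSystem/v3-ActCode",
            "code": "IMP",
        },
        "subject": {"reference": f"Patient/{subject_id}"},
        "period": {"start": admittime, "end": dischtime},
        "location": locations,
    }


# Made-up transfers, deliberately out of order.
transfers = [
    {"careunit_id": "medicine", "intime": "2180-05-07T01:00:00-04:00",
     "outtime": "2180-05-08T10:00:00-04:00"},
    {"careunit_id": "emergency-department", "intime": "2180-05-06T19:17:00-04:00",
     "outtime": "2180-05-07T01:00:00-04:00"},
]
encounter = build_encounter(
    22595853, 10000032,
    "2180-05-06T22:23:00-04:00", "2180-05-08T10:00:00-04:00",
    transfers,
)
print([entry["location"]["reference"] for entry in encounter["location"]])
```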
The icustays table is mapped to the profile MIMIC_Encounter_ICU. MIMIC_Encounter_ICU contains terminology bindings for only the admission type, as icustays does not contain the same detail as admissions. The primary information mapped from icustays was the timing of the stay and the type of encounter.
Measurement Resources
ObservationLabevents
A standard way to map lab Observations was created by US Core with their profile USCoreLaboratoryResultObservationProfile. The US Core profile was used as the basis for the MIMIC profile MIMIC_Observation_Labevents. MIMIC_Observation_Labevents contains terminology bindings for code and interpretation plus an extension for lab priority. The labevents table is mapped to MIMIC_Observation_Labevents, primarily capturing the item code, resulting value, interpretation, and timing for labs.
ObservationMicro
The microbiology observations are too complex to map into a single resource, so the results must be broken into three: test, organism, and susceptibility result. Thus three Observation profiles were generated: MIMIC_Observation_Micro_Test, MIMIC_Observation_Micro_Org, and MIMIC_Observation_Micro_Susc. These profiles were based on the US Core Laboratory Result Observation Profile.
MIMIC_Observation_Micro_Test captures the test information from microbiologyevents. Test codes, timing and comments are mapped into the profile. Additionally, references to the associated MIMIC_Observation_Micro_Org resources are created. There are a significant number of tests with no reference to an organism, and these tests are stored with their result pulled from the comments.
MIMIC_Observation_Micro_Org captures the organism information from microbiologyevents. Organism codes and names are mapped into the profile. Additionally, references to the child MIMIC_Observation_Micro_Susc resources and parent MIMIC_Observation_Micro_Test resources are created.
MIMIC_Observation_Micro_Susc captures the susceptibility results from microbiologyevents. Antibiotic codes, timing, and interpretation are mapped into the profile. Additionally, references to parent MIMIC_Observation_Micro_Org resources are created. An extension was also added to MIMIC_Observation_Micro_Susc to house the dilution values for susceptibility testing.
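The Test->Org->Susc hierarchy can be sketched in Python as below. The text does not say which Observation elements carry the parent and child references, so this sketch assumes hasMember for children and derivedFrom for parents (both are standard R4 Observation elements). The id scheme and the sample rows, including their itemid values, are invented for illustration.

```python
def build_micro_observations(rows):
    """Split microbiologyevents-like rows into test, organism and susceptibility dicts."""
    tests, orgs, suscs = {}, {}, []
    for row in rows:
        test_id = f"micro-test-{row['micro_specimen_id']}-{row['test_itemid']}"
        test = tests.setdefault(test_id, {
            "resourceType": "Observation", "id": test_id, "status": "final",
            "code": {"text": row["test_name"]}, "hasMember": [],
        })
        if row.get("org_itemid") is None:
            # No organism: keep the result text taken from the comments.
            if row.get("comments"):
                test["valueString"] = row["comments"]
            continue
        org_id = f"micro-org-{row['micro_specimen_id']}-{row['test_itemid']}-{row['org_itemid']}"
        if org_id not in orgs:
            orgs[org_id] = {
                "resourceType": "Observation", "id": org_id, "status": "final",
                "code": {"text": row["org_name"]},
                "derivedFrom": [{"reference": f"Observation/{test_id}"}],
                "hasMember": [],
            }
            test["hasMember"].append({"reference": f"Observation/{org_id}"})
        if row.get("ab_itemid") is not None:
            susc_id = f"{org_id.replace('micro-org', 'micro-susc')}-{row['ab_itemid']}"
            suscs.append({
                "resourceType": "Observation", "id": susc_id, "status": "final",
                "code": {"text": row["ab_name"]},
                "valueCodeableConcept": {"text": row["interpretation"]},
                "derivedFrom": [{"reference": f"Observation/{org_id}"}],
            })
            orgs[org_id]["hasMember"].append({"reference": f"Observation/{susc_id}"})
    # FHIR does not allow empty arrays, so drop unused hasMember lists.
    for obs in list(tests.values()) + list(orgs.values()):
        if not obs["hasMember"]:
            del obs["hasMember"]
    return list(tests.values()), list(orgs.values()), suscs


# Invented rows: one test without an organism, one with two susceptibilities.
rows = [
    {"micro_specimen_id": 1, "test_itemid": 90039, "test_name": "URINE CULTURE",
     "org_itemid": None, "org_name": None, "ab_itemid": None, "ab_name": None,
     "interpretation": None, "comments": "NO GROWTH."},
    {"micro_specimen_id": 2, "test_itemid": 90201, "test_name": "BLOOD CULTURE",
     "org_itemid": 80002, "org_name": "ESCHERICHIA COLI", "ab_itemid": 90004,
     "ab_name": "AMPICILLIN", "interpretation": "R", "comments": None},
    {"micro_specimen_id": 2, "test_itemid": 90201, "test_name": "BLOOD CULTURE",
     "org_itemid": 80002, "org_name": "ESCHERICHIA COLI", "ab_itemid": 90012,
     "ab_name": "GENTAMICIN", "interpretation": "S", "comments": None},
]
tests, orgs, suscs = build_micro_observations(rows)
print(len(tests), len(orgs), len(suscs))
```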
Specimen
The Specimen resource records information about a material sample. The Specimen profiles are based on the base Specimen resource.
The MIMIC_Specimen_Lab profile houses the information about laboratory specimens. A terminology binding was added to the profile for the type of fluid. The labevents table is used as the source for lab specimens. The primary information mapped to the profile is the specimen identifier and the fluid type.
The MIMIC_Specimen_Micro profile houses the information about microbiology specimens. A terminology binding was added to the profile for the type of fluid. The microbiologyevents table is used as the source for microbiology specimens. The primary information mapped to the profile is the specimen identifier and the specimen type.
Medication Resources
Medication
The Medication resource records medications and drugs available in healthcare settings. The resource can store information on individual drugs or a combination of drugs from a prescription.
MIMIC does not have a single data table for medications, so distinct Medication resources were generated from seven sources:
- Formulary drug codes are pulled from prescriptions and emar_detail.product_code.
- National Drug Codes (NDC) are pulled from prescriptions.
- Generic Sequence Numbers (GSN) are pulled from prescriptions.
- Generic medication names are pulled from prescriptions.drug, pharmacy.medication, and emar.medication.
- IV-specific events are pulled from poe.order_type when order_type is set to IV therapy or TPN.
- ICU medications are pulled from d_items with linksto = 'inputevents'.
- Medication mixes are pulled from prescriptions. A medication mix groups the medications that share the same pharmacy_id and references each of them as an ingredient. Medication mixes are included because MedicationRequest allows only one Medication reference, even though multiple medications can be requested together.
The medication sources are mapped to MIMIC_Medication. MIMIC_Medication contains terminology bindings for the medication code. The medication code binds to a ValueSet that is a combination of the seven medication CodeSystems.
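The medication mix idea from the list above can be illustrated with a short Python sketch that groups prescription rows by pharmacy_id and references each drug as an ingredient. The id scheme, the slug helper, the rule that a mix needs at least two distinct drugs, and the sample rows are assumptions made for this example.

```python
import re
from collections import defaultdict


def slug(name: str) -> str:
    """Turn a drug name into an id-safe token (hypothetical id scheme)."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def build_medication_mixes(prescriptions):
    """Build one Medication per pharmacy_id that groups several distinct drugs."""
    drugs_by_pharmacy = defaultdict(set)
    for row in prescriptions:
        drugs_by_pharmacy[row["pharmacy_id"]].add(row["drug"])

    mixes = []
    for pharmacy_id, drugs in sorted(drugs_by_pharmacy.items()):
        if len(drugs) < 2:
            # Assumption: a single drug is referenced directly, not as a mix.
            continue
        ordered = sorted(drugs)
        mixes.append({
            "resourceType": "Medication",
            "id": f"mix-{pharmacy_id}",
            "code": {"text": " + ".join(ordered)},
            "ingredient": [
                {"itemReference": {"reference": f"Medication/{slug(drug)}"}}
                for drug in ordered
            ],
        })
    return mixes


# Invented prescription rows.
prescriptions = [
    {"pharmacy_id": 101, "drug": "Heparin"},
    {"pharmacy_id": 101, "drug": "0.9% Sodium Chloride"},
    {"pharmacy_id": 102, "drug": "Acetaminophen"},
]
mixes = build_medication_mixes(prescriptions)
print([m["id"] for m in mixes])
```

A MedicationRequest can then point at the single mix resource, which keeps to the one-Medication-reference constraint described above.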
MedicationAdministration
The MedicationAdministration resource records any administration of medication in a healthcare setting. The two main MIMIC sources for administration are emar and inputevents. Each of these tables was mapped to a custom profile derived from the base R4 FHIR profiles.
The emar and emar_detail tables are mapped to the MIMIC_Medication_Administration profile. MIMIC_Medication_Administration contains terminology bindings for medication site, medication method, and medication route. The primary information is pulled from each row in emar_detail, with emar supplying the medication reference only when it is not available in emar_detail.product_code.
The inputevents table was mapped to MIMIC_Medication_Administration_ICU. MIMIC_Medication_Administration_ICU contains terminology bindings for category ICU and medication method ICU. The ICU medication administration resources will be smaller than emar medication administrations since less detailed information is available in MIMIC.
MedicationDispense
The MedicationDispense resource records the supply of medication to a patient. The main system that dispenses medication is the pharmacy.
The pharmacy table was mapped to MIMIC_Medication_Dispense. MIMIC_Medication_Dispense contains terminology bindings for medication route and medication frequency. The MedicationDispense is linked back to MedicationRequest through the element authorizingPrescription.
MedicationRequest
The MedicationRequest resource records orders for medication and the administration instructions for the medication. The resource only accepts a single Medication resource, which means the majority of referenced medication will be medication mixes coming from prescriptions. There were two sources for MedicationRequest: prescriptions and poe.
The prescriptions table was mapped to MIMIC_Medication_Request. MIMIC_Medication_Request contains terminology bindings for medication route and medication frequency. The prescriptions table is supplemented by pharmacy to fill dosage information.
The poe table maps orders to MIMIC_Medication_Request when emar events do not have a related pharmacy_id. This scenario typically arises for IV or TPN events occurring in the hospital, when a direct prescription may not have been written but the medication was ordered.
Charted Data Resources
Observations in the ICU
The Observation resource records any measurements made about a patient. The chartevents, datetimeevents, and outputevents tables are all measurement tables for ICU events. Each table maps to a MIMIC Observation profile derived from the base Observation resource. The MIMIC profiles share a similar format but have different terminology bindings depending on the source table.
The chartevents table is mapped to the MIMIC_Observation_Chartevents profile. MIMIC_Observation_Chartevents enforces terminology bindings for the observation code and category. Chartevents captures the majority of documented information from the ICU, thus the primary information mapped was the timing, item code and the result of the item. Additional columns were mapped to capture reference ranges and observation category.
The datetimeevents table is mapped to the MIMIC_Observation_Datetimeevents profile. MIMIC_Observation_Datetimeevents enforces terminology bindings for observation code and category. Datetimeevents documents any information that is in a datetime format, thus the primary information mapped was the datetime value and the item code.
The outputevents table is mapped to the MIMIC_Observation_Outputevents profile. MIMIC_Observation_Outputevents enforces terminology bindings for observation code and category. Outputevents holds information on the patients’ bodily outputs, thus the primary information mapped was the item codes for these events and the resulting value.
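To make the role of a terminology binding concrete, the Python sketch below builds a minimal Observation from a chartevents-like row and rejects item codes that are not in an allowed set. In the real pipeline this check is performed by the FHIR server against the bound ValueSet; here a small dict stands in for it. The CodeSystem URL, the id-free resource layout, and the sample row are placeholders.

```python
# Hypothetical CodeSystem URL; the real one is defined in the implementation guide.
CHARTEVENTS_SYSTEM = "http://example.org/CodeSystem/chartevents-d-items"

# Tiny stand-in for the bound ValueSet: itemid -> label.
ALLOWED_CODES = {"220045": "Heart Rate", "220210": "Respiratory Rate"}


def build_chartevent_observation(row, allowed_codes=ALLOWED_CODES):
    """Build a minimal Observation, enforcing the code binding by hand."""
    code = str(row["itemid"])
    if code not in allowed_codes:
        raise ValueError(f"itemid {code} is not in the bound value set")
    observation = {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{
            "system": CHARTEVENTS_SYSTEM,
            "code": code,
            "display": allowed_codes[code],
        }]},
        "subject": {"reference": f"Patient/{row['subject_id']}"},
        "encounter": {"reference": f"Encounter/{row['stay_id']}"},
        "effectiveDateTime": row["charttime"],
    }
    if row.get("valuenum") is not None:
        quantity = {"value": row["valuenum"]}
        if row.get("valueuom"):
            quantity["unit"] = row["valueuom"]
        observation["valueQuantity"] = quantity
    elif row.get("value") is not None:
        observation["valueString"] = row["value"]
    return observation


# Invented row for illustration.
row = {
    "subject_id": 10000032, "stay_id": 39553978, "itemid": 220045,
    "charttime": "2180-07-23T14:00:00-04:00",
    "value": "80", "valuenum": 80, "valueuom": "bpm",
}
observation = build_chartevent_observation(row)
print(observation["code"]["coding"][0]["display"], observation["valueQuantity"])
```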
Procedures in the ICU
The procedureevents table is mapped to the MIMIC_Procedure_ICU profile. MIMIC_Procedure_ICU enforces terminology bindings for the procedure code, body site and category. The primary information mapped from procedureevents includes timing, procedure codes, status and the procedure location.
Billing Resources
Condition
The Condition resource records conditions, diagnoses, and problems.
The diagnoses_icd table was mapped to the profile MIMIC_Condition, which is based on the US Core Condition profile. The MIMIC_Condition profile contains terminology bindings for the diagnosis code and condition category. The primary information mapped from diagnoses_icd was ICD codes.
Procedure
The Procedure resource records an action that is performed on a patient. The procedure profile is based on the US Core Procedure profile.
The procedures_icd table is mapped to the MIMIC_Procedure profile. MIMIC_Procedure adds terminology bindings to the code element, to ensure proper ICD mapping. The primary information pulled in from procedures_icd is the timing and ICD code for the procedure.
Usage Notes
An open source repository, MIMIC-FHIR, was created to store all the components needed to generate and use the mimic-fhir resources [8]. The repository allows for community discussion and collaboration on mimic-fhir. A version of the scripts for building the MIMIC-IV-on-FHIR demo dataset is also archived in Zenodo [9].
The mimic-fhir NDJSON files can be loaded into any FHIR server. A Jupyter notebook was developed to walk through the loading and usage of the mimic-fhir resources with the Pathling FHIR server [10, 11]. Pathling was used to demo the mimic-fhir resources due to its simple ndjson loading and optimized analytic operations.
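For servers without a dedicated ndjson import, one generic option is to wrap the resources in a FHIR transaction Bundle, which is then posted to the server's base URL. The Python sketch below only builds the Bundle and does not contact a server. It is not the approach taken in the tutorial notebook, and the function name and sample lines are made up for illustration.

```python
import json


def ndjson_to_transaction_bundle(lines):
    """Wrap ndjson lines in a transaction Bundle that upserts each resource by id."""
    entries = []
    for line in lines:
        if not line.strip():
            continue
        resource = json.loads(line)
        entries.append({
            "resource": resource,
            # PUT to ResourceType/id creates or updates the resource under that id.
            "request": {
                "method": "PUT",
                "url": f"{resource['resourceType']}/{resource['id']}",
            },
        })
    return {"resourceType": "Bundle", "type": "transaction", "entry": entries}


# Invented ndjson content standing in for a downloaded file.
lines = [
    '{"resourceType": "Patient", "id": "p1"}',
    "",
    '{"resourceType": "Encounter", "id": "e1", "subject": {"reference": "Patient/p1"}}',
]
bundle = ndjson_to_transaction_bundle(lines)
print(len(bundle["entry"]), bundle["entry"][0]["request"]["url"])
```

For large files, splitting the lines into several smaller Bundles keeps individual requests manageable, although references between resources in different Bundles then depend on load order.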
Release Notes
v2.0: MIMIC-IV Demo in FHIR v2.0 is the first release of MIMIC-IV Demo in FHIR.
Ethics
This project builds upon the work of MIMIC-IV v2.0. MIMIC-IV is a collection of deidentified patient data from the Beth Israel Deaconess Medical Center. MIMIC-IV-on-FHIR approval is based on the original MIMIC-IV work being deidentified and approved for credentialed distribution.
Acknowledgements
The authors would like to thank those behind MIMIC-IV for making the data available, the FHIR community for support in answering questions, and The Hospital for Sick Children for their financial support.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Department of Health & Social Care. (2021). G7 Open Standards and Interoperability, 3–20. Crown copyright. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1045267/G7-open-standards-final-report.pdf
- Walonoski, J., Kramer, M., Nichols, J., Quina, A., Moesel, C., Hall, D., Duffett, C., Dube, K., Gallagher, T., & McLachlan, S. (2017). Synthea: An approach, method, and software mechanism for generating synthetic patients and the Synthetic Electronic Health Care Record. Journal of the American Medical Informatics Association, 25(7), 921–921. https://doi.org/10.1093/jamia/ocx147
- Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2021). MIMIC-IV (version 1.0). PhysioNet. https://doi.org/10.13026/s6n6-xd98
- Kallfelz, M., Tsvetkova, A., Pollard, T., Kwong, M., Lipori, G., Huser, V., Osborn, J., Hao, S., & Williams, A. (2021). MIMIC-IV demo data in the OMOP Common Data Model (version 0.9). PhysioNet. https://doi.org/10.13026/p1f5-7x35
- Ververs, S., Ulrich, H., Kock, A.-K., & Ingenerf, J. (2018). Konvertierung von MIMIC-III-Daten zu FHIR. Jahrestagung Der Deutschen Gesellschaft Für Medizinische Informatik, Biometrie Und Epidemiologie E.V. (GMDS). https://doi.org/10.3205/18gmds018
- Ulrich, H., Behrend, P., Wiedekopf, J., Drenkhahn, C., Kock-Schoppenhauer, A.-K., & Ingenerf, J. (2021). Hands on the Medical Informatics Initiative Core Data Set: Lessons learned from converting the MIMIC-IV. Studies in Health Technology and Informatics. https://doi.org/10.3233/shti210549
- US Core HL7 FHIR Implementation Guide. https://www.hl7.org/fhir/us/core/ [Accessed: 6 June 2022]
- MIMIC-IV-on-FHIR Code on GitHub. https://github.com/kind-lab/mimic-fhir [Accessed: 6 June 2022]
- Bennett, A. M., & Johnson, A. J. (2022). kind-lab/mimic-fhir: MIMIC-IV-on-FHIR v1.0 (v1.0). Zenodo. https://doi.org/10.5281/zenodo.6547592
- MIMIC-FHIR Tutorial. https://github.com/kind-lab/mimic-fhir/blob/main/tutorial/mimic-fhir-tutorial-pathling.ipynb [Accessed: 6 June 2022]
- Pathling: Advanced FHIR Analytics Server. https://pathling.csiro.au/ [Accessed: 6 June 2022]
Access
Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.
License (for files):
Open Data Commons Open Database License v1.0
Discovery
DOI (version 2.0):
https://doi.org/10.13026/2f5g-dh02
DOI (latest version):
https://doi.org/10.13026/vphg-y548
Topics:
fhir
electronic health records
mimic
Project Website:
http://fhir.mimic.mit.edu
Files
Total uncompressed size: 876.2 MB.
Access the files
- Download the ZIP file (116.2 MB)
- Download the files using your terminal: wget -r -N -c -np https://physionet.org/files/mimic-iv-fhir-demo/2.0/
- Download the files using AWS command line tools: aws s3 sync s3://physionet-open/mimic-iv-fhir-demo/2.0/ DESTINATION
Name | Size | Modified |
---|---|---|
mimic-fhir | ||
LICENSE.txt (download) | 25.2 KB | 2022-06-06 |
SHA256SUMS.txt (download) | 2.2 KB | 2022-06-07 |