Database Credentialed Access
MIMIC-IV-ECHO: Echocardiogram Matched Subset
Brian Gow , Tom Pollard , Nathaniel Greenbaum , Benjamin Moody , Ahram Han , Jonathan W Waks , Alistair Johnson , Elizabeth Herbst , Parastou Eslami , Ashish Chaudhari , Tanner Carbonati , Seth Berkowitz , Roger Mark , Steven Horng
Published: March 10, 2026. Version: 1.0
When using this resource, please cite:
Gow, B., Pollard, T., Greenbaum, N., Moody, B., Han, A., Waks, J. W., Johnson, A., Herbst, E., Eslami, P., Chaudhari, A., Carbonati, T., Berkowitz, S., Mark, R., & Horng, S. (2026). MIMIC-IV-ECHO: Echocardiogram Matched Subset (version 1.0). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/nrjh-5r77
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.
Abstract
The MIMIC-IV-ECHO module contains structured echocardiographic measurements and DICOM files from echocardiography exams.
Structured measurements are available from 206,488 echocardiogram studies (including 179,928 transthoracic, 16,389 stress, and 10,171 transesophageal echocardiograms). These are from 91,372 unique patients who are in the MIMIC-IV Clinical Database and who had echocardiograms conducted between 2008 and 2022. The dataset encompasses a broad range of cardiologist-recorded echocardiographic measurements, including chamber sizes and volumes, systolic and diastolic function, valvular morphology and function, aortic root and great vessel dimensions, and Doppler-derived hemodynamics. Unstructured measurements are present in the cardiologist’s free-text reports, which will be made available later under the MIMIC-IV-Note module.
The DICOM subset contains more than 500,000 DICOM files across 7,243 studies from 4,579 distinct patients. It consists of transthoracic echocardiograms for patients who appear in the MIMIC-IV Clinical Database and were admitted between 2017 and 2019. A given study consists of numerous sequences of images, with each sequence representing a particular view of the patient's heart.
Records in MIMIC-IV-ECHO are matched to the MIMIC-IV Clinical Database, making it possible to link to information across the MIMIC-IV modules. We have also provided information for linking the DICOMs to the structured measurements and notes where available.
Background
An echocardiogram uses high-frequency sound waves (ultrasound) to take pictures of the heart [1], revealing information about the heart's structure and how it is functioning. Echocardiography is used to diagnose, monitor, and assess treatment results in patients who have or are suspected to have heart problems.
An echocardiogram study typically contains multiple views and sometimes uses multiple ultrasound techniques. A change in the position and angle of the ultrasound probe relative to the heart produces a different view [2]. Different views reveal information about different areas of the heart. All views of the DICOM dataset provided here are taken with the probe at the patient's chest (i.e., TTE or Transthoracic Echocardiography). However, the structured measurement table also includes data from transesophageal echocardiography (TEE), which provides high-resolution images of posterior cardiac structures via an esophageal probe, and stress echocardiography, which evaluates myocardial contractile response to exercise or pharmacologic stress. Common types of echocardiography include 2-D, Doppler, and 3-D [2]. In a 2-D echocardiogram, real-time cross-sectional images of the heart are produced. Doppler echocardiography is an extension of 2-D echocardiography with information on blood flow velocities and directions. 3-D echocardiography produces three-dimensional images of the heart.
While echocardiograms are an extremely valuable tool for the management of heart problems, they typically only comprise a small part of understanding the overall condition of a patient at the hospital. Echocardiograms are most informative when combined with a broader set of data such as: patient demographics, diagnoses, medications, laboratory tests, and electrocardiograms. This broader set of information is shared as part of the MIMIC-IV Database [3].
Methods
Echocardiograms and reports containing structured measurements are collected at the Beth Israel Deaconess Medical Center (BIDMC). Three types of datasets are associated with echocardiograms – structured measurement reports from the echocardiography machine, cardiologist reports, and echocardiogram DICOM images. The structured measurements and DICOM images are presented here.
We provide unique identifiers, such as subject_id, that allow studies to be connected to other information in the MIMIC-IV Database. All of the information is de-identified to satisfy the US Health Insurance Portability and Accountability Act of 1996 (HIPAA) Safe Harbor requirements.
Electronic Health Record
When the echocardiograms are recorded, the machine is populated with the patient's demographic details and their medical record number (MRN). The MRN was used to match records to the corresponding patient in the MIMIC-IV Clinical Database. Dates were shifted to obscure the actual dates, but the shift is consistent for a given patient, so the intervals between that patient's events are preserved.
Timestamps for events in the MIMIC-IV Clinical Database, such as drug administration, are aligned with the timestamps in MIMIC-IV-ECHO. However, some of the echocardiograms provided here were collected outside of Emergency Department (ED) or Intensive Care Unit (ICU) visits at the hospital. Since the MIMIC-IV Clinical Database is composed solely of ED and ICU data, the echocardiogram timestamp can occur before or after a visit in the clinical database.
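The effect of a consistent per-patient date shift can be illustrated with a minimal sketch. The subject identifier, the offset, and the dates below are made up for illustration; the real offsets are not published, and this is not the de-identification code used for MIMIC-IV.

```python
from datetime import datetime, timedelta

# Hypothetical per-subject offsets in days (illustrative values only).
SHIFT_DAYS = {1: 58_000}


def shift_timestamp(subject_id: int, ts: datetime) -> datetime:
    """Apply the subject's single, constant offset to a timestamp."""
    return ts + timedelta(days=SHIFT_DAYS[subject_id])


# Two made-up events for the same (made-up) subject.
admit = datetime(2021, 3, 1, 20, 57)
echo = datetime(2021, 3, 2, 9, 15)

shifted_admit = shift_timestamp(1, admit)
shifted_echo = shift_timestamp(1, echo)

# Because both events receive the same offset, the interval between them
# is unchanged even though the absolute dates differ.
interval_preserved = (shifted_echo - shifted_admit) == (echo - admit)
print(interval_preserved)
```

Because the offset is constant within a subject, durations such as "time from admission to echocardiogram" remain valid, whereas comparisons of absolute dates across different subjects do not.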
Structured Echocardiographic Measurements
During echocardiography, images are acquired and analyzed through a combination of clinician input and machine algorithms. The level of automation varies by measurement type: some require direct clinician-performed measurements (e.g., wall thickness), while others involve clinician-guided tracing with algorithmic calculation (e.g., left ventricular ejection fraction (LVEF) from traced ventricular borders). All measurements are reviewed and, if necessary, manually corrected by the clinician before being finalized. We refer to these algorithm-assisted, clinician-verified quantitative parameters as structured echocardiographic measurements.
All echocardiographic structured-measurement records available in the BIDMC echocardiography systems between 2008 and 2022 were included. We filtered echocardiography records to those belonging to patients present in the MIMIC-IV Clinical Database by matching their MRNs. As a result, the structured measurement dataset contains only patients included in the MIMIC-IV Clinical Database. The matched MRNs were then converted to the de-identified subject_id using the same deterministic MRN-to-subject_id mapping applied across all MIMIC-IV modules. No additional filters were applied based on location or encounter type, so the dataset includes echocardiograms performed in the ICU and ED as well as studies obtained outside these settings. In rare cases (~0.01%), duplicate structured-measurement records existed for a single echocardiography study. These were consolidated by keeping the record containing the most complete set of measurements.
After linkage, the structured measurement reports underwent standardized de-identification. All identifiers and timestamps were shifted or replaced according to the MIMIC-IV de-identification scheme. In this process, the original measurement timestamp was shifted to produce the de-identified measurement_datetime, and each study was assigned a unique measurement_id. Measurements containing residual PHI were removed or modified, and the final de-identified reports were validated using a rule-based procedure to ensure that no PHI remained.
Echocardiogram DICOM images
Each echocardiogram consists of a sequence of images representing specific views of the heart, along with metadata. The echocardiogram DICOM images originate from a subset of available echocardiograms from 2017 – 2019, which were collected on General Electric Vivid E90, Vivid E95, and Vivid S7 machines. Only a subset of the studies from this period was available for release, and additional DICOM studies may be included in future updates.
Text embedded in the images was identified by using Optical Character Recognition. When images were found to contain PHI, the entire study was omitted from the dataset. This occurred in a very small number of studies.
We also scrubbed all of the PHI from the echocardiogram DICOM metadata:
- IDs: Source identifiers used by the hospital were shifted to a random value. In particular, mrn was translated to subject_id, consistent with MIMIC-IV. A unique study_id was generated for each DICOM record.
- UIDs: Unique identifiers (UIDs) provide the capability to identify a wide variety of items, typically providing a guarantee of uniqueness across countries, sites, vendors, equipment, and formats. Some UIDs, such as the Study Instance UID, contain sensitive information. In such cases, we deterministically shifted the UID by using a UID root assigned to our laboratory (1.2.840.113554.6.1.) followed by a three-digit value specific to the variable and a randomized value.
- Dates: Raw dates were either shifted (consistent with all date shifts in MIMIC-IV for a given subject) or removed (in the case of Patient Birth Date).
- Names: Person names were either deterministically replaced with a randomized string or removed (in the case of Referring Physician Name).
- Private tags: Information recorded in private tags was removed.
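One way a deterministic UID replacement of this general shape could be implemented is sketched below. The keyed hash (HMAC-SHA256), the secret key, the variable code "101", and the dot separating the code from the suffix are all assumptions for illustration; the text above specifies only the root, a three-digit variable-specific value, and a randomized value.

```python
import hashlib
import hmac

UID_ROOT = "1.2.840.113554.6.1."  # laboratory root quoted in the text above
SECRET_KEY = b"replace-with-a-private-key"  # hypothetical secret
MAX_UID_LENGTH = 64  # DICOM UIDs are limited to 64 characters


def pseudonymize_uid(original_uid: str, variable_code: str) -> str:
    """Map an original UID to a stable replacement under UID_ROOT.

    variable_code is a hypothetical three-digit code for the DICOM attribute
    being replaced (for example, one code for Study Instance UID).
    """
    digest = hmac.new(SECRET_KEY, original_uid.encode("ascii"), hashlib.sha256).digest()
    # Decimal digits only, with no leading zero, as required for a UID component.
    suffix = str(int.from_bytes(digest, "big"))
    return f"{UID_ROOT}{variable_code}.{suffix}"[:MAX_UID_LENGTH]


new_uid = pseudonymize_uid("1.2.3.4", "101")
print(new_uid)
```

The same input always yields the same output, so references between files in a study stay consistent, while the original UID cannot be read back from the replacement without the key.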
Cardiologist Reports
Cardiologist interpretations of the echocardiograms will be made available at a later date in the MIMIC-IV Note module [4]. These reports were de-identified using a hybrid rule-based and machine learning approach [5-8], similar to that used for other MIMIC reports. Each instance of PHI was replaced by three underscores.
Data Description
Structured Echocardiographic Measurements
Structured echocardiographic measurements from 206,488 studies (179,928 transthoracic, 16,389 stress, and 10,171 transesophageal echocardiograms) are provided in the structured_measurement.csv file. These data are derived from 91,372 unique patients in the MIMIC-IV Clinical Database who underwent echocardiography during the study period. A subset (about 5%) was reserved as a hidden test set for future use. This hidden subset was defined at the patient level following the MIMIC-IV train/test split, in which the training set corresponds to the publicly released MIMIC-IV data. Accordingly, all structured measurements for test-set patients were withheld, while all measurements for training-set (publicly available) patients were retained.
The structured measurement file includes the following variables:
- subject_id: An identifier for the subject that is consistent across the MIMIC-IV database.
- measurement_id: An identifier linking each record to the corresponding echocardiographic study.
- measurement_datetime: The de-identified date and time when the diagnostic echocardiogram was performed.
- test_type: Specifies the modality of the echocardiographic study, categorized as transthoracic echocardiogram (TTE), transesophageal echocardiogram (TEE), or stress echocardiogram.
- measurement: An identifier for the specific echocardiographic measurement.
- measurement_description: A descriptive label for the measurement.
- result: The recorded measurement value.
- unit: The unit of measurement for the result.
Approximately 180 to 230 unique structured measurement variables are recorded, depending on the echocardiographic test type. Notably, during the study period, a transition in the echocardiography laboratory system occurred, which introduced variability in the naming and availability of measurement variables. As a result, even within the same modality (e.g., TTE), different sets of variables may be present depending on the system in use. For example, measurement variables related to left ventricular ejection fraction (LVEF) included lvef and lvef_upper in one system, while the other system captured additional structured variables such as biplane_lvef and lvef_3d. Missing measurement values were retained as-is and not removed, in order to preserve the original structure and availability of the data.
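Because the file is in long format (one row per measurement) and LVEF may appear under different variable names depending on the system, a per-study value has to be chosen explicitly. The sketch below uses made-up rows in the documented column layout. The preference order among variables, the "TTE" spelling of test_type, and the assumption that result parses as a number are illustrative choices, not properties confirmed by the dataset.

```python
import csv
import io

# Made-up rows in the documented column layout; values are illustrative only.
SAMPLE = """subject_id,measurement_id,measurement_datetime,test_type,measurement,measurement_description,result,unit
1,100,2150-01-01 10:00:00,TTE,lvef,LVEF,55,%
1,101,2151-06-01 09:00:00,TTE,biplane_lvef,Biplane LVEF,48,%
1,101,2151-06-01 09:00:00,TTE,lvef_3d,3D LVEF,,%
"""

# Assumed preference order when several LVEF variables exist for one study.
LVEF_PRIORITY = ["biplane_lvef", "lvef_3d", "lvef"]


def lvef_by_study(rows):
    """Return {measurement_id: (variable, value)} using the highest-priority
    LVEF variable that has a non-missing, numeric result."""
    found = {}
    for row in rows:
        name = row["measurement"]
        if name not in LVEF_PRIORITY:
            continue
        try:
            value = float(row["result"])
        except ValueError:
            continue  # missing or non-numeric result: skip this row
        key = row["measurement_id"]
        current = found.get(key)
        if current is None or LVEF_PRIORITY.index(name) < LVEF_PRIORITY.index(current[0]):
            found[key] = (name, value)
    return found


lvef = lvef_by_study(csv.DictReader(io.StringIO(SAMPLE)))
print(lvef)
```

To run this against the released file, replace the in-memory sample with an open handle to structured_measurement.csv. Recording which variable supplied each value makes it possible to check later whether results differ between the two laboratory systems.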
Echocardiogram DICOMs
Approximately 525,000 echocardiogram DICOMs across 7,243 transthoracic echocardiogram studies from 4,579 patients are included in the MIMIC-IV-ECHO module. Around 5% of the available echocardiograms were withheld at the patient level, consistent with the MIMIC-IV train/test split. Patients in this module are linked to the MIMIC-IV Clinical Database on subject_id. Many but not all of the echocardiograms overlap with a hospital or emergency department stay.
In addition, our source data does not include an identifier that directly links a structured measurement to a given DICOM. To help address this issue, we have provided a derived table (detailed below) that indicates which reports occur within two days of a given study. Approximately 99% of the echocardiogram DICOMs are linked to a structured measurement record in the derived table.
The echocardiograms are stored as DICOM (.dcm) files. DICOM (Digital Imaging and Communications in Medicine) defines standards for the storage of medical images and related information [9]. Each DICOM file contains a sequence of images for a particular view of the heart.
Echocardiograms are grouped into subdirectories based on subject_id. Each DICOM record path follows the pattern files/pNN/pXXXXXXXX/sZZZZZZZZ/ZZZZZZZZ_VVVV, where:
- NN is the first two characters of the subject_id,
- XXXXXXXX is the subject_id,
- ZZZZZZZZ is the study_id, and
- VVVV is the view number.
An example of the file structure is as follows:
files
├── p10
│   └── p10690270
│       ├── s95240362
│       │   ├── 95240362_0004.dcm
│       │   .
│       │   └── 95240362_0093.dcm
│       └── s90045402
│           ├── 90045402_0001.dcm
│           .
│           └── 90045402_0088.dcm
└── p19
    └── p19425623
        └── s90267113
            ├── 90267113_0001.dcm
            .
            └── 90267113_0088.dcm
Here we show a subject under the p10 directory and another under the p19 directory. Subject p10690270 has two studies. The first study, s95240362, has 90 DICOM files with view numbers between 4 and 93. The second study, s90045402, has 83 DICOM files with view numbers between 1 and 88. The subject under the p19 directory, p19425623, has only one study s90267113. We find 83 DICOM files under this study with view numbers between 1 and 88.
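The path pattern can be expressed as a small helper. This is a sketch based on the pattern and example tree above; the four-digit zero padding of the view number and the .dcm extension are inferred from the example file names, and dicom_path is a hypothetical function name.

```python
from pathlib import PurePosixPath


def dicom_path(subject_id: int, study_id: int, view: int) -> str:
    """Build the relative path of a DICOM file from its identifiers."""
    subject = str(subject_id)
    return str(
        PurePosixPath(
            "files",
            f"p{subject[:2]}",  # pNN: first two characters of the subject_id
            f"p{subject}",  # pXXXXXXXX: full subject_id
            f"s{study_id}",  # sZZZZZZZZ: study_id
            f"{study_id}_{view:04d}.dcm",  # ZZZZZZZZ_VVVV: study_id and view number
        )
    )


example_path = dicom_path(10690270, 95240362, 4)
print(example_path)
```

In practice the dicom_filepath column of echo-record-list.csv is the safer source of paths, since view numbers within a study are not guaranteed to be contiguous.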
A number of open-source programs are available for viewing DICOMs, such as Miele-LXIV (Mac), MicroDicom (Windows), and ImageMagick (Linux, Mac, Windows). We also provide example code for loading DICOMs into Python in the Usage Notes section below.
Summary Tables
The echo-record-list.csv provides the path to each DICOM file (dicom_filepath) along with the acquisition_datetime of the DICOM (date and time that the acquisition started for a given view), the associated study_id and the subject's MIMIC-IV subject_id.
The echo-study-list.csv provides a link between the DICOM study_id and its associated structured measurement, as well as the cardiologist's report, where available. When structured measurements are available within two days of the study_datetime of the DICOM study, the corresponding measurement_id and measurement_datetime are provided, enabling linkage to quantitative parameters in the structured_measurement.csv file. Similarly, when a cardiologist's report is available within two days of the DICOM study, the corresponding note_id, note_seq, and note_charttime are included, allowing linkage to the narrative text in the MIMIC-IV Note module.
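The two-day matching rule can be sketched as follows. This only illustrates the kind of time-window match described above: the use of an absolute, inclusive difference and the choice of the nearest record when several qualify are assumptions, and the identifiers and dates are made up.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=2)


def nearest_within_window(study_datetime, candidates, window=WINDOW):
    """Return the id of the candidate closest in time to study_datetime,
    provided it lies within the window; otherwise return None.

    candidates is a list of (record_id, record_datetime) pairs.
    """
    best_id, best_gap = None, None
    for record_id, record_datetime in candidates:
        gap = abs(record_datetime - study_datetime)
        if gap <= window and (best_gap is None or gap < best_gap):
            best_id, best_gap = record_id, gap
    return best_id


study_dt = datetime(2150, 1, 10, 9, 0)
measurements = [
    (1, datetime(2150, 1, 5, 9, 0)),  # five days earlier: outside the window
    (2, datetime(2150, 1, 10, 11, 30)),  # same day: closest
    (3, datetime(2150, 1, 11, 9, 0)),  # one day later: inside, but farther
]

match = nearest_within_window(study_dt, measurements)
no_match = nearest_within_window(study_dt, measurements[:1])
print(match, no_match)
```

Users who need a stricter or looser link can apply the same logic with a different window, starting from the timestamps in echo-study-list.csv and structured_measurement.csv.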
These summary tables are also provided on BigQuery [10].
Cardiologist Reports
The cardiologist reports will be released under the MIMIC-IV Note module at a later date.
Usage Notes
The MIMIC-IV-ECHO module augments existing information in MIMIC-IV, providing an important new resource, particularly for cardiac-related research.
Loading a DICOM in Python
The code snippet below shows how to use the pydicom library to load a DICOM into Python, read its metadata, and plot an image.
import matplotlib.pyplot as plt
import pydicom
from pydicom.pixel_data_handlers import convert_color_space
file_path = '/files/p10/p10690270/s95240362/95240362_0004.dcm'
# read in the DICOM with the pydicom module
dicom_data = pydicom.dcmread(file_path)
# print the DICOM metadata
for element in dicom_data:
    print(element)
# note the value for Photometric Interpretation that was printed, it should show:
# (0028, 0004) Photometric Interpretation CS: 'YBR_FULL_422'
# we need to convert from YBR_FULL_422 to RGB to display the image properly
images_rgb = convert_color_space(dicom_data.pixel_array, "YBR_FULL_422", "RGB", per_frame=True)
# plot the first frame/image
plt.imshow(images_rgb[0])
plt.show()
Linking to MIMIC-IV
In the example below, we show how a patient in MIMIC-IV-ECHO with subject_id = 10690270 (directory p10690270) can be linked to admission information in the MIMIC-IV Clinical Database. Running the following SQL query in Google BigQuery returns the dates of the available echocardiograms for the patient:
SELECT DISTINCT study_id, dicom_datetime
FROM `lcp-consortium.mimic_echo.record_list`
WHERE subject_id = 10690270
| study_id | dicom_datetime |
| 90045402 | 2180-02-08 10:22:25 UTC |
| 95240362 | 2179-07-25 09:15:36 UTC |
Executing the following queries gives us the admission and discharge times for the patient in the MIMIC-IV Clinical Database:
SELECT admittime, dischtime
FROM `physionet-data.mimiciv_hosp.admissions`
WHERE subject_id = 10690270
| admittime | dischtime |
| 2179-07-24T20:57:00 | 2179-07-27T17:52:00 |
| 2176-07-28T23:37:00 | 2176-07-30T17:25:00 |
SELECT intime, outtime
FROM `physionet-data.mimiciv_ed.edstays`
WHERE subject_id = 10690270
| intime | outtime |
| 2176-07-28T18:33:00 | 2176-07-29T01:44:00 |
The results show two hospital admissions and one emergency department stay for the patient. The echocardiogram with study_id = 95240362 falls within the admission in the (de-identified) year 2179. In 2176 the patient was seen in the emergency department and admitted to the hospital. This visit does not have an associated echocardiogram and likely occurred prior to the date range for inclusion in MIMIC-IV-ECHO.
The echocardiogram with study_id = 90045402 does not appear to be associated with a clinical database visit. Some of the echocardiograms in this dataset were collected outside of ED or ICU visits. Since the MIMIC-IV Clinical Database is currently composed solely of ED and ICU data, an echocardiogram timestamp may occur before or after a visit recorded in the clinical database.
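The same check can be made programmatically once the query results have been downloaded. The sketch below hard-codes the timestamps from the tables above and treats them all as naive datetimes on the same de-identified clock, which is an assumption; containing_admission is a hypothetical helper name.

```python
from datetime import datetime

# Admission and discharge times from the admissions query above.
admissions = [
    (datetime(2179, 7, 24, 20, 57), datetime(2179, 7, 27, 17, 52)),
    (datetime(2176, 7, 28, 23, 37), datetime(2176, 7, 30, 17, 25)),
]

# Echocardiogram timestamps from the record list query above.
echoes = {
    95240362: datetime(2179, 7, 25, 9, 15, 36),
    90045402: datetime(2180, 2, 8, 10, 22, 25),
}


def containing_admission(echo_time, stays):
    """Return the (admittime, dischtime) pair that contains echo_time, or None."""
    for start, end in stays:
        if start <= echo_time <= end:
            return (start, end)
    return None


linked = {
    study_id: containing_admission(echo_time, admissions)
    for study_id, echo_time in echoes.items()
}
for study_id, stay in linked.items():
    print(study_id, stay)
```

Studies that map to None, such as 90045402 here, are the ones performed outside any recorded admission.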
Release Notes
This release contains structured echocardiographic measurements for subjects in MIMIC-IV.
Ethics
The project was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). Requirement for individual patient consent was waived because the project did not impact clinical care and all protected health information was deidentified.
Acknowledgements
SH, RM, BG, and TP are funded by the Massachusetts Life Sciences Center, Nov. 30, 2020. NG is supported by National Institutes of Health National Library of Medicine Biomedical Informatics and Data Science Research Training Program under grant number T15LM007092-30. BG, TP, AJ, BM, CF, DM, and RM are supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) under NIH grant number R01EB030362.
Conflicts of Interest
The author(s) have no conflicts of interest to declare.
References
1. Ashley EA, Niebauer J. Cardiology Explained. London: Remedica; 2004. Chapter 4, Understanding the echocardiogram. Available from: https://www.ncbi.nlm.nih.gov/books/NBK2215/
2. Lang RM, Badano LP, Mor-Avi V, Afilalo J, Armstrong A, Ernande L, Flachskampf FA, Foster E, Goldstein SA, Kuznetsova T, Lancellotti P. Recommendations for cardiac chamber quantification by echocardiography in adults: an update from the American Society of Echocardiography and the European Association of Cardiovascular Imaging. European Heart Journal-Cardiovascular Imaging. 2015 Mar 1;16(3):233-71.
3. Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2021). MIMIC-IV (version 1.0). PhysioNet. https://doi.org/10.13026/s6n6-xd98.
4. Johnson, A., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2023). MIMIC-IV-Note: Deidentified free-text clinical notes (version 2.2). PhysioNet. https://doi.org/10.13026/1n74-ne17.
5. Margaret Douglass, Computer-assisted de-identification of free-text nursing notes. Master's Thesis, 2005. MIT.
6. Neamatullah, I., Douglass, M.M., Lehman, L.H., Reisner, A., Villarroel, M., Long, W.J., Szolovits, P., Moody, G.B., Mark, R.G., Clifford, G.D. (2007). De-Identification Software Package (version 1.1). PhysioNet. doi:10.13026/C20M3F
7. Neamatullah I, Douglass MM, Lehman LH, Reisner A, Villarroel M, Long WJ, Szolovits P, Moody GB, Mark RG, Clifford GD. Automated de-identification of free-text medical records. BMC medical informatics and decision making. 2008 Dec;8(1):1-7. doi:10.1186/1472-6947-8-32
8. Johnson AEW, Bulgarelli L, Pollard TJ. Deidentification of free-text medical records using pre-trained bidirectional transformers. Proc ACM Conf Health Inference Learn (2020). 2020 Apr;2020:214-221. doi: 10.1145/3368555.3384455. Epub 2020 Apr 2. PMID: 34350426; PMCID: PMC8330601.
9. Digital Imaging and Communications in Medicine About Page. https://www.dicomstandard.org/about/ [Accessed 18 July 2023]
10. Documentation about using the Medical Information Mart for Intensive Care (MIMIC) Database with Google BigQuery. https://mimic.mit.edu/docs/gettingstarted/cloud/ [Accessed 21 June 2022]
Access
Access Policy:
Only credentialed users who sign the DUA can access the files.
License (for files):
PhysioNet Credentialed Health Data License 1.5.0
Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0
Required training:
CITI Data or Specimens Only Research
Discovery
DOI (version 1.0):
https://doi.org/10.13026/nrjh-5r77
DOI (latest version):
https://doi.org/10.13026/agyq-zp57