Database Credentialed Access

EchoNotes Structured Database derived from MIMIC-III (ECHO-NOTE2NUM)

Gloria Hyunjung Kwak Dana Moukheiber Mira Moukheiber Lama Moukheiber Sulaiman Moukheiber Neel Butala Leo Anthony Celi Christina Chen

Published: Feb. 23, 2024. Version: 1.0.0

When using this resource, please cite:
Kwak, G. H., Moukheiber, D., Moukheiber, M., Moukheiber, L., Moukheiber, S., Butala, N., Celi, L. A., & Chen, C. (2024). EchoNotes Structured Database derived from MIMIC-III (ECHO-NOTE2NUM) (version 1.0.0). PhysioNet.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

The EchoNotes Structured Database derived from MIMIC-III (ECHO-NOTE2NUM) is a structured echocardiogram database derived from 43,472 observational notes obtained during echocardiogram studies conducted in the intensive care unit at the Beth Israel Deaconess Medical Center between 2001 and 2012. The database covers multiple aspects of cardiac structure and function, including cavity size, wall thickness, systolic and diastolic function, valve regurgitation and stenosis, and pulmonary pressures. To facilitate large-scale data analysis, the clinical notes were transformed into a structured numerical format. Within each sentence of an echocardiogram report, words or phrases describing abnormal findings were identified, and a severity staging system with numeric categories was established. This large, publicly accessible database of structured echocardiogram data holds significant potential as a tool for investigating cardiovascular disease in the intensive care unit.

Background

An echocardiogram is a rapid ultrasound imaging procedure used to evaluate the structure and function of the heart. It has become a widely used and versatile tool for the diagnosis and management of critically ill patients [1–10]. During a transthoracic echocardiogram, a sonographer skillfully maneuvers a transducer across the chest to direct high-frequency sound waves toward the heart. These waves are reflected back to the transducer and transmitted to a computer, which generates moving images of the heart as well as Doppler waveforms that can be used to quantify velocities and pressures. Along with the clinical presentation and laboratory values, echocardiograms can provide valuable information on critically ill patients and guide treatment strategies [1–5, 11]. Echocardiograms can help clinicians identify the cause of hemodynamic instability, prompt further investigations such as coronary angiography for suspected myocardial infarction, and guide the selection of medications for hemodynamic support and other therapeutic interventions [6,7].

The use of echocardiograms in clinical medicine has increased dramatically over the past two decades as the technology has evolved [5,12]. This, in turn, has led to the creation of large echocardiogram databases at many institutions, many of which have been used for research [12]. However, using these databases systematically for knowledge generation is challenging because written echocardiogram reports are often unstructured.

In this study, we describe the process of transforming 45,794 echocardiogram report records from intensive care unit (ICU) patients into a structured numerical format to facilitate large-scale data analysis. The reports comprise all available documents: those from inpatient stays as well as those created before and after hospitalization. The numerical format covers observations of the cardiac walls, cavity sizes, valves, and systolic and diastolic function.

The EchoNotes Structured Database derived from Medical Information Mart for Intensive Care (MIMIC)-III (ECHO-NOTE2NUM) described in this manuscript can be used to answer a variety of important clinical questions about cardiac structure and function and about critical care management in the intensive care unit. For example, changes in left ventricular systolic function are known to occur during critical illness, and this database can be used to characterize the trajectories of such changes and their association with clinical events and therapeutic strategies. Additionally, echocardiogram results can help determine the cause of hemodynamic instability during critical illness, such as hypovolemia or cardiogenic shock. Incorporating ECHO-NOTE2NUM data into models can augment future comparative effectiveness research on the efficacy and safety of medications or interventions for critical illness using the broader MIMIC-III database.

Methods

The MIMIC-III database is a large, freely accessible ICU database with over 50,000 admissions to the Beth Israel Deaconess Medical Center in Boston, MA, USA [13]. The data were de-identified according to the Health Insurance Portability and Accountability Act (HIPAA) standard using structured data cleansing and date shifting. In particular, dates were shifted by a random offset for each patient, and the dates of birth of patients older than 89 were shifted to mask their actual age, so these patients appear in the database with ages over 300. Protected health information, including physician names and procedure dates, has been removed from the clinical notes [13]. The database contains a wealth of information about patient stays, from vital signs to laboratory results, including over 40,000 inpatient and outpatient echocardiogram reports, which supports reproducibility and generalizability [14]. It includes patients from multiple types of ICUs (medical, cardiac, and surgical) and spans over 11 years. The MIMIC-III source files were downloaded from PhysioNet as described on the MIMIC website and analyzed using PostgreSQL 9.4 and Python 3.6.
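This date shifting means that ages computed from MIMIC-III dates need care. The sketch below is a minimal illustration. It assumes that the shifted date of birth (for example, from the DOB column of the MIMIC-III PATIENTS table) and the echocardiogram chartdate are available as Python dates. The helper names `age_in_years` and `reported_age` are hypothetical.

```python
from datetime import date
from typing import Union


def age_in_years(dob: date, event: date) -> int:
    """Completed years between a (shifted) date of birth and an event date."""
    years = event.year - dob.year
    # Subtract one year if the birthday has not yet occurred in the event year.
    if (event.month, event.day) < (dob.month, dob.day):
        years -= 1
    return years


def reported_age(dob: date, event: date) -> Union[int, str]:
    """Return the age in years, or '>89' when the shifted DOB masks the true age.

    Patients older than 89 have dates of birth shifted so that their computed
    age exceeds 300 years, so any computed age above 89 is reported as '>89'.
    """
    age = age_in_years(dob, event)
    return ">89" if age > 89 else age


# Example with hypothetical shifted dates.
print(reported_age(date(2080, 3, 15), date(2150, 7, 1)))  # 70
print(reported_age(date(1850, 1, 1), date(2160, 1, 1)))   # >89
```

Collapsing every computed age above 89 into a single label prevents masked ages over 300 from distorting averages or regression models.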

To create the dictionary schema for the hierarchical numerical coding system, regular expressions were used to split each report into the following sections:
  • left ventricle (left ventricular cavity, diastolic function, systolic function, and wall thickness)
  • right ventricle (right ventricular cavity, systolic function, wall thickness, volume overload, and pressure overload)
  • aortic valve (aortic valve regurgitation and stenosis)
  • mitral valve (mitral valve regurgitation and stenosis)
  • left atrium (left atrial cavity)
  • right atrium/interatrial septum (right atrial dilation and pressure)
  • tricuspid valve (tricuspid valve regurgitation, stenosis, and pulmonary artery hypertension)
A separate dictionary of observation notes was created for each section, and the sentences were analyzed to identify words or phrases describing abnormal findings. A hierarchical numerical coding system for evaluability, abnormality, and severity was developed in consultation with clinical experts.
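The section-splitting step can be sketched with Python's `re` module. This is a minimal sketch and not the project's code. It assumes that each report section starts on its own line with an upper-case label followed by a colon (for example, `LEFT VENTRICLE:`). The `HEADER` pattern and the `split_sections` function are hypothetical.

```python
import re
from typing import Dict

# Assumed header format: an upper-case label at the start of a line followed by a
# colon, e.g. "LEFT VENTRICLE:" or "RIGHT ATRIUM/INTERATRIAL SEPTUM:".
HEADER = re.compile(r"^([A-Z][A-Z /]+):", re.MULTILINE)


def split_sections(report: str) -> Dict[str, str]:
    """Map each section header to the whitespace-normalized text that follows it."""
    matches = list(HEADER.finditer(report))
    sections: Dict[str, str] = {}
    for i, match in enumerate(matches):
        # A section runs from the end of its header to the start of the next
        # header, or to the end of the report for the last section.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[match.group(1).strip()] = " ".join(report[match.end():end].split())
    return sections


example = """LEFT ATRIUM: Mild LA enlargement.
LEFT VENTRICLE: Normal LV wall thickness.
Mildly depressed LVEF.
AORTIC VALVE: Trace AR."""

for name, text in split_sections(example).items():
    print(f"{name} -> {text}")
```

Because only upper-case labels count as headers, a continuation line such as "Mildly depressed LVEF." stays inside its section. Text under any mixed-case heading would be appended to the preceding section, so a real pipeline would need additional patterns to handle it.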

Two schemas were used to organize the data categories. Schema 1 included valvular, chamber size, and function evaluations, while Schema 2 included other right-sided parameters without any severity grading. Schema 1 covered AV regurgitation, AV stenosis, LA cavity, LV cavity, LV diastolic, LV systolic, LV wall, MV regurgitation, MV stenosis, RA pressure, RV cavity, RV systolic, RV wall, TV pulmonary hypertension, TV regurgitation, and TV stenosis. Schema 2 covered RA dilation, RV pressure overload, and RV volume overload. Under both schemas, a report was assigned a category of -3 if the note stated that the study was inadequate to evaluate the section, and a category of 0 if the note stated that the study adequately evaluated the function and no abnormality was present. A category of -2 was assigned if the note stated that the function was abnormal without specifying its severity. Severity levels of mild, moderate, and severe were coded as 1, 2, and 3, respectively. For LV cavity size, LV systolic function, and RV cavity size, an additional category of -1 indicated hyperdynamic LV systolic function (for LV systolic function) or a small cavity (for LV and RV cavity size).
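The coding rules above can be collected into a small lookup that turns a coded value back into a readable label. This is a hypothetical helper, not part of the released files. It assumes the EchoReports.csv column names and integer codes, with None for an empty field. Following the worked example for Schema 2, it accepts only -3, 0, and 1 for the three Schema 2 columns. The -50 value for conflicting statements comes from the coding description.

```python
from typing import Optional

# Column names follow the EchoReports.csv headers listed under Data Description.
# Schema 2 columns use only -3, 0, and 1; every other column is treated as Schema 1.
SCHEMA_2 = {"RA_dilated", "RV_pressure_overload", "RV_volume_overload"}
# Schema 1 columns that may also carry -1 (hyperdynamic LV or small cavity).
MINUS_ONE_ALLOWED = {"LV_cavity", "LV_systolic", "RV_cavity"}
SEVERITY = {1: "mild", 2: "moderate", 3: "severe"}


def describe(column: str, code: Optional[int]) -> str:
    """Translate a coded value into a readable label (hypothetical helper)."""
    if code is None:
        return "not mentioned"
    if code == -50:
        return "conflicting statements"
    if code == -3:
        return "not evaluable"
    if code == 0:
        return "normal"
    if column in SCHEMA_2:
        if code == 1:
            return "abnormal"
    elif code == -2:
        return "abnormal, severity not specified"
    elif code in SEVERITY:
        return SEVERITY[code]
    elif code == -1 and column in MINUS_ONE_ALLOWED:
        return "hyperdynamic" if column == "LV_systolic" else "small cavity"
    raise ValueError(f"unexpected code {code} for column {column}")


print(describe("LV_systolic", 2))         # moderate
print(describe("RV_volume_overload", 1))  # abnormal
```

Raising an error on an unexpected code, such as -1 for AV_stenosis, surfaces data problems early instead of silently mislabeling them.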

Using the dictionary schema, each sentence containing one of the defined descriptions was annotated with the corresponding category. An example echocardiogram report is provided in Table 1, and an overview of the available data categories, including the two schemas, is provided in Figure 1 and Table 4. For example, to determine the severity of left ventricular systolic dysfunction, we first used Schema 1 to determine whether the function was evaluated or indeterminate; indeterminate function was coded as -3. If the function was evaluated and found to be normal, it was coded as 0. If the free text simply indicated that LV function was abnormal without specifying its severity, it was coded as -2. If LV function was abnormal (depressed) and the abnormality was quantified, the value was 1 (mildly depressed), 2 (moderately depressed), or 3 (severely depressed). Hyperdynamic function was coded as -1. For RV volume overload and RA dilation, the hierarchical numerical coding system was applied with Schema 2 as follows: -3 (not specifically mentioned, difficult to view), 0 (no overload or dilation), and 1 (overload or dilation). Pressures were coded as -3 (specifically mentioned as indeterminate or difficult to view), 0 (normal), or 1 (abnormal). If a report did not mention function or dysfunction for a given concept, the field was left null. If a report contained two statements of different severity, the highest value was recorded; for example, a report containing both "Moderate AS (area 1.0-1.2 cm2)" and "Severe AS (area 0.8-1.0 cm2)" was coded as severe (3). If a report contained conflicting statements about the level of function (e.g., "left ventricular systolic function is normal" and "left ventricular ejection fraction is mildly depressed"), it was labeled -50.
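The LV systolic walkthrough above can be sketched in two steps. First, sentence-level phrase matching assigns a code to each sentence. Second, a report-level rule combines those codes: it keeps the highest of differing severities and returns -50 for conflicting statements. The phrase list below is invented for illustration and is not the project's dictionary. How -2 combines with a graded severity is also an assumption, because the description does not cover that case. Every other mixture, including -3 alongside another code, is treated as conflicting here.

```python
import re
from typing import Iterable, Optional

# Hypothetical phrase dictionary for LV systolic function, for illustration only;
# it is not the project's dictionary. Patterns are tried in order, so specific
# phrases come before general ones, and \b keeps "normal" from matching "abnormal".
LV_SYSTOLIC_PATTERNS = [
    (re.compile(r"suboptimal|not well seen|cannot be assessed", re.I), -3),
    (re.compile(r"hyperdynamic", re.I), -1),
    (re.compile(r"severely (?:depressed|reduced)", re.I), 3),
    (re.compile(r"moderately (?:depressed|reduced)", re.I), 2),
    (re.compile(r"mildly (?:depressed|reduced)", re.I), 1),
    (re.compile(r"\babnormal|depressed|reduced", re.I), -2),
    (re.compile(r"\bnormal\b", re.I), 0),
]


def code_sentence(sentence: str) -> Optional[int]:
    """Return the code of the first matching phrase, or None if nothing matches."""
    for pattern, code in LV_SYSTOLIC_PATTERNS:
        if pattern.search(sentence):
            return code
    return None


def combine_codes(codes: Iterable[Optional[int]]) -> Optional[int]:
    """Collapse sentence-level codes into one report-level code."""
    found = {code for code in codes if code is not None}
    if not found:
        return None  # no mention of the concept: leave the field null
    if len(found) == 1:
        return next(iter(found))
    graded = found & {1, 2, 3}
    # Different severities (optionally with an unspecified abnormality, -2):
    # keep the highest severity. Treating -2 this way is an assumption.
    if graded and found <= {-2, 1, 2, 3}:
        return max(graded)
    # Any other mix, e.g. "normal" alongside "mildly depressed", is conflicting.
    return -50


def code_report(sentences: Iterable[str]) -> Optional[int]:
    """Code every sentence of one report section and combine the results."""
    return combine_codes(code_sentence(s) for s in sentences)


print(code_report(["Left ventricular systolic function is normal.",
                   "The left ventricular ejection fraction is mildly depressed."]))  # -50
print(code_report(["Moderately depressed LVEF.",
                   "Severely depressed LV systolic function."]))  # 3
```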

The hierarchical categorization system can be used to handle missing data and data with varying levels of specificity in echocardiogram reports. It enables researchers to use a variable without discarding data points that would otherwise be excluded. For example, a researcher examining whether LV function is normal or abnormal can compare category 0 (normal) with categories -2, -1, 1, 2, and 3 (abnormal). Alternatively, a researcher interested only in severe LV dysfunction can select only the echocardiogram reports in which LV function is coded as 3.
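The two selections described above can be written as small filters over the LV_systolic column. This sketch assumes each report is a dict keyed by the EchoReports.csv column names, with integer codes and None for empty fields. The function names are hypothetical. The text contrasts only 0 with -2, -1, 1, 2, and 3, so returning None for -3, -50, and missing values is an assumption of this sketch.

```python
from typing import Dict, List, Optional

# Codes treated as "abnormal" in the binary comparison described above.
ABNORMAL_CODES = {-2, -1, 1, 2, 3}


def lv_abnormal(code: Optional[int]) -> Optional[bool]:
    """False for normal (0), True for any abnormal code, None otherwise.

    None covers codes that fit neither group (-3 not evaluable, -50 conflicting)
    and reports with no mention of LV function.
    """
    if code == 0:
        return False
    if code in ABNORMAL_CODES:
        return True
    return None


def severe_lv(rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep only reports whose LV_systolic code is 3 (severely depressed)."""
    return [row for row in rows if row.get("LV_systolic") == 3]
```

Returning None rather than False for unusable codes keeps indeterminate or conflicting reports out of the normal group, where they would otherwise dilute the comparison.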

Approval for this project, derived from MIMIC-III, was granted by the Institutional Review Boards of Beth Israel Deaconess Medical Center and the Massachusetts Institute of Technology. The need for individual patient consent was waived, as the project had no bearing on clinical care and all protected health information was de-identified in MIMIC-III.

Data Description

ECHO-NOTE2NUM is a structured echocardiogram note database derived from MIMIC-III. The EchoReports.csv file in the ECHO-NOTE2NUM project contains, in CSV format, the structured data derived from the echocardiogram observation notes. Its column headers are outlined below:

Patient Characteristics:
  • subject_id: de-identified patient id
  • hadm_id: hospital admission id
  • status: patient status at the time of the echocardiogram (inpatient or outpatient)
  • height (in): measurement of patient height in inches
  • weight (lb): measurement of patient weight in pounds
  • heart rate (bpm): measurement of heart rate in beats per minute
  • sbp (mmHg): systolic blood pressure
  • dbp (mmHg): diastolic blood pressure
  • BSA (m2): body surface area in square meters
Clinical Note Details:
  • row_id: echocardiogram report id
  • category: clinical note type
  • text: raw clinical note text
  • chartdate: clinical note charted date
  • date_time: clinical note charted date and time
Echocardiogram Measurements:
  • LA_cavity: left atrium cavity
  • RA_dilated: right atrium dilated
  • LV_systolic: left ventricle systolic
  • LV_cavity: left ventricle cavity
  • LV_wall: left ventricle wall
  • RV_cavity: right ventricle cavity
  • RV_systolic: right ventricle systolic
  • AV_stenosis: aortic valve stenosis
  • MV_stenosis: mitral valve stenosis
  • TV_regurgitation: tricuspid valve regurgitation
  • TV_stenosis: tricuspid valve stenosis
  • TV_pulm_htn: tricuspid valve pulmonary hypertension
  • AV_regurgitation: aortic valve regurgitation
  • MV_regurgitation: mitral valve regurgitation
  • RA_pressure: right atrium pressure
  • LV_diastolic: left ventricle diastolic
  • RV_volume_overload: right ventricle volume overload
  • RV_wall: right ventricle wall
  • RV_pressure_overload: right ventricle pressure overload
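A minimal sketch for loading EchoReports.csv with Python's standard `csv` module is shown below. It converts the coded measurement columns to integers so that comparisons such as `== 3` or `== -50` are exact. It assumes that empty codes appear as blank cells or as "nan"/"null"/"none" and that codes may be written as "3" or "3.0"; these encodings are not confirmed by the documentation. The function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Optional

# Coded measurement columns, as listed under "Echocardiogram Measurements".
MEASUREMENTS = [
    "LA_cavity", "RA_dilated", "LV_systolic", "LV_cavity", "LV_wall",
    "RV_cavity", "RV_systolic", "AV_stenosis", "MV_stenosis",
    "TV_regurgitation", "TV_stenosis", "TV_pulm_htn", "AV_regurgitation",
    "MV_regurgitation", "RA_pressure", "LV_diastolic", "RV_volume_overload",
    "RV_wall", "RV_pressure_overload",
]
MISSING = {"", "nan", "null", "none"}  # assumed spellings of an empty field


def parse_code(value: Optional[str]) -> Optional[int]:
    """Convert a cell such as '3' or '3.0' to an int; empty cells become None."""
    text = (value or "").strip()
    if text.lower() in MISSING:
        return None
    return int(float(text))


def load_echo_reports(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Read EchoReports.csv rows, converting the coded columns to integers."""
    reports: List[Dict[str, object]] = []
    for row in csv.DictReader(lines):
        record: Dict[str, object] = dict(row)
        for column in MEASUREMENTS:
            if column in row:
                record[column] = parse_code(row[column])
        reports.append(record)
    return reports
```

With the real file, pass an open handle, for example `with open("EchoReports.csv", newline="") as f: reports = load_echo_reports(f)`. Opening with `newline=""` lets the `csv` module read quoted multi-line fields correctly, which matters if the raw `text` column contains line breaks.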

We identified 45,794 echocardiogram records in MIMIC-III, of which 43,472 contained valid echocardiogram reports. Of the 38,597 patients in MIMIC-III, 21,572 had valid echocardiogram reports, and 19,675 had an echocardiogram report during a hospitalization that included at least one ICU stay. This is the patient cohort described here. These patients had 22,840 distinct hospital admissions and a total of 31,973 valid echocardiogram reports during these admissions. They also had 11,499 valid echocardiogram reports that were not associated with one of these admissions, meaning the echocardiograms were performed either as an outpatient or during an inpatient hospitalization that did not result in an ICU stay.
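The two report counts above sum to the 43,472 valid reports (31,973 + 11,499 = 43,472). The sketch below shows one way to reproduce this kind of split. It counts patients, distinct admissions, and reports with and without a `hadm_id`. It is a simplification: it does not check whether an admission included an ICU stay, which would require joining the MIMIC-III ICUSTAYS table. The `summarize_cohort` function is hypothetical.

```python
from typing import Dict, Iterable, Mapping, Optional, Set


def summarize_cohort(rows: Iterable[Mapping[str, Optional[str]]]) -> Dict[str, int]:
    """Count patients, admissions, and reports with or without a hadm_id."""
    patients: Set[str] = set()
    admissions: Set[str] = set()
    with_admission = 0
    without_admission = 0
    for row in rows:
        patients.add(str(row["subject_id"]))
        # An empty or missing hadm_id means no associated hospital admission.
        hadm_id = str(row.get("hadm_id") or "").strip()
        if hadm_id:
            with_admission += 1
            admissions.add(hadm_id)
        else:
            without_admission += 1
    return {
        "patients": len(patients),
        "admissions": len(admissions),
        "reports_with_admission": with_admission,
        "reports_without_admission": without_admission,
    }
```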

Usage Notes

The hierarchical numerical categorization of echocardiogram reports in the MIMIC-III database simplifies analyses of cardiovascular function in relation to ICU disease course and outcomes. Researchers seeking access to MIMIC-III data must formally request access through the process documented on the MIMIC website [15]. Documentation for the NOTEEVENTS table, which contains all clinical notes including the echocardiogram notes, is available on the MIMIC website [16].

The ECHO-NOTE2NUM database can be used to investigate cardiac structure and function in the context of critical illness. It can be used to characterize the trajectories of left ventricular function, valvular disease, and volume status throughout a patient's clinical course in the ICU, as well as their association with outcomes. Additionally, through integration with the broader MIMIC database, researchers can investigate how clinical events, such as acute changes in volume or vascular tone, affect cardiac function. Furthermore, this database enables investigation into how critical illness therapies, such as vasoactive medications or antibiotics, may affect cardiac structure and function.

Moreover, the data can be used to enhance comparative effectiveness studies of therapeutic strategies for critical illness in the MIMIC-III dataset. Critically ill patients in the ICU are a heterogeneous group that is often difficult to segment into homogeneous groups based on the mechanism of illness. An echocardiogram can help identify causes of hemodynamic instability and segment patients into more homogeneous groups with similar etiologies, which can then be used to study the effectiveness of therapeutic interventions. For example, researchers can determine whether cardiac structure and function in patients with sepsis or acute kidney injury modify the disease course or the effects of various therapies. Data extracted from an individual echocardiogram can be included in predictive models alongside other clinical and demographic features to improve the prediction of clinical outcomes, such as mortality or length of stay. Additionally, researchers can use echocardiograms taken at different time points, including before and after surgery, to evaluate patient progress and cardiac recovery over time.
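As one illustration of linking the coded reports to outcomes in MIMIC-III, the sketch below tallies in-hospital deaths by LV_systolic code. It assumes the admission rows come from the MIMIC-III ADMISSIONS table with lowercase `hadm_id` and `hospital_expire_flag` keys, as in a PostgreSQL build. It also assumes both tables store `hadm_id` in the same form, for example both as integers. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def mortality_by_lv_code(
    echo_rows: Iterable[Mapping[str, object]],
    admissions: Iterable[Mapping[str, object]],
) -> Dict[object, Tuple[int, int]]:
    """Return {LV_systolic code: (in-hospital deaths, reports)} for linked reports."""
    # hadm_id -> 1 if the patient died during that hospital admission, else 0.
    died = {str(a["hadm_id"]): int(a["hospital_expire_flag"]) for a in admissions}
    tallies: Dict[object, List[int]] = defaultdict(lambda: [0, 0])
    for row in echo_rows:
        hadm_id = str(row.get("hadm_id") or "")
        if hadm_id not in died:
            continue  # no associated admission in the supplied table
        tally = tallies[row.get("LV_systolic")]
        tally[0] += died[hadm_id]
        tally[1] += 1
    return {code: (deaths, total) for code, (deaths, total) in tallies.items()}
```

The counts are per report, so an admission with several echocardiograms contributes more than once. A real analysis would typically keep one report per admission, for example the first one charted, before comparing groups.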

To the best of our knowledge, this is the first large, free, and publicly available database containing echocardiogram reports of ICU patients linked with complete electronic health records from the ICU admission, as well as survival outcomes after hospital discharge. We hope that this rich database will lead to novel discoveries involving cardiac function among critically ill patients.

Ethics

Approval for this project, derived from MIMIC-III, was granted by the Institutional Review Boards of Beth Israel Deaconess Medical Center and the Massachusetts Institute of Technology. Patient consent was waived, as the project had no bearing on clinical care and all protected health information was de-identified.

Acknowledgements

L.C., D.M., and L.M. are supported by grant NIH-R01-EB017205 from the National Institutes of Health. D.M. and L.M. are supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) under NIH-R01-EB030362. D.M. is supported by the NIH National Library of Medicine under 75N97020C00013 and by the Massachusetts Life Sciences Center. N.B. is supported by the Boettcher Foundation Webb-Waring Biomedical Research Grant. The authors thank Benjamin Moody, Simon Vistisen, Wei-Hung Weng, Alistair Johnson, and Tom Pollard for their valuable feedback.

Conflicts of Interest

N.B. receives consulting fees and has ownership interest in HiLabs, outside the published work. All other authors declare no competing interests.

References

  1. Vieillard-Baron, A., Slama, M., Cholley, B., Janvier, G. & Vignon, P. “Echocardiography in the intensive care unit: from evolution to revolution?”. Intensive Care Med. 2008;34:243–249.
  2. Jensen, M., Sloth, E., Larsen, K. M. & Schmidt, M. B. “Transthoracic echocardiography for cardiopulmonary monitoring in intensive care”. Eur J Anaesthesiol. 2004;21:700–707.
  3. Noritomi, D. T. et al. “Echocardiography for hemodynamic evaluation in the intensive care unit”. Shock. 2010;34:59–62.
  4. Benjamin, E. et al. “Goal-directed transesophageal echocardiography performed by intensivists to assess left ventricular function: comparison with pulmonary artery catheterization”. J Cardiothorac Vasc Anesth. 1998;12:10–15.
  5. Wells, Q. S., Farber-Eger, E. & Crawford, D. C. “Extraction of echocardiographic data from the electronic medical record is a rapid and efficient method for study of cardiac structure and function”. J Clin Bioinforma. 2014;4:12.
  6. Vieillard-Baron, A., Prin, S., Chergui, K., Dubourg, O. & Jardin, F. “Hemodynamic instability in sepsis: bedside assessment by doppler echocardiography”. Am J Respir Crit Care Med. 2003;168:1270–1276.
  7. Griffee, M. J., Merkel, M. J. & Wei, K. S. “The role of echocardiography in hemodynamic assessment of septic shock”. Crit Care Clin. 2010;26:365–382.
  8. McLean, A. S. “Echocardiography in shock management”. Crit Care. 2016;20:1–10.
  9. Vieillard-Baron, A. et al. “A decade of progress in critical care echocardiography: a narrative review”. Intensive Care Med. 2019;45:770–788.
  10. Vieillard-Baron, A. “Septic cardiomyopathy”. Ann Intensive Care. 2011;1:1–7.
  11. Black, I. W., Hopkins, A. P., Lee, L. C. & Walsh, W. F. “Left atrial spontaneous echo contrast: a clinical and echocardiographic analysis”. J Am Coll Cardiol. 1991;18:398–404.
  12. Strange, G. et al. “The National Echocardiography Database Australia (NEDA): rationale and methodology”. Am Heart J. 2018;204:186–189.
  13. Johnson, A. E. et al. “MIMIC-III, a freely accessible critical care database”. Sci Data. 2016;3:1–9.
  14. Alberto, I. R. et al. “The impact of commercial health datasets on medical research and health-care algorithms”. Lancet Digit Health. 2023;5:e288–e294.
  15. MIMIC data access: [Accessed 2/22/2024]
  16. MIMIC website documentation [Accessed 2/22/2024]


Access Policy:
Only credentialed users who sign the DUA can access the files.

License (for files):
PhysioNet Credentialed Health Data License 1.5.0

Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0

Required training:
CITI Data or Specimens Only Research


DOI (version 1.0.0):

DOI (latest version):
