Database Open Access
MIMIC-III Waveform Database Matched Subset
Benjamin Moody , George Moody , Mauricio Villarroel , Gari D. Clifford , Ikaro Silva
Published: April 7, 2020. Version: 1.0
When using this resource, please cite:
(show more options)
Moody, B., Moody, G., Villarroel, M., Clifford, G. D., & Silva, I. (2020). MIMIC-III Waveform Database Matched Subset (version 1.0). PhysioNet. https://doi.org/10.13026/c2294b.
Johnson, A. E. W., Pollard, T. J., Shen, L., Lehman, L. H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L. A., & Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data, 3, 160035.
Please include the standard citation for PhysioNet:
(show more options)
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
The MIMIC-III Waveform Database Matched Subset contains 22,317 waveform records, and 22,247 numerics records, for 10,282 distinct ICU patients. These recordings typically include digitized signals such as ECG, ABP, respiration, and PPG, as well as periodic measurements such as heart rate, oxygen saturation, and systolic, mean, and diastolic blood pressure.
This database is a subset of the MIMIC-III Waveform Database, representing those records for which the patient has been identified, and their corresponding clinical records are available in the MIMIC-III Clinical Database.
The MIMIC-III Waveform Database contains thousands of recordings of multiple physiologic signals (“waveforms”) and time series of vital signs (“numerics”) collected from bedside patient monitors in adult and neonatal intensive care units (ICUs).
An ICU bedside monitor collects a great deal of data, from which it is possible to infer something about a patient’s physiological state. However, in order to understand how these waveforms are influenced by disease state and treatment, and the extent to which phenomena observed in the waveform can serve as indicators of disease, it is necessary to look at the broader context: patient demographics, diagnoses, medications, lab tests, and other information that is recorded by caregivers in the electronic medical record.
Collecting this broad clinical context is the task of the MIMIC-III Clinical Database, which was created in parallel with the Waveform Database and contains information about many of the same patients. The Matched Subset consists of all of the waveform and numerics recordings for which the corresponding clinical record is also available.
The bedside monitors used for collecting this database were not directly linked to the hospital medical record system. The monitor could be configured to display the patient’s name and medical record number, for ease of identifying patients at the central station, but this was not automatically updated when a patient was admitted or transferred to the ICU. This information was only available when the ICU staff entered it manually into the monitoring system, and since entering this information was not critical to patient care, it was frequently omitted or incomplete. Furthermore, limitations of the data archiving software made it possible to identify the care unit from which a recording originated, but not the precise room or bed number.
As a result, only a subset of the waveform recordings actually contained enough information to reliably identify the patient, and of those, not all overlapped with the time period represented by the MIMIC-III Clinical Database . Using all of the available information, through a process of mostly automated matching with some manual corrections, a total of 22,317 waveform records (34%) and 22,247 numerics records (35%) were found that could be linked to a corresponding patient in the Clinical Database.
For each of those records, a new WFDB header file was created, incorporating the subject ID as well as the surrogate date and time of the recording. Note that the raw signal files (such as
3314767n.dat) and segment header files (such as
3314767_0004.hea) are identical to those in the original numbered records.
The project was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). Requirement for individual patient consent was waived because the project did not impact clinical care and all protected health information was deidentified.
All data associated with a particular patient have been placed into a single subdirectory, named according to the patient's MIMIC-III subject_ID. These subdirectories are further divided into ten intermediate-level directories (
The name of each matched waveform record is of the form
matched/pXX/pXXNNNN/pXXNNNN-YYYY-MM-DD-hh-mm, where XXNNNN is the matching MIMIC-III Clinical Database Subject_ID, and YYYY, MM, DD, hh, and mm are the surrogate year, month (01-12), and day (01-31), and the real hour (00-23) and minute (00-59), derived from the starting date and time of day of the record. The surrogate dates match those of the corresponding MIMIC-III Clinical Database records.
In most cases, the waveform record is paired with a numerics record, which has the same name as the associated waveform record, with an
n added to the end.
Frequently there are multiple waveform and numerics record pairs associated with a given clinical record; all of them will appear in the same subdirectory in such a case, and their names will indicate their chronologic sequence. For example, MIMIC-III Clinical Database record
p000079 has been matched with two waveform and numerics record pairs, named:
mimic3wdb/matched record is also an undated
mimic3wdb record (i.e., it also belongs to the full MIMIC-III Waveform Database). Only the surrogate-dated
mimic3wdb/matched header (
.hea) files are unique to the Matched Subset; the others, with names of the form
3*.dat, are copies of the like-named files in the full database.
The following example illustrates the organization of the database:
- Intermediate directory p04 contains all records with names that begin with
p04(patients with a subject_id between 40000 and 49999.)
- All files associated with patient 44083 are contained within the directory p04/p044083. This directory contains two waveform records (p044083-2112-05-04-19-50 and p044083-2112-05-23-12-22) and two corresponding numerics records (p044083-2112-05-04-19-50n and p044083-2112-05-23-12-22n), recorded from two separate ICU stays.
- The master waveform header file for the first stay (p044083-2112-05-04-19-50.hea) indicates that the record is 20342033 sample intervals (about 45 hours) in length, and begins at 19:50 on May 4, 2112. This date, as with all dates in MIMIC-III, has been anonymized by shifting it by a random number of days into the future. See header(5) in the WFDB Applications Guide for more information about the format of this file.
- This waveform record consists of 41 segments (3314767_0001 through to 3314767_0041), as indicated by the master header file. The layout header file (3314767_layout.hea) indicates that four ECG signals (II, AVR, V, and MCL) were recorded, along with a respiration signal, photoplethysmogram, and arterial blood pressure. Not all of these signals are available simultaneously.
- The header file for segment number 4 (3314767_0004.hea) shows us that during this segment, five signals are available: three ECG leads (II, V, and AVR), a respiration signal (RESP), and a PPG signal (PLETH).
- The numerics header file (p044083-2112-05-04-19-50n.hea) shows us that a variety of measurements were recorded, including heart rate, invasive and non-invasive blood pressure, respiratory rate, ST segment elevation, oxygen saturation, and cardiac rhythm statistics. Just as with waveforms, not all of these measurements are available at all times.
Referring to the MIMIC-III Clinical Database Demo, we can see from the PATIENTS table that this patient was male, and his anonymized date of birth was November 15, 2057 (making him 54 years old at the time of this ICU stay):
|44083||M||2057-11-15 00:00:00||2114-02-20 00:00:00|
The ICUSTAYS table shows us that he was admitted once to the SICU and twice to the CCU:
|44083||125157||265615||SICU||2112-05-04 19:03:39||2112-05-06 17:21:01|
|44083||131048||282640||CCU||2112-05-23 12:32:06||2112-05-25 14:59:50|
|44083||198330||286428||CCU||2112-05-29 02:01:33||2112-06-01 16:50:40|
The first of these admissions corresponds to the waveform record above, as indicated by the date (2112-05-04). Note that the starting and ending date and time of the waveform record will not always match the precise admission or discharge time.
The hadm_id (125157) and icustay_id (265615) are linked to other tables in MIMIC-III that provide further information about this particular ICU stay, such as vital signs, laboratory tests, medications, and diagnoses.
This database is a subset of version 1.0 of the MIMIC-III Waveform Database. It also represents a superset of the records in the previously-released MIMIC-II Waveform Database Matched Subset. However, it uses a different directory structure (see Data Description above), as well as different subject IDs and surrogate dates. This version corresponds to version 1.4 of the MIMIC-III Clinical Database.
We wish to thank Philips Healthcare, as well as the Beth Israel Deaconess Medical Center, for their invaluable support in making this project possible.
Many people have contributed to this project over the past 18 years, and it would be impossible to list them all. In particular, we would like to acknowledge Michael Craig, Tin Kyaw, and Mohammed Saeed, for their efforts in collecting and organizing the original MIMIC-II Waveform Database, upon which this database is based.
Conflicts of Interest
The authors have no conflicts of interests to declare.
- Johnson, A. E. W., Pollard, T. J., Shen, L., Lehman, L. H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L. A., & Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data, 3, 160035. https://dx.doi.org/10.1038/sdata.2016.35
Anyone can access the files, as long as they conform to the terms of the specified license.
License (for files):
Open Data Commons Open Database License v1.0
Total uncompressed size: 2.4 TB.
Access the files
- Access the files using the Google Cloud Storage Browser here. Login with a Google account is required.
Access the data using the Google Cloud command line tools (please refer to the gsutil
documentation for guidance):
gsutil -m -u YOUR_PROJECT_ID cp -r gs://mimic3wdb-matched-1.0.physionet.org DESTINATION
Download the files using your terminal:
wget -r -N -c -np https://physionet.org/files/mimic3wdb-matched/1.0/