Database Open Access
MIMIC-IV Waveform Database
Benjamin Moody, Sicheng Hao, Brian Gow, Tom Pollard, Wei Zong, Roger Mark
Published: July 10, 2022. Version: 0.1.0
Announcing the MIMIC-IV Waveforms (Aug. 9, 2022, 9:44 a.m.)
We are pleased to announce the initial release (version 0.1.0) of the MIMIC-IV-Waveform module. These waveforms are a rich source of patient information, including ECG, PPG, and blood pressure signals, and can be linked to the clinical information in MIMIC-IV. This initial release contains 200 records from 198 patients. An upcoming release will include around 10,000 records.
The dataset was the subject of a workshop at IEEE EMBC in July 2022, led by Peter Charlton, which demonstrated how to use the WFDB-Python package to extract and analyze waveform features. Executable notebooks and tutorial materials are available at: https://mimic.mit.edu/docs/iv/tutorials/waveform/ieee_workshop/
When using this resource, please cite:
Moody, B., Hao, S., Gow, B., Pollard, T., Zong, W., & Mark, R. (2022). MIMIC-IV Waveform Database (version 0.1.0). PhysioNet. https://doi.org/10.13026/a2mw-f949.
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
Abstract
The MIMIC-IV Waveform Database is a large collection of physiological signals and measurements from patients in intensive care units, including electrocardiograms, photoplethysmograms, respiration, invasive and non-invasive blood pressure, and more. These measurements and signals are obtained directly from the bedside monitor, and provide a detailed view into the physiology of critically ill patients.
Combining this database with the clinical information found in MIMIC-IV provides a broad, cross-sectional example of the data available to caregivers in a modern ICU. We hope that this database will provide a foundation for future improvements to monitoring technology as well as data-driven diagnosis and treatment.
Background
In an intensive care setting, patients are continuously monitored to obtain as much information as possible about their condition, and to alert caregivers to potentially life-threatening changes.
This monitoring data is analyzed in real-time, but is seldom captured, archived, or analyzed beyond the bedside monitor itself. However, these recordings are valuable both individually as a resource for clinical decision-making, and collectively as a way to study the relationships between physiological signals, diseases, treatments, and outcomes.
The MIMIC-IV Waveform Database provides a cross-sectional sample of high-resolution monitoring data collected from intensive care patients at a single hospital. This database is a companion to the main MIMIC-IV database, which provides broader context about these patients, their diagnoses, treatments, and other information collected in the electronic medical record [1].
Methods
The bedside monitors used in the intensive care units where MIMIC-IV was collected are linked to a local area network, allowing monitoring data to be viewed from the central nursing station. This central station can also be configured to transfer data continuously into a proprietary relational database, using an application provided by the vendor. To avoid consuming disk space without limit, data is discarded after a fixed number of days.
We configured this system to store data from all of the ICU monitors in the hospital and to retain data for several weeks. Then, every 24 hours, we retrieved a cross-sectional slice of the relational database, storing the results in “flat” text and binary files which could be transferred to a long-term archive server. Later, these files were parsed, collated into individual patient records, converted into WFDB and CSV formats, and de-identified [2].
De-identification
The records in this database were de-identified according to the same method used for the MIMIC-IV clinical database. The original source data included several forms of protected health information: the patient’s name, date of birth, and medical record number (in some cases); and the date of the recording. All of this information has been removed or replaced with non-identifying information. In place of the medical record number, a randomly assigned subject_id and hadm_id are used. In place of the recording date, a surrogate date is generated by adding a randomly assigned offset (which matches the offset used in MIMIC-IV).
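The date-shifting idea can be illustrated with a minimal sketch. The offset range, the `make_offset` and `shift_datetime` helpers, and the dates are hypothetical; the actual offsets used for MIMIC-IV are not described here. The point is only that applying one fixed offset per patient hides the real date while preserving the intervals between events.

```python
import datetime
import random


def make_offset(rng, max_years=100):
    """Draw a hypothetical per-patient shift as a whole number of days."""
    return datetime.timedelta(days=rng.randint(365, 365 * max_years))


def shift_datetime(real_dt, offset):
    """Return the surrogate date/time for a real date/time."""
    return real_dt + offset


rng = random.Random(0)
# subject_id -> offset; one offset per patient (made-up value)
offsets = {10039708: make_offset(rng)}

a = datetime.datetime(2020, 1, 1, 8, 0, 0)
b = datetime.datetime(2020, 1, 3, 9, 30, 0)
sa = shift_datetime(a, offsets[10039708])
sb = shift_datetime(b, offsets[10039708])

# The interval between the two events is unchanged by the shift.
print(sb - sa == b - a)
```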
Data Description
Two types of data were collected by this project:
- waveform data, which consists of a high-resolution, regularly sampled time series, collected directly from a measuring device;
- numeric data, which consists of values that are either digitally derived (by software running on the bedside monitor, typically by analyzing one or more waveform signals), or sampled irregularly (such as non-invasive blood pressure).
Each patient record, therefore, contains both a waveform record, which is stored as a collection of files in WFDB format, and a table of numerics, which is stored as a compressed CSV file [2].
Record splitting
Data in the source database is associated with a particular internal “PatientId”, which in turn is associated with a care unit, room, and bed. However, there is no fully-automated system for associating a PatientId with an actual patient’s medical record; the caregiver must manually “admit” and “discharge” each patient to and from the monitoring system. As a result, the monitor is sometimes left active after a patient leaves the ICU, meaning that two or more patients will have the same PatientId. Therefore, to avoid incorrectly combining two patients’ data into one record, records were automatically split apart whenever there was a gap of more than one hour with no waveform or numeric data present. This in turn means that a single patient’s ICU stay will sometimes be divided into multiple records, even if the patient was not actually discharged from the monitor.
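The splitting rule can be sketched as follows. This is an illustration of the rule as stated (a new record starts whenever more than one hour passes with no data), not the code used to build the database; the `split_on_gaps` helper and the timestamps are made up.

```python
import datetime

GAP_LIMIT = datetime.timedelta(hours=1)


def split_on_gaps(timestamps, gap_limit=GAP_LIMIT):
    """Group sorted timestamps into runs.

    A new run starts whenever the gap since the previous timestamp
    is strictly greater than gap_limit.
    """
    records = []
    for ts in timestamps:
        if records and ts - records[-1][-1] <= gap_limit:
            records[-1].append(ts)
        else:
            records.append([ts])
    return records


t0 = datetime.datetime(2100, 1, 1, 0, 0)
minutes = [0, 10, 20, 95, 100]  # 75-minute gap between 20 and 95
stamps = [t0 + datetime.timedelta(minutes=m) for m in minutes]
records = split_on_gaps(stamps)
print([len(r) for r in records])  # [3, 2]
```

With this rule, a patient who stays on the monitor but has a data gap longer than an hour ends up with two records, which matches the behaviour described above.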
Storage formats
To save disk space and download time, both the waveform and numerics files in this database have been compressed.
Signal files have been compressed using flac (version 1.3.2-3, on Debian 10, with options -8 -r8 -e). In order to read these signal files using WFDB, you will need to have a recent version of WFDB that includes support for FLAC compression [3, 4].
Numerics files have been compressed using dictzip (version 1.12.1+dfsg-8). These files can be read using gzip or any gzip-compatible library.
Usage Notes
Each patient has a unique number (subject_id) which is used to identify the patient in MIMIC-IV. Additionally, a unique number (hadm_id) is used to identify a particular hospital admission; in other words, the same patient may be admitted to the hospital on multiple occasions, possibly months or years apart, so the same subject_id may have multiple hadm_id values.
Furthermore, in the course of a single hospital admission, the same patient may have multiple ICU stays, and during a single ICU stay there may be more than one waveform record.
In this database, all records associated with a given patient are stored in a single folder named after the patient’s subject_id. These folders are further collected into 100 top-level folders (named after the first three digits of the subject_id). For example, records associated with subject 10039708 are stored in the p100/p10039708 folder. Each waveform record is then stored in a separate folder, such as p100/p10039708/83411188.
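The folder layout described above can be expressed as a small helper. The `record_dir` function is a hypothetical name introduced for illustration; it only encodes the naming rule given in the text.

```python
import posixpath


def record_dir(subject_id, record_id):
    """Build the relative folder path for one waveform record."""
    subject = str(subject_id)
    top = "p" + subject[:3]  # top-level folder: first three digits
    return posixpath.join(top, "p" + subject, str(record_id))


print(record_dir(10039708, 83411188))  # p100/p10039708/83411188
```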
Each record folder then contains several files:
- A multi-segment header file, such as 83411188.hea, which provides general information about the record (starting date, time, length, subject and hospital admission IDs) along with the list of waveform segments.
- A layout header file, such as 83411188_0000.hea, which provides the list of waveform signals present in the record.
- One or more segment header files, such as 83411188_0001.hea, and one or more signal files, such as 83411188_0001e.dat, which contain the actual waveform data.
- A numerics file, such as 83411188n.csv.gz, which contains a table of numeric values.
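As a rough guide to telling these files apart, the sketch below classifies file names by pattern. The patterns are inferred only from the example names above (a four-digit segment number, a single letter before .dat, an "n" before .csv.gz) and are assumptions, not a specification of the naming scheme.

```python
import re

# Order matters: the layout header is a special case of the segment pattern.
PATTERNS = [
    ("layout header", re.compile(r"^\d+_0000\.hea$")),
    ("segment header", re.compile(r"^\d+_\d{4}\.hea$")),
    ("multi-segment header", re.compile(r"^\d+\.hea$")),
    ("signal file", re.compile(r"^\d+_\d{4}[a-z]\.dat$")),
    ("numerics file", re.compile(r"^\d+n\.csv\.gz$")),
]


def classify(filename):
    """Return a label for a file name found in a record folder."""
    for kind, pattern in PATTERNS:
        if pattern.match(filename):
            return kind
    return "unknown"


for name in ["83411188.hea", "83411188_0000.hea", "83411188_0001.hea",
             "83411188_0001e.dat", "83411188n.csv.gz"]:
    print(name, "->", classify(name))
```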
Accessing waveform data
You can preview and explore waveform data online using LightWAVE (click “Visualize waveforms” below.)
To load data into a Python program, we recommend using the WFDB-Python package (version 4.0.0 or later). For example, if you have downloaded the above record, the following Python commands will display the first 50 frames (about 800 milliseconds):
import wfdb

# Read the first 50 frames, keeping each signal at its own sampling rate
rec = wfdb.rdrecord('83411188',
                    sampfrom=0, sampto=50,
                    smooth_frames=False)

# With smooth_frames=False, physical samples are in rec.e_p_signal,
# one array per signal
for (name, units, data) in zip(rec.sig_name,
                               rec.units,
                               rec.e_p_signal):
    print('{} (units {}):'.format(name, units))
    print(data)
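The relationship between frame count and duration can be checked with a line of arithmetic. The frame rate below is an assumption: 62.5 frames per second is simply the value implied by "50 frames is about 800 milliseconds". The actual rate should be read from the record itself (for example `rec.fs` in WFDB-Python).

```python
def frames_to_ms(n_frames, frame_rate_hz):
    """Duration in milliseconds of n_frames at the given frame rate."""
    return 1000.0 * n_frames / frame_rate_hz


ASSUMED_FRAME_RATE_HZ = 62.5  # assumption implied by the text, not read from a header
print(frames_to_ms(50, ASSUMED_FRAME_RATE_HZ))  # 800.0
```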
To load data into a C or C++ program, we recommend using the WFDB Software Package (version 10.7.0 or later). Note that you will need to install the libFLAC library before installing the WFDB Software Package.
Accessing numeric data
Numeric data is stored as a compressed CSV file. The following Python commands will display the first ten SpO2 measurements:
import csv, gzip

with gzip.open('83411188n.csv.gz', 'rt') as gzf:
    n = 0
    for row in csv.DictReader(gzf):
        t = int(row['time'])
        spo2 = row['SpO2 [%]']
        # Skip rows where no SpO2 value was recorded
        if spo2:
            print(t, spo2)
            n = n + 1
            if n == 10:
                break
Note that the time value is measured in “counter ticks” elapsed from the start of the record. To convert each time value to a wall-clock date and time, you can use the following:
import csv, datetime, gzip, wfdb

# The record header supplies the counter frequency and the start time
header = wfdb.rdheader('83411188')
with gzip.open('83411188n.csv.gz', 'rt') as gzf:
    n = 0
    for row in csv.DictReader(gzf):
        t = int(row['time'])
        # Convert counter ticks to seconds, then to an absolute time
        seconds = t / header.counter_freq
        delta = datetime.timedelta(0, seconds)
        dt = header.base_datetime + delta
        spo2 = row['SpO2 [%]']
        if spo2:
            print(dt, spo2)
            n = n + 1
            if n == 10:
                break
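The same conversion can be wrapped in a reusable generator and tried out without downloading any files. In this sketch the `iter_numeric` helper, the tiny in-memory CSV, the counter frequency of 1000 ticks per second, and the base date are all made up for illustration; with real data, the last two would come from the record header as shown above.

```python
import csv
import datetime
import gzip
import io


def iter_numeric(fileobj, column, counter_freq, base_datetime):
    """Yield (datetime, value) for each non-empty entry in one column."""
    with gzip.open(fileobj, 'rt', newline='') as gzf:
        for row in csv.DictReader(gzf):
            if row[column]:
                seconds = int(row['time']) / counter_freq
                delta = datetime.timedelta(seconds=seconds)
                yield base_datetime + delta, row[column]


# Synthetic stand-in for a numerics file (values are invented)
raw = "time,SpO2 [%]\n0,97\n1000,\n2000,98\n"
buf = io.BytesIO(gzip.compress(raw.encode()))
base = datetime.datetime(2100, 1, 1, 12, 0, 0)

rows = list(iter_numeric(buf, 'SpO2 [%]',
                         counter_freq=1000.0, base_datetime=base))
for dt, value in rows:
    print(dt, value)
```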
Release Notes
Version 0.1.0 is the first public release of the MIMIC-IV Waveform Database. This release contains 200 records from 198 patients and is intended as a technical preview for the community in advance of version 1.0.0, which is planned for release shortly.
Ethics
The project was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). The requirement for individual patient consent was waived because the project did not impact clinical care and all protected health information was de-identified.
Conflicts of Interest
The authors have no conflicts of interest to declare.
References
1. Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2022). MIMIC-IV (version 2.0). PhysioNet. https://doi.org/10.13026/7vcr-e114.
2. WFDB website. https://wfdb.io/ [Accessed: 6 July 2022]
3. Moody, G., Pollard, T., & Moody, B. (2022). WFDB Software Package (version 10.7.0). PhysioNet. https://doi.org/10.13026/gjvw-1m31.
4. Xie, C., McCullum, L., Johnson, A., Pollard, T., Gow, B., & Moody, B. (2021). Waveform Database Software Package (WFDB) for Python (version 3.4.1). PhysioNet. https://doi.org/10.13026/egpf-2788.
Access
Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.
License (for files):
Open Data Commons Open Database License v1.0
Discovery
DOI (version 0.1.0):
https://doi.org/10.13026/a2mw-f949
DOI (latest version):
https://doi.org/10.13026/6269-ws81
Files
Total uncompressed size: 12.8 GB.
Access the files
- Download the files using your terminal:
  wget -r -N -c -np https://physionet.org/files/mimic4wdb/0.1.0/
- Download the files using AWS command line tools:
  aws s3 sync s3://physionet-open/mimic4wdb/0.1.0/ DESTINATION