
MIMIC-IV Waveform Database

Benjamin Moody, Sicheng Hao, Brian Gow, Tom Pollard, Wei Zong, Roger Mark

Published: July 10, 2022. Version: 0.1.0


When using this resource, please cite:
Moody, B., Hao, S., Gow, B., Pollard, T., Zong, W., & Mark, R. (2022). MIMIC-IV Waveform Database (version 0.1.0). PhysioNet. https://doi.org/10.13026/a2mw-f949.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

The MIMIC-IV Waveform Database is a large collection of physiological signals and measurements from patients in intensive care units, including electrocardiograms, photoplethysmograms, respiration, invasive and non-invasive blood pressure, and more. These measurements and signals are obtained directly from the bedside monitor, and provide a detailed view into the physiology of critically ill patients.

Combining this database with the clinical information found in MIMIC-IV provides a broad, cross-sectional example of the data available to caregivers in a modern ICU. We hope that this database will provide a foundation for future improvements to monitoring technology as well as data-driven diagnosis and treatment.


Background

In an intensive care setting, patients are continuously monitored to obtain as much information as possible about their condition, and to alert caregivers to potentially life-threatening changes.

This monitoring data is analyzed in real time, but is seldom captured, archived, or analyzed beyond the bedside monitor itself. However, these recordings are valuable both individually, as a resource for clinical decision-making, and collectively, as a way to study the relationships between physiological signals, diseases, treatments, and outcomes.

The MIMIC-IV Waveform Database provides a cross-sectional sample of high-resolution monitoring data collected from intensive care patients at a single hospital. This database is a companion to the main MIMIC-IV database, which provides broader context about these patients, their diagnoses, treatments, and other information collected in the electronic medical record [1].


Methods

The bedside monitors in the intensive care units where MIMIC-IV was collected are linked to a local area network, allowing monitoring data to be viewed from the central nursing station. This central station can also be configured to transfer data continuously into a proprietary relational database, using an application provided by the vendor. To keep disk usage bounded, data is discarded after a fixed number of days.

We configured this system to store data from all of the ICU monitors in the hospital and to retain data for several weeks. Then, every 24 hours, we retrieved a cross-sectional slice of the relational database, storing the results in “flat” text and binary files which could be transferred to a long-term archive server. Later, these files were parsed, collated into individual patient records, converted into WFDB and CSV formats, and de-identified [2].

De-identification

The records in this database were de-identified according to the same method used for the MIMIC-IV clinical database. The original source data included several forms of protected health information: the patient’s name, date of birth, and medical record number (in some cases); and the date of the recording. All of this information has been removed or replaced with non-identifying information. In place of the medical record number, a randomly assigned subject_id and hadm_id are used. In place of the recording date, a surrogate date is generated by adding a randomly assigned offset (which matches the offset used in MIMIC-IV).
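
The date-shifting idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function name shift_datetime, the offset table, and the offset value are all hypothetical, and the sketch assumes one fixed whole-day offset per subject. It only shows why a constant offset hides the true date while preserving the intervals between events.

```python
import datetime

# Hypothetical per-subject offsets, in days. The real offsets are not
# published; this value is invented purely for illustration.
EXAMPLE_OFFSETS_DAYS = {10039708: 41234}

def shift_datetime(subject_id, real_dt, offsets=EXAMPLE_OFFSETS_DAYS):
    """Return a surrogate datetime by adding the subject's fixed offset."""
    return real_dt + datetime.timedelta(days=offsets[subject_id])

# Two made-up event times for the same subject.
start = datetime.datetime(2020, 3, 1, 8, 30)
end = datetime.datetime(2020, 3, 1, 14, 45)
shifted_start = shift_datetime(10039708, start)
shifted_end = shift_datetime(10039708, end)

# The interval between the two events is unchanged by the shift.
print(shifted_end - shifted_start == end - start)
```

Because the same offset is applied to every timestamp of a subject, durations and the ordering of events remain meaningful when waveform records are compared with the clinical data.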


Data Description

Two types of data were collected by this project:

  • waveform data, which consists of a high-resolution, regularly sampled time series, collected directly from a measuring device;
  • numeric data, which consists of values that are either digitally derived (by software running on the bedside monitor, typically by analyzing one or more waveform signals), or sampled irregularly (such as non-invasive blood pressure).

Each patient record, therefore, contains both a waveform record, which is stored as a collection of files in WFDB format, and a table of numerics, which is stored as a compressed CSV file [2].

Record splitting

Data in the source database is associated with a particular internal “PatientId”, which in turn is associated with a care unit, room, and bed. However, there is no fully automated system for associating a PatientId with an actual patient’s medical record; the caregiver must manually “admit” and “discharge” each patient to and from the monitoring system. As a result, the monitor is sometimes left active after a patient leaves the ICU, so data from two or more patients may be recorded under the same PatientId. To avoid incorrectly combining two patients’ data into one record, records were automatically split wherever there was a gap of more than one hour with no waveform or numeric data present. Consequently, a single patient’s ICU stay will sometimes be divided into multiple records, even if the patient was not actually discharged from the monitor.
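
The splitting rule can be sketched as follows. This is an illustrative simplification, not the code used to build the database: the function name split_on_gaps is hypothetical, and data is modelled as a sorted list of arrival times in seconds rather than as waveform and numeric spans.

```python
GAP_SECONDS = 3600  # split when more than one hour passes with no data

def split_on_gaps(timestamps, max_gap=GAP_SECONDS):
    """Group sorted timestamps (seconds) into runs separated by gaps > max_gap."""
    records = []
    for t in timestamps:
        if records and t - records[-1][-1] <= max_gap:
            # Still within one hour of the previous sample: same record.
            records[-1].append(t)
        else:
            # First sample, or a gap longer than one hour: start a new record.
            records.append([t])
    return records

# Hypothetical arrival times with a two-hour silence after t=1800.
times = [0, 600, 1800, 9000, 9060]
print(split_on_gaps(times))  # [[0, 600, 1800], [9000, 9060]]
```

A gap of exactly one hour does not trigger a split here, matching the "more than one hour" wording above.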

Storage formats

To save disk space and download time, both the waveform and numerics files in this database have been compressed.

Signal files have been compressed using flac (version 1.3.2-3, on Debian 10, with options -8 -r8 -e). To read these signal files using WFDB, you will need a recent version of WFDB that includes support for FLAC compression [3, 4].

Numerics files have been compressed using dictzip (version 1.12.1+dfsg-8). These files can be read using gzip or any gzip-compatible library.


Usage Notes

Each patient has a unique number (subject_id) which is used to identify the patient in MIMIC-IV. Additionally, a unique number (hadm_id) is used to identify a particular hospital admission. The same patient may be admitted to the hospital on multiple occasions (possibly months or years apart), so the same subject_id may have multiple hadm_id values.

Furthermore, in the course of a single hospital admission, the same patient may have multiple ICU stays, and during a single ICU stay there may be more than one waveform record.

In this database, all records associated with a given patient are stored in a single folder named after the patient’s subject_id. These folders are further collected into 100 top-level folders (named after the first three digits of the subject_id). For example, records associated with subject 10039708 are stored in the p100/p10039708 folder. Each waveform record is then stored in a separate folder, such as p100/p10039708/83411188.
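
As a small sketch of this layout, the helper below builds the relative folder for a record from its subject_id and record number. The function name record_dir is hypothetical, and the path is relative to whichever folder contains the p100-style directories.

```python
import posixpath

def record_dir(subject_id, record_id):
    """Return the relative folder for a record, e.g. 'p100/p10039708/83411188'."""
    subject = 'p{}'.format(subject_id)
    group = subject[:4]  # 'p' plus the first three digits of subject_id
    return posixpath.join(group, subject, str(record_id))

print(record_dir(10039708, 83411188))  # p100/p10039708/83411188
```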

Each record folder then contains several files:

  • A multi-segment header file, such as 83411188.hea, which provides general information about the record (starting date, time, length, subject and hospital admission IDs) along with the list of waveform segments.
  • A layout header file, such as 83411188_0000.hea, which provides the list of waveform signals present in the record.
  • One or more segment header files, such as 83411188_0001.hea, and one or more signal files, such as 83411188_0001e.dat, which contain the actual waveform data.
  • A numerics file, such as 83411188n.csv.gz, which contains a table of numeric values.
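
The naming scheme in this list can be turned into a rough file classifier. This is only a sketch inferred from the example names above: the function name classify is hypothetical, and the patterns assume that segment numbers are digits and that signal files carry an optional lowercase letter suffix (as in 83411188_0001e.dat), which may not cover every file in the database.

```python
import re

def classify(filename, record):
    """Guess the role of a file in a record folder from its name alone."""
    r = re.escape(record)
    # Order matters: the layout header (_0000) is checked before the
    # general segment-header pattern.
    patterns = [
        (r + r'\.hea', 'multi-segment header'),
        (r + r'_0000\.hea', 'layout header'),
        (r + r'_\d+\.hea', 'segment header'),
        (r + r'_\d+[a-z]*\.dat', 'signal file'),
        (r + r'n\.csv\.gz', 'numerics file'),
    ]
    for pattern, role in patterns:
        if re.fullmatch(pattern, filename):
            return role
    return 'unknown'

for name in ['83411188.hea', '83411188_0000.hea', '83411188_0001.hea',
             '83411188_0001e.dat', '83411188n.csv.gz']:
    print('{}: {}'.format(name, classify(name, '83411188')))
```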

Accessing waveform data

You can preview and explore waveform data online using LightWAVE (click “Visualize waveforms” below).

To load data into a Python program, we recommend using the WFDB-Python package (version 4.0.0 or later). For example, if you have downloaded the above record, the following Python commands will display the first 50 frames (about 800 milliseconds):

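import wfdb

# Read frames 0-49 of the record. With smooth_frames=False, each signal
# keeps its own sampling rate and is returned in rec.e_p_signal as a
# separate array of physical values.
rec = wfdb.rdrecord('83411188',
                    sampfrom=0, sampto=50,
                    smooth_frames=False)

# Print the name, units, and samples of each signal.
for (name, units, data) in zip(rec.sig_name,
                               rec.units,
                               rec.e_p_signal):
    print('{} (units {}):'.format(name, units))
    print(data)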

To load data into a C or C++ program, we recommend using the WFDB Software Package (version 10.7.0 or later). Note that you will need to install the libFLAC library before installing the WFDB Software Package.

Accessing numeric data

Numeric data is stored as a compressed CSV file. The following Python commands will display the first ten SpO2 measurements:

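import csv, gzip

# Open the compressed numerics file in text mode.
with gzip.open('83411188n.csv.gz', 'rt') as gzf:
    n = 0
    for row in csv.DictReader(gzf):
        t = int(row['time'])
        spo2 = row['SpO2 [%]']
        # Skip rows where the SpO2 column is empty.
        if spo2:
            print(t, spo2)
            n = n + 1
            # Stop after ten measurements.
            if n == 10:
                break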

Note that the time value is measured in “counter ticks” elapsed from the start of the record. To convert each time value to a wall-clock date and time, you can use the following:

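import csv, datetime, gzip, wfdb

# The record header gives the counter frequency (ticks per second)
# and the date and time at which the record starts.
header = wfdb.rdheader('83411188')

with gzip.open('83411188n.csv.gz', 'rt') as gzf:
    n = 0
    for row in csv.DictReader(gzf):
        t = int(row['time'])
        # Convert counter ticks to seconds, then to a wall-clock time.
        seconds = t / header.counter_freq
        delta = datetime.timedelta(seconds=seconds)
        dt = header.base_datetime + delta
        spo2 = row['SpO2 [%]']
        # Skip rows where the SpO2 column is empty.
        if spo2:
            print(dt, spo2)
            n = n + 1
            if n == 10:
                break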

Release Notes

Version 0.1.0 is the first public release of the MIMIC-IV Waveform Database. This release contains just 200 records from 198 patients, and is intended as a technical preview for the community in advance of version 1.0.0, which we expect to release shortly.


Ethics

The project was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). The requirement for individual patient consent was waived because the project did not impact clinical care and all protected health information was de-identified.


Conflicts of Interest

The authors have no conflicts of interest to declare.


References

  1. Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2022). MIMIC-IV (version 2.0). PhysioNet. https://doi.org/10.13026/7vcr-e114.
  2. WFDB website. https://wfdb.io/ [Accessed: 6 July 2022]
  3. Moody, G., Pollard, T., & Moody, B. (2022). WFDB Software Package (version 10.7.0). PhysioNet. https://doi.org/10.13026/gjvw-1m31.
  4. Xie, C., McCullum, L., Johnson, A., Pollard, T., Gow, B., & Moody, B. (2021). Waveform Database Software Package (WFDB) for Python (version 3.4.1). PhysioNet. https://doi.org/10.13026/egpf-2788.

Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Open Data Commons Open Database License v1.0

Discovery

DOI (version 0.1.0):
https://doi.org/10.13026/a2mw-f949

DOI (latest version):
https://doi.org/10.13026/6269-ws81


Files

Total uncompressed size: 12.8 GB.

Access the files

Visualize waveforms

Name             Size      Modified
waves
LICENSE.txt      25.2 KB   2022-07-06
RECORDS          4.3 KB    2022-07-06
SHA256SUMS.txt   1.6 MB    2022-07-10