Database Open Access

# VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients

Published: Sept. 21, 2022. Version: 1.0.0

Lee, H., & Jung, C. (2022). VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients (version 1.0.0). PhysioNet. https://doi.org/10.13026/czw8-9p62.

Lee, HC., Park, Y., Yoon, S.B. et al. VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients. Sci Data 9, 279 (2022)

Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

## Abstract

In modern anesthesia, multiple medical devices are used simultaneously to comprehensively monitor real-time vital signs to optimize patient care and improve surgical outcomes. However, interpreting the dynamic changes of time-series biosignals and their correlations is a difficult task even for experienced anesthesiologists. Recent advanced machine learning technologies have shown promising results in biosignal analysis; however, research and development in this area is relatively slow due to the lack of biosignal datasets for machine learning. The VitalDB (Vital Signs DataBase) is an open dataset created specifically to facilitate machine learning studies related to monitoring vital signs in surgical patients. This dataset contains high-resolution multi-parameter data from 6,388 cases, including 486,451 waveform and numeric data tracks of 196 intraoperative monitoring parameters, 73 perioperative clinical parameters, and 34 time-series laboratory result parameters. All data is stored in the public cloud after anonymization. The dataset can be freely accessed and analysed using application programming interfaces and a Python library. The VitalDB public dataset is expected to be a valuable resource for biosignal research and development.

## Background

Intraoperative vital signs such as electrocardiography, blood pressure, percutaneous oxygen saturation, and body temperature are objective measures of physiologic function and are tracked with high-acuity patient monitors during surgery and anesthesia. These vital signs are usually used as-is, but are sometimes converted into clinically useful secondary parameters developed through mathematical, engineering, and medical algorithms. Modern anesthesia widely adopts advanced patient monitors that present a variety of secondary parameters such as the electroencephalogram-based anesthesia depth index, arterial pressure-derived cardiac output, and the electrocardiography- and photoplethysmography-based analgesia index. Numerous studies have shown that these secondary parameters are useful for optimizing patient care during surgery and greatly improve postoperative outcomes.

Recent advances in machine learning technologies such as one-dimensional convolutional neural networks have allowed more accurate interpretation of complex time-series biosignals. The relationship between various vital signs has also been elucidated using artificial intelligence, resulting in practical high-performance algorithms in the medical field. However, the lack of large-scale, high-resolution biosignal data required for machine learning has been a major obstacle to the development or improvement of biosignal algorithms. Electronic medical records (EMR) systems and automated anesthesia records (AAR) are important sources of large biosignal datasets; however, they have limited capabilities because (1) most EMR systems and AARs only store low time resolution data that are insufficient for interpretation of dynamic physiological changes during surgery; (2) essential waveform data such as electrocardiography, photoplethysmography, electroencephalography, and airway pressure waves are not stored on most systems due to cost or technical limitations; and (3) current recording systems do not fully support integrated recording of data from multiple devices. In general, obtaining high-quality vital signs data in surgical patients is considered technically difficult or very expensive.

Previously, we developed the Vital Recorder program, a data capture software that records time-synchronized high-resolution data from various anesthesia devices including patient monitors, anesthesia machines, brain monitors, cardiac monitors, target-controlled infusion pumps, and rapid infusion systems. All parameters of multiple monitoring devices applied simultaneously to one patient are recorded as time-synchronized data tracks and stored as a single case file. The automatic recording function of this program has enabled massive collection of intraoperative biosignals in our tertiary university hospital. The Vital Signs DataBase (VitalDB) was constructed using (1) de-identified case files that were automatically recorded by the Vital Recorder program during daily surgery and anesthesia, and (2) perioperative patient information retrieved from our EMR system [1-3].

Unlike the previously reported public multi-parameter biosignal datasets, the VitalDB is the first public biosignal dataset specifically focused on perioperative patient care and is characterized by containing multi-parameter high-resolution waveform and numeric data. Since the VitalDB dataset was first released in 2017, it has been used for various big data research such as: deep learning for arterial pressure waveform-based cardiac output algorithm, deep learning-based pharmacokinetic-pharmacodynamic study of intravenous anesthetics, machine learning for bispectral index algorithm, statistical analysis of the relationship between intraoperative bispectral index and postoperative mortality, and deep learning algorithm to predict intraoperative hypotension from arterial waveforms.

Perioperative clinical information, laboratory results and surgical outcomes in this dataset may facilitate a variety of clinical outcomes or clinical decision support studies. Studies that elucidate the relationship between biosignal parameters and clinical variables will also be feasible. For instance, the effects of intraoperative variables such as hypotension, hypothermia, and low cardiac output on clinical outcomes such as acute kidney injury, the length of hospital stay, or in-hospital mortality can be examined. The physiologic effects of various interventions such as vasoactive drugs, fluids, anesthetics, and anesthesia machine settings may be sought from the dataset. This dataset may simply be used as data samples for developing signal processing algorithms. However, we argue that this big data is better suited for a training dataset for machine learning of biosignals or for external validation of biosignal algorithms created using other datasets.

A final point to mention is the limitation that our data are from a single institution and a single race (Asian). Researchers should be careful as this can lead to overfitting of algorithms. As multicenter data can be a solution to this problem, we have released the Vital Recorder program and the VitalDB dataset for free. We hope that multicenter biosignal research for the development of general algorithms will be widely implemented in the future.

## Methods

The database includes vital signs data and related clinical information that were prospectively recorded during surgery. The patient information was retrospectively obtained from our hospital’s EMR system after surgery.

### Approval for data collection

The acquisition and free disclosure of the data was approved by the Institutional Review Board of Seoul National University Hospital (H-1408-101-605). The study was also registered at clinicaltrials.gov (NCT02914444). Written informed consent was waived due to anonymity of the data. Data collection was performed in accordance with relevant guidelines and regulations of the institutional Ethics Committee.

### Study population

Data were obtained from non-cardiac (general, thoracic, urological, and gynecological) surgery patients who underwent routine or emergency operations at Seoul National University Hospital, Seoul, Korea from Aug 2016 to Jun 2017. Of the 7,051 eligible cases, cases with local anesthesia (239), incomplete recording (279), and loss of essential data tracks (145) were excluded. Finally, 6,388 cases (91%) performed under general anesthesia, spinal anesthesia, or sedation/analgesia were included in the dataset.
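The case counts in the paragraph above can be checked with a few lines of Python. This is only an arithmetic sketch; the variable names are illustrative and not part of the dataset.

```python
# Case counts taken from the study population paragraph.
eligible = 7051
excluded = {
    "local anesthesia": 239,
    "incomplete recording": 279,
    "loss of essential data tracks": 145,
}

# Included cases are the eligible cases minus all exclusions.
included = eligible - sum(excluded.values())
inclusion_rate = included / eligible

print(included, f"{inclusion_rate:.1%}")
```

The result is 6,388 included cases, which is about 90.6% of eligible cases and is rounded to 91% in the text.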

### Dataset development

These methods are expanded versions of descriptions in our related work. All case files in this dataset were recorded using the Vital Recorder program (v 1.7.4). The laptop computer executing the Vital Recorder program was connected to multiple patient monitoring devices via serial cables. Monitoring data from multiple anesthesia devices applied to one patient were recorded in one case file in a time-synchronized manner.

The same recording systems were installed in 10 out of 31 operating rooms to collect data over a year. The recording system operated for 24 hours every day, and case files of individual patients were automatically recorded separately. The case-by-case automatic recording was performed with the following method:

• When both heart rate and percutaneous oxygen saturation signals are detected, patient monitoring is considered to have started and case recording begins immediately.
• If the heart rate and percutaneous oxygen saturation signals disappear for more than 10 minutes, patient monitoring is considered to have ended and the recording is automatically stopped.

The data collection process was remotely monitored every day in real time through web monitoring, and the integrity of the case-matched vital files was reviewed on a weekly basis. After verification of the case-matched vital files (detailed in the Technical Validation section), track processing was performed on the verified vital files using code:

• Tracks with all 0 values or fewer than 10 data samples were deleted.
• Waveform tracks without corresponding numeric tracks were deleted.
• Track names were changed to improve the usability of the dataset:
  • If a femoral arterial catheter was confirmed on the anesthesia records, the related arterial waveform and numeric tracks were renamed from ART, ART_SBP, ART_DBP, and ART_MBP to FEM, FEM_SBP, FEM_DBP, and FEM_MBP, respectively.
  • “PUMP” in the PUMP_RATE and PUMP_VOL tracks was changed to specific drug names obtained from infusion pump data or anesthesia records (e.g., EPI_RATE, PPF20_VOL).

The demographic, surgical, anesthetic, preoperative, intraoperative and outcomes data of the patients were obtained from the EMR system and included in the dataset. The laboratory test results within 90 days before and after the anesthesia start time were extracted from the EMR, and all non-numeric characters were removed from the results. This information is organized in separate csv files in the dataset.

Finally, de-identification of the dataset was performed before its release:

• Instead of the actual patient number, random surgery case identifiers (caseid) were assigned to the cases (1–6,388). An individual identifier derived from the hospital ID (subjectid, 1–6,090) was also added so that reoperation cases can be identified.
• Since case-matched-and-renamed vital files no longer contain any patient identification information, only de-identification of the recording time was performed.
• The surgery start and end times, and the anesthesia start and end times were extracted from the EMR and integrated into the event track of vital files.
• The starting time point of the recording was set to “0” and all other times were converted to times relative to the start point.
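The case-by-case automatic recording rule described earlier in this section (start when both heart rate and oxygen saturation are detected, stop after more than 10 minutes without signal) can be illustrated with a minimal sketch. This is not the Vital Recorder implementation. The function name `segment_cases`, the input layout, and two interpretation choices are assumptions: a case is treated as still active while at least one of the two signals is present, and the case end is taken as the last time a signal was seen.

```python
from typing import Iterable, List, Tuple

TIMEOUT_S = 600.0  # 10 minutes, as stated in the text


def segment_cases(
    samples: Iterable[Tuple[float, bool, bool]],
    timeout_s: float = TIMEOUT_S,
) -> List[Tuple[float, float]]:
    """Split a monitoring stream into cases (hypothetical helper).

    samples: (time_s, hr_present, spo2_present) tuples sorted by time.
    Returns a list of (start_s, end_s) pairs, one per detected case.
    """
    cases: List[Tuple[float, float]] = []
    start = None      # start time of the case being recorded, if any
    last_seen = None  # last time HR or SpO2 was present (assumption)
    for t, hr, spo2 in samples:
        # Close the current case once the signal gap exceeds the timeout.
        if start is not None and t - last_seen > timeout_s:
            cases.append((start, last_seen))
            start = None
        if start is None:
            # A new case starts only when BOTH signals are detected.
            if hr and spo2:
                start, last_seen = t, t
        elif hr or spo2:
            last_seen = t
    # A case still open at the end of the stream is closed at last_seen.
    if start is not None:
        cases.append((start, last_seen))
    return cases
```

For example, a stream in which both signals appear at 20 s, vanish after 30 s, and reappear at 800 s would be split into two cases, because the gap between 30 s and the next sample with a signal exceeds 600 s.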

## Data Description

The dataset consists of intraoperative vital signs data (6,388 vital files in vital format), perioperative clinical information (clinical information.csv) and the laboratory results (lab results.csv) of 6,388 surgery cases. All data are accessible from the repository.

In brief, the dataset has the following characteristics:

• The dataset consists of intraoperative vital signs data and perioperative clinical information of 6,388 cases.
• Vital signs data includes up to 12 waveform and 184 numeric data tracks acquired from multiple anesthesia devices applied to patients during surgery. The total number of data tracks is 486,451 (average 87, range 16–129).
• Vital signs data have various time intervals according to the anesthesia devices, with a time resolution of 1–7 seconds for numeric data and 62.5–500 Hz for waveform data (Table 2). Each case file contains an average of 2.8 million data points.
• Data are not pre-processed because the real-world noise in the vital signs data is essential to the development of practical monitoring algorithms.
• A total of 74 perioperative clinical information parameters and 34 time-series perioperative laboratory results are provided to help interpret the relationship with the intraoperative vital signs.

Since different anesthesia equipment was used for each patient, the data tracks are configured differently for each case file. Specifically, data from the patient monitor (Solar™ 8000M, GE Healthcare, Wauwatosa, WI, USA) were recorded in all patients, and analog signal (TramRac-4A, GE Healthcare, Wauwatosa, WI, USA) data were acquired from most patients. Data from the anesthesia machine (Primus, Dräger, Lübeck, Germany) were recorded in most patients except for regional anesthesia cases. Data from the brain monitor (BIS Vista™, Medtronic, Dublin, Ireland) were recorded in most patients undergoing general anesthesia and sedation/analgesia. Data from the target-controlled infusion pumps (Orchestra® Base Primea with module DPS, Fresenius Kabi AG, Bad Homburg, Germany) were recorded in all patients undergoing intravenous anesthesia and balanced anesthesia. The infusion pump data also include infusion histories of various intravenous drugs. Cardiac monitors (Vigileo/FloTrac, EV1000 and Vigilance II monitors, Edwards Lifesciences, Irvine, CA, USA; CardioQ-ODM+, Deltex Medical, Chichester, UK), a rapid infusion device (FMS2000, Belmont Instrument Corporation, Billerica, MA, USA), and a cerebral/somatic oximeter (INVOS™, Medtronic, Dublin, Ireland) were used at the anesthesiologist’s discretion. As a result, of the 196 parameters, 16–129 were recorded for each case.

The clinical information file provides patient-related perioperative data to help interpret biosignal data. This file consists of caseid and subjectid, and 72 clinical parameters including case file information, demographic data, outcomes, preoperative laboratory data, and surgery and anesthesia related data. Among the parameters, “casestart” is the time the patient’s case file recording started, and the value is always “0”. All time-series data in the VitalDB dataset are anonymized by expressing time in seconds relative to the casestart time. Since the anesthesia start time (anestart) and anesthesia end time (aneend) are the times recorded at 5-minute intervals in the EMR, there may be a difference of several minutes from the start time (casestart) and end time (caseend) of the actual case recording.
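As a small illustration of working with these relative times, the sketch below computes recording and anesthesia durations from rows shaped like the clinical information file. Only the column names (caseid, casestart, caseend, anestart, aneend) come from the text; the rows themselves are made up, and `durations_minutes` is a hypothetical helper.

```python
import csv
import io

# Made-up rows for illustration; real values come from the clinical information file.
SAMPLE_CSV = """caseid,casestart,caseend,anestart,aneend
1,0,11520,-552,10848
2,0,15720,-1020,14880
"""


def durations_minutes(csv_text):
    """Return {caseid: (recording_min, anesthesia_min)}.

    All times are in seconds relative to casestart, so anestart can be
    negative when anesthesia was charted before the recording began.
    """
    out = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        recording = (float(row["caseend"]) - float(row["casestart"])) / 60
        anesthesia = (float(row["aneend"]) - float(row["anestart"])) / 60
        out[int(row["caseid"])] = (recording, anesthesia)
    return out


result = durations_minutes(SAMPLE_CSV)
```

Because anestart and aneend are charted at 5-minute intervals, the anesthesia duration computed this way may differ from the recording duration by several minutes in either direction.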

Finally, the laboratory results file contains 928,448 time-series data points for 34 blood tests from 3 months before surgery to 3 months after surgery. Laboratory results are provided as a list of case identifier (caseid), blood test time (dt), test name (name), and value (result) for each test. Since the test time is a relative time expressed in seconds with the case start time as a reference point, preoperative tests have negative time values.
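The sign convention for dt makes it straightforward to select preoperative results. The following sketch picks the most recent preoperative value for each case and test, using the four columns named above. The rows and test names are invented for illustration, and `latest_preop` is a hypothetical helper rather than part of any VitalDB tool.

```python
import csv
import io

# Made-up rows for illustration; test names and values are not from the dataset.
SAMPLE_LAB_CSV = """caseid,dt,name,result
1,-86400,hb,13.1
1,-3600,hb,12.4
1,7200,hb,10.9
1,-7200,cr,0.8
2,-1800,hb,14.0
"""


def latest_preop(csv_text):
    """Return {(caseid, name): result} for the latest test with dt < 0."""
    latest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        dt = float(row["dt"])
        if dt >= 0:
            continue  # taken at or after the case start, so not preoperative
        key = (int(row["caseid"]), row["name"])
        # Keep the row whose dt is closest to zero from below.
        if key not in latest or dt > latest[key][0]:
            latest[key] = (dt, float(row["result"]))
    return {key: value for key, (_, value) in latest.items()}


preop = latest_preop(SAMPLE_LAB_CSV)
```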

## Usage Notes

The vital file is a binary file recorded with the Vital Recorder program that contains time-series records of vital signs. The specification of the vital file format is detailed in a document in the open data repository.

A Python package “vitaldb” that helps with reading and writing vital files is freely available on PyPI, the Python Package Index [5].

There is a function named load_case to load track data from a single case file. The load_case function is described below:

• Description: Load data of multiple tracks from a single case.
• Usage: load_case(caseid, tnames, interval = 1)
• Arguments:
  • caseid: caseid to load.
  • tnames: list or comma-separated string of 'device name/track name'.
  • interval: time interval between data points, in seconds. Default value is 1 second.
• Usage examples:
  • load_case(1, ['SNUADC/ART', 'Solar8000/ART_SBP'], interval = 1/100)
  • load_case(1, 'SNUADC/ART,Solar8000/ART_SBP', interval = 1/100)
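A minimal sketch of how a call to load_case might be assembled is shown below. The helper names (`track_names`, `interval_for_rate`, `load_arterial_pressure`) are hypothetical. The call to `vitaldb.load_case` assumes the signature given above, and the package is imported inside the function so that the helpers can be used without it installed. Running `load_arterial_pressure` requires the vitaldb package and network access to the open dataset.

```python
from typing import List


def track_names(device: str, tracks: List[str]) -> List[str]:
    """Build 'device name/track name' strings in the form expected by tnames."""
    return [f"{device}/{track}" for track in tracks]


def interval_for_rate(hz: float) -> float:
    """Convert a desired sampling rate in Hz to an interval in seconds."""
    return 1.0 / hz


def load_arterial_pressure(caseid: int, hz: float = 100.0):
    """Load the arterial waveform and systolic pressure for one case.

    Assumes `pip install vitaldb` and the load_case(caseid, tnames, interval)
    signature described in the text.
    """
    import vitaldb  # third-party package, imported lazily

    tnames = ["SNUADC/ART"] + track_names("Solar8000", ["ART_SBP"])
    return vitaldb.load_case(caseid, tnames, interval_for_rate(hz))
```

When tracks with different native resolutions are requested at a single interval, slower numeric tracks cannot have a recorded value at every time step, so it is worth checking how unfilled samples are represented before analysis.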

There is a class called VitalFile in the vitaldb library that can also help with reading and writing vital files:

• Description: A class for reading and writing files in the vital file format.
• Usage: VitalFile(filepath, tnames)
• Arguments:
  • filepath: file path to read.
  • tnames: list or comma-separated string of 'device name/track name' to read.
• Usage examples:
  • VitalFile('00001.vital').to_vital('00001_copy.vital')
  • vals = VitalFile('00001.vital').to_numpy(['SNUADC/ART', 'Solar8000/ART_SBP'], interval = 1/100)
  • df = VitalFile('00001.vital').to_pandas(['SNUADC/ART', 'Solar8000/ART_SBP'], interval = 1/100)

After reading the vital file with the VitalFile object, researchers can use the to_vital method to save the data as vital file format again, or use to_numpy or to_pandas methods to get the samples of specific tracks as a numpy array or a pandas DataFrame.

## Release Notes

v1.0.0: Initial release on PhysioNet

Prior releases: the VitalDB database was originally released in 2018 at https://vitaldb.net

## Ethics

The acquisition and free disclosure of the data was approved by the Institutional Review Board of Seoul National University Hospital (H-1408-101-605). The study was also registered at clinicaltrials.gov (NCT02914444). Written informed consent was waived due to anonymity of the data. Data collection was performed in accordance with relevant guidelines and regulations of the institutional Ethics Committee.

## Acknowledgements

This study was supported by the Ministry of Science and ICT (MSIT), Republic of Korea, under the Information Technology Research Center (ITRC) support program (IITP-2022-2018-0-01833) supervised by the Institute for Information & communications Technology Promotion (IITP). This study was also supported by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (HI21C1074).

## Conflicts of Interest

The authors declare no competing interests.

## References

1. Lee, HC., Park, Y., Yoon, S.B. et al. VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients. Sci Data 9, 279 (2022). https://doi.org/10.1038/s41597-022-01411-5
2. Vistisen, S. T., Pollard, T. J., Enevoldsen, J. & Scheeren, T. W. L. VitalDB: fostering collaboration in anaesthesia research. Br J Anaesth 127, 184–187 (2021). https://doi.org/10.1016/j.bja.2021.03.011
3. Lee, H. C. & Jung, C. W. Vital Recorder-a free research tool for automatic recording of high-resolution time-synchronised physiological data from multiple anaesthesia devices. Sci Rep 8, 1527 (2018). https://doi.org/10.1038/s41598-018-20062-4
4. VitalDB website: https://vitaldb.net [Accessed 10 September 2022]
5. Python software for reading VitalDB on PyPI: https://pypi.org/project/vitaldb/ [Accessed 10 September 2022]

##### Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

##### Discovery

Project Website:
https://vitaldb.net

## Files

Total uncompressed size: 95.4 GB.

##### Access the files
wget -r -N -c -np https://physionet.org/files/vitaldb/1.0.0/