Since a patient may have been admitted several times during the period in which our data were collected, it is important to understand exactly how to identify each patient and their stay(s).
There are essentially four identifiers for data associated with any given patient: ``subject_id'', ``hadm_id'', ``icustay_id'' and ``case_id''.
Figure 1.4 illustrates the possible data available for a given individual, identified by a ``subject_id''. Time progresses from left to right, and the different types of data collected are shown vertically. Each subject can have multiple hospital admissions, identified with ``hadm_ids''. Each hospital admission can contain multiple ICU stays, identified with ``icustay_ids''. Waveforms collected during ICU stays are identified using ``case_ids''. Laboratory and microbiology tests are performed throughout a hospital stay and can therefore take place outside the ICU stay. Vital sign validation, medications, fluid balances and nursing notes are recorded only in the ICU and are not available for the remainder of the hospital stay. Date of death is recorded in-hospital and has also been obtained from social security records for out-of-hospital mortality.
Figure 1.4 illustrates an ``ideal'' case in which the timestamps associated with the data fall within the hospital and/or ICU stay. Unfortunately, real-world issues can complicate matters, allowing data to be recorded outside of a patient stay. For example, a patient could be physically present in the ICU and connected to monitors before their admission has been entered into the system. This results in a waveform recording that starts before the subject's ICU admission. Furthermore, missing or erroneous data can mean that ICU stays exist for which there is no matching hospital admission record.
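The two problems described above (data timestamped outside a stay, and ICU stays without a matching hospital admission) can be checked for with a few lines of code. The following is only a minimal sketch: the field names ``intime'' and ``outtime'', the helper functions and all values are hypothetical and chosen for illustration; they are not taken from the database.

```python
from datetime import datetime

def within_stay(timestamp, intime, outtime):
    """Return True if a timestamp falls inside the recorded stay (inclusive)."""
    return intime <= timestamp <= outtime

def orphan_icustays(icustays, known_hadm_ids):
    """Return the icustay_ids whose hadm_id has no matching admission record."""
    known = set(known_hadm_ids)
    return [stay["icustay_id"] for stay in icustays if stay["hadm_id"] not in known]

# Hypothetical ICU stay: the admission was entered into the system at 12:00,
# but the patient was already connected to the monitors at 11:30.
icu_in = datetime(2010, 1, 1, 12, 0)
icu_out = datetime(2010, 1, 3, 8, 0)
waveform_start = datetime(2010, 1, 1, 11, 30)
starts_before_admission = not within_stay(waveform_start, icu_in, icu_out)

# Hypothetical ICU stays; the last two have no usable hospital admission.
icustays = [
    {"icustay_id": 9001, "hadm_id": 5001},
    {"icustay_id": 9002, "hadm_id": 5999},  # hadm_id not in admission records
    {"icustay_id": 9003, "hadm_id": None},  # hadm_id missing altogether
]
orphans = orphan_icustays(icustays, known_hadm_ids=[5001, 5002])
```

In this made-up example the waveform is flagged as starting before the ICU admission, and the second and third ICU stays are reported as having no matching hospital admission. Whether such records should be dropped or kept depends on the analysis.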
Note that a patient may move between ICUs during any given admission. If the move lasts longer than 24 hours, we define it to be a new ICU stay. Note also that the amount of data varies during and between ICU stays and that data are often missing (see section 1.6).
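The nesting of the four identifiers described in this section (subject, hospital admission, ICU stay, waveform case) can be sketched as a small data structure. This is an illustrative sketch only: the flat record layout, the function name ``build_hierarchy'' and all identifier values are assumptions made for the example and do not reflect the actual database schema.

```python
from collections import defaultdict

# Hypothetical flat records: (subject_id, hadm_id, icustay_id, case_id).
# The values are invented for illustration; they are not real patient data.
records = [
    (101, 5001, 9001, 70001),
    (101, 5001, 9002, 70002),  # second ICU stay within the same admission
    (101, 5002, 9003, None),   # ICU stay with no waveform recording
    (102, 5003, 9004, 70003),
]

def build_hierarchy(rows):
    """Nest identifiers as subject -> hospital admission -> ICU stay -> waveform cases."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for subject_id, hadm_id, icustay_id, case_id in rows:
        cases = tree[subject_id][hadm_id][icustay_id]  # creates the path if absent
        if case_id is not None:
            cases.append(case_id)
    return tree

hierarchy = build_hierarchy(records)

# Admissions for subject 101, and ICU stays within admission 5001.
admissions_101 = sorted(hierarchy[101])
icustays_5001 = sorted(hierarchy[101][5001])
```

Here subject 101 has two hospital admissions, the first of which contains two ICU stays, while the ICU stay in the second admission has no associated waveform case.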