ICU patients are typically the most physiologically fragile patients in the hospital and may experience prolonged hospital stays with significant morbidity and mortality. The modern ICU employs an impressive array of technologies that generates a rich, yet disparate, set of clinical and physiologic data used to guide patient care. ICU clinicians are challenged to interpret all the available ICU data not only to improve patient outcomes, but also to contain costs and adopt evidence-based practices. The enormous amount of ICU data and its poor organization make its integration and interpretation time-consuming and inefficient. The resulting data overload may actually hinder the diagnostic process, and may even lead to neglect of relevant data, resulting in errors and complications in ICU care (1).
In the long term, automated or semi-automated monitoring and clinical decision support systems (CDSS) are needed. These systems must be capable not only of presenting ICU data to human users but also of forming pathophysiological hypotheses that best explain the rich and complex volume of relevant data from clinical observations, bedside monitors, mechanical ventilators, and the wide variety of available laboratory tests and imaging studies. Such systems should reduce the ever-growing problem of information overload and provide much more clinically relevant and timely alarms than today's disparate limit-based alarms. While there have been decades of research in utilizing artificial intelligence and expert systems for medical data processing and forecasting (2), little of that research has found its way into widely deployed ICU monitoring and information systems. The development of such systems requires access to large volumes of real-world ICU data that can serve as a testing platform to refine and evaluate such algorithms.
The role of a rich and comprehensive database in this context is twofold. First, through data mining, such a database allows for extensive epidemiological studies that link patient data to clinical practice and outcomes. Such insight can in turn motivate the development of alarms, alerts, or algorithms to improve clinical practice and thus improve patient outcomes. Second, it is essential to develop and test algorithms with real data, and to be able to perform such tests repeatedly and reproducibly as algorithm refinements evolve. Within the critical care community, well-known databases including APACHE (3) and Project IMPACT (4) have resulted in the acquisition of hundreds of thousands of ICU patient cases from dozens of hospitals throughout the United States of America. The purpose of such databases is mostly to assess and compare the severity of ICU patient conditions and outcomes, and the costs of treatment, across all participating intensive care units on the basis of very few, highly aggregated pieces of information. Such data abstractions often do not include detailed information regarding temporal relationships between therapeutic interventions and corresponding diagnostic data, and thus would be insufficient to characterize clinically significant transient events such as hemodynamic instability or acute organ injury.
The detail and volume of data necessary to support the research described above have been difficult to gather in the past due to limitations on computational processing power, networking bandwidth, and digital storage capacities, as well as proprietary vendor data formats and concerns related to patient privacy (5). Through a collaborative effort between academia, industry, and clinical medicine, we have attempted to address these challenges and have established a major new, publicly available ICU database, MIMIC II.
For more information on the rationale for assembling this database, the original research proposal can be found here:
This document provides a detailed overview of the formation and contents of the MIMIC II database. The methodology used in data post-processing and the organization of MIMIC II are also described. In section 2 we characterize MIMIC II with respect to quantitative data specifications as well as clinical characteristics, using standard metrics such as patient acuities, problem lists, demographics, and mortality rates. In section 3 we describe the access modalities.