Database Credentialed Access

Curated Data for Describing Blood Glucose Management in the Intensive Care Unit

Aldo Robles Arévalo, Roselyn Mateo-Collado, Leo Anthony Celi

Published: April 19, 2021. Version: 1.0.1

When using this resource, please cite:
Robles Arévalo, A., Mateo-Collado, R., & Celi, L. A. (2021). Curated Data for Describing Blood Glucose Management in the Intensive Care Unit (version 1.0.1). PhysioNet.

Additionally, please cite the original publication:

Robles Arévalo, A., Maley, J.H., Baker, L. et al. Data-driven curation process for describing the blood glucose management in the intensive care unit. Sci Data 8, 80 (2021).

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.


Analysis of real-world glucose and insulin clinical data recorded in electronic health records can provide insights into tailored approaches to clinical care, but such data still present many analytic challenges. The present data subsets are the result of a detailed curation process that extracts glucose readings and pairs them with insulin therapy on a per-patient basis during Intensive Care Unit (ICU) stays in the Medical Information Mart for Intensive Care (MIMIC-III) database. The curated data include over 500,000 glucose readings and more than 140,000 insulin entries for nearly 9,600 patients across more than 11,000 ICU stays. The curation process also involved creating glucose-insulin pairing rules based on physiologic and pharmacologic parameters defined by clinical experts. With these rules, nearly 76% of insulin administration events could be paired with a blood glucose reading. The two shared data subsets have the potential to reveal insights into real-world glycemic control practice in the ICU. Moreover, the shared material serves as a framework for future studies of glucose management and insulin replacement therapy in the ICU, which may allow researchers to tailor queries and data processing to their own study objectives.


Glucose management in the Intensive Care Unit (ICU) has been widely debated because of conflicting evidence and opinions about glucose targets and management strategies in several intensive care scenarios [1]. Few randomized controlled clinical trials have studied glycemic control in this setting, in part because of the complexity of studying large heterogeneous populations and of standardizing glucose management protocols across hospitals.

Current clinical guidelines recommend avoiding both hypo- and hyperglycemia. To this end, care providers should monitor blood glucose levels and, based on these glycemic checks, may then administer exogenous insulin (or insulin analogs) to control hyperglycemia. Insulin comes in many forms, which are classified by duration of action: short-acting or regular (4-8 hours), intermediate-acting (10-12 hours), or long-acting (12-24 hours). Insulin is also classified by route of administration: intravenous continuous infusion (insulin drip), intravenous bolus, or subcutaneous bolus. The choice of insulin form and route of administration depends on the patient's characteristics and severity of illness.

Retrospective analysis of real-world data can potentially reveal valuable insights into specific glycemic target ranges [1] that may provide a survival advantage for certain populations of critically ill patients. However, the currently available large real-world data sets are not suitable in their raw form for answering clinical questions, and without careful curation, subsequent analyses can yield misleading models.

Here we share two data subsets derived from the Medical Information Mart for Intensive Care (MIMIC-III) database v1.4 [2]. The data subsets gather all curated entries of blood glucose readings and insulin treatments for nearly 9,600 patients of any age admitted to the ICU between 2008 and 2012. These patients have a mean length of stay (LOS) of 12.0 ± 13.0 days and a median LOS of 7 days; however, stays with a LOS of 1 day account for 10.7% of the included ICU stays. The complementary material demonstrates how database queries of unprocessed glycemic checks and insulin entries were converted into clinically validated and reproducible data subsets.

By sharing these data subsets, we provide a publicly available reference framework for matching glucose measurements and insulin events from real-world hospital data. These data subsets can also serve as a starting point for retrospective analyses of outcomes in the ICU, inform future clinical trial designs, and help generate new treatment approaches [3-5].


The present data subsets are de-identified and include all curated entries of both glucose measurements and insulin administrations during ICU stays. The subsets were derived from the MIMIC-III v1.4 database [2]. The data in MIMIC-III has been de-identified, and the institutional review boards of the Massachusetts Institute of Technology (No. 0403000206) and Beth Israel Deaconess Medical Center (2001-P-001699/14) both approved the use of the database for research.

Inclusion criteria

The derived subsets contain all ICU admissions in which patients received at least one insulin administration during the entire LOS. No exclusion criteria were applied for age, ethnicity, type of admission, or diagnosis, and both first and subsequent admissions were included. The shared subsets contain data from nearly 9,600 patients across 11,724 ICU admissions between late 2008 and 2012.

These admissions were recorded in the MetaVision information system provided by iMDSoft. This system and period were selected because the MetaVision data are richer and have finer granularity for insulin events than data from the CareVue information system, which was used at the Beth Israel Deaconess Medical Center (BIDMC) between 2001 and early 2008.
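The inclusion criterion can be illustrated with a minimal Python sketch. It assumes that INPUTEVENTS_MV rows have already been fetched as (ICUSTAY_ID, ITEMID) tuples and uses the insulin item IDs listed in Table 2; the function name icu_stays_with_insulin is hypothetical and is not part of the project's notebooks, which run their queries on BigQuery.

```python
from typing import Iterable, Optional, Set, Tuple

# Insulin item IDs from Table 2 (INPUTEVENTS_MV, MetaVision)
INSULIN_ITEMIDS = {223257, 223258, 223259, 223260, 223261, 223262}


def icu_stays_with_insulin(
    inputevents_mv: Iterable[Tuple[Optional[int], int]],
) -> Set[int]:
    """Return the ICUSTAY_IDs with at least one insulin administration event.

    Each input element is an (ICUSTAY_ID, ITEMID) pair; rows without an
    ICUSTAY_ID cannot be attributed to an ICU stay and are skipped.
    """
    return {
        icustay_id
        for icustay_id, itemid in inputevents_mv
        if icustay_id is not None and itemid in INSULIN_ITEMIDS
    }


# Toy rows: stay 2 only has a non-insulin item, so it is not selected
stays = icu_stays_with_insulin([(1, 223258), (2, 225158), (3, 223260), (None, 223258)])
print(sorted(stays))  # [1, 3]
```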

Glucose readings

In MIMIC-III, blood glucose values are recorded using either laboratory chemistry analyzers or bedside fingerstick glucometers [6]. Laboratory analyzer values are recorded in both the LABEVENTS and CHARTEVENTS tables, whereas fingerstick glucometer measurements are recorded only in CHARTEVENTS. The glucose-related items in MIMIC-III that were included in our queries are listed in Table 1.

Table 1 - Item IDs related to glucose values identified in MIMIC-III.
Table          Item ID   Analytical Method
CHARTEVENTS    807       Fingerstick
CHARTEVENTS    811       Fingerstick
CHARTEVENTS    1529      Fingerstick
CHARTEVENTS    3744      Fingerstick
CHARTEVENTS    3745      Laboratory Analyzer
CHARTEVENTS    225664    Fingerstick
CHARTEVENTS    220621    Laboratory Analyzer
CHARTEVENTS    226537    Fingerstick
LABEVENTS      50809     Laboratory Analyzer
LABEVENTS      50931     Laboratory Analyzer

Although glucose measured by the laboratory analyzer is more accurate, both measurement methods were included in the extraction scripts, because in clinical practice fingerstick glucometer measurements are used more frequently and insulin may be dosed based on fingerstick readings alone.

After merging the entries for these item IDs, the following pre-processing steps were performed:

  • Glucose values that were zero, blank, or null were removed.
  • CHARTEVENTS entries whose ERROR column was not recorded or was recorded as 1 were removed.
  • Patients who did not receive any insulin were excluded.
  • When identical readings with the same timestamp and glucose value appeared in both LABEVENTS and CHARTEVENTS, the duplicates were removed.
  • All glucose values greater than or equal to 1,000 mg/dL were removed, because these values exceed the limit of accurate measurement for the laboratory analyzer at BIDMC. Only 144 entries were removed (0.006% of the 2,508,737 raw glucose entries available).
  • Laboratory analyzer readings below 1,000 mg/dL were retained.
  • For fingerstick readings, the limit of accurate measurement at BIDMC is 500 mg/dL, so all values above this threshold were removed. Only 146 entries were removed (0.006% of the 2,508,737 raw glucose entries available).
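A minimal Python sketch of these pre-processing steps follows. It assumes each raw reading has already been fetched into a GlucoseRow record (a hypothetical structure, not a MIMIC-III table), maps the Table 1 item IDs to the GLCSOURCE labels FINGERSTICK and BLOOD (an assumption about how those labels relate to Table 1), and applies the ERROR-column rule literally as stated above. Excluding patients without insulin is a cohort-level step and is not repeated here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Table 1: (source table, ITEMID) -> analytical method, using the GLCSOURCE labels
GLUCOSE_ITEMS: Dict[Tuple[str, int], str] = {
    ("CHARTEVENTS", 807): "FINGERSTICK",
    ("CHARTEVENTS", 811): "FINGERSTICK",
    ("CHARTEVENTS", 1529): "FINGERSTICK",
    ("CHARTEVENTS", 3744): "FINGERSTICK",
    ("CHARTEVENTS", 3745): "BLOOD",
    ("CHARTEVENTS", 225664): "FINGERSTICK",
    ("CHARTEVENTS", 220621): "BLOOD",
    ("CHARTEVENTS", 226537): "FINGERSTICK",
    ("LABEVENTS", 50809): "BLOOD",
    ("LABEVENTS", 50931): "BLOOD",
}
LAB_LIMIT_MG_DL = 1000.0         # readings >= this value are removed
FINGERSTICK_LIMIT_MG_DL = 500.0  # fingerstick readings above this value are removed


@dataclass
class GlucoseRow:
    subject_id: int
    table: str
    itemid: int
    charttime: datetime
    value: Optional[float]       # None for blank or null entries
    error: Optional[int] = None  # CHARTEVENTS.ERROR; LABEVENTS has no such column


def clean_glucose(rows: Iterable[GlucoseRow]) -> List[GlucoseRow]:
    kept: List[GlucoseRow] = []
    seen: Dict[Tuple[int, datetime, float], Set[str]] = {}
    for row in rows:
        method = GLUCOSE_ITEMS.get((row.table, row.itemid))
        if method is None or row.value is None or row.value == 0:
            continue  # not a glucose item, or a zero/blank/null value
        if row.table == "CHARTEVENTS" and row.error in (None, 1):
            continue  # ERROR not recorded or flagged as 1
        if row.value >= LAB_LIMIT_MG_DL:
            continue  # at or above the laboratory analyzer limit
        if method == "FINGERSTICK" and row.value > FINGERSTICK_LIMIT_MG_DL:
            continue  # above the glucometer limit
        key = (row.subject_id, row.charttime, row.value)
        tables = seen.setdefault(key, set())
        if tables - {row.table}:
            continue  # same patient, time and value already kept from the other table
        tables.add(row.table)
        kept.append(row)
    return kept
```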

The timestamps recorded for a glycemic check are based on charting by nursing staff. Because of competing priorities in the ICU, the timestamps entered by nurses may contain errors. The following assumption was made to account for this:

  • In some cases, the STORETIME (the time listed by nurses for checking glucose) was recorded before the CHARTTIME (the time when the actual data entry occurred). In those cases, the STORETIME was taken as the time of the glycemic check; otherwise, the CHARTTIME was kept. Such presumed delays in recording glucose readings occurred in 44,926 cases, 1.8% of all raw glucose readings available in MIMIC-III (n=2,508,737), with a median delay of 25 minutes.
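The timestamp rule can be written as a small Python function. This sketch assumes CHARTTIME and STORETIME have already been parsed into datetime objects; the function names are hypothetical, and the delay summary only illustrates how a median delay such as the one quoted above could be computed.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Iterable, List, Optional, Tuple


def glucose_check_time(charttime: datetime, storetime: Optional[datetime]) -> datetime:
    """Use STORETIME when it precedes CHARTTIME; otherwise keep CHARTTIME."""
    if storetime is not None and storetime < charttime:
        return storetime
    return charttime


def presumed_delays_min(times: Iterable[Tuple[datetime, Optional[datetime]]]) -> List[float]:
    """Minutes between STORETIME and CHARTTIME for readings where STORETIME came first."""
    return [
        (charttime - storetime).total_seconds() / 60.0
        for charttime, storetime in times
        if storetime is not None and storetime < charttime
    ]


t = datetime(2010, 1, 1, 8, 0)
readings = [
    (t, t - timedelta(minutes=25)),  # STORETIME earlier: corrected
    (t, t + timedelta(minutes=5)),   # STORETIME later: CHARTTIME kept
    (t, t - timedelta(minutes=40)),
    (t, t - timedelta(minutes=10)),
]
print([glucose_check_time(c, s).strftime("%H:%M") for c, s in readings])
# ['07:35', '08:00', '07:20', '07:50']
print(median(presumed_delays_min(readings)))  # 25.0
```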

Once these criteria were applied, the glucose readings were merged with the curated subset containing the insulin inputs.

Insulin inputs

Infusions and boluses in MIMIC-III, including those of insulin, are recorded in the INPUTEVENTS_CV (CareVue by Philips, admissions between 2001 and early 2008) and INPUTEVENTS_MV (MetaVision by iMDSoft, admissions between late 2008 and 2012) tables. Insulin inputs were extracted only from INPUTEVENTS_MV because it provides richer features, a granularity not found in INPUTEVENTS_CV. In MetaVision, insulin forms can be identified by duration of action (short, intermediate, or long) and by route of administration (intravenous continuous infusion, intravenous bolus, or subcutaneous bolus).

Insulin administration events were recorded under six different item IDs in MIMIC-III, each corresponding to a different type of insulin. These item IDs are listed in Table 2. The raw data contained 151,201 insulin administration events for 9,638 patients.

Table 2 - Item IDs related to insulin entries.
Item ID   Acting Type
223258    Short
223262    Short
223260    Long
223259    Intermediate
223257    Intermediate
223261    Intermediate

Infusions and boluses of insulin are recorded differently in MIMIC-III. An infusion generates separate entries when it starts, whenever its rate changes, and when it is discontinued, but together these entries form a single infusion. Therefore, the total amount of insulin administered over a period of time must be calculated from each infusion rate and the duration for which it was applied. Boluses, in contrast, are recorded as independent, instantaneous events. In the MetaVision data, insulin boluses can be short-, intermediate-, or long-acting, whereas all infusions are regular insulin.
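Because each infusion entry holds a constant rate between its STARTTIME and ENDTIME, the insulin delivered over a period is the sum of rate times duration over the parts of those segments that fall inside the period. A minimal Python sketch, assuming the segments of one infusion have already been extracted as (start, end, rate in U/hr) tuples; infused_units is a hypothetical helper, not part of the released notebooks.

```python
from datetime import datetime
from typing import Iterable, Tuple

Segment = Tuple[datetime, datetime, float]  # (start, end, rate in U/hr)


def infused_units(segments: Iterable[Segment], window_start: datetime, window_end: datetime) -> float:
    """Units of insulin delivered by constant-rate segments inside [window_start, window_end]."""
    total = 0.0
    for start, end, rate_u_per_hr in segments:
        overlap_start = max(start, window_start)
        overlap_end = min(end, window_end)
        if overlap_end > overlap_start:
            hours = (overlap_end - overlap_start).total_seconds() / 3600.0
            total += rate_u_per_hr * hours
    return total


# One infusion: 2 U/hr from 08:00 to 10:00, then 4 U/hr until 11:30
day = datetime(2010, 1, 1)
segments = [
    (day.replace(hour=8), day.replace(hour=10), 2.0),
    (day.replace(hour=10), day.replace(hour=11, minute=30), 4.0),
]
print(infused_units(segments, day.replace(hour=9), day.replace(hour=11)))  # 6.0
```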

To remove outliers while capturing as much data as possible, the following considerations were applied to the insulin inputs:

  • When the current infusion rate was not recorded, the rate was interpreted as unchanged since the last data input, which is recorded in the ORIGINALRATE column.
  • For regular insulin boluses, doses below 18.0 U represent 99% of all values; for infusions, rates below 29.8 U/hr represent 99% of all entries. Values above the 99th percentile were judged erroneous by clinical experts and excluded. In all cases, values less than or equal to 0 were excluded.
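These rules can be sketched in Python as follows. The sketch assumes, as the text describes, that ORIGINALRATE holds the rate to carry forward when the current rate is missing; it treats both short-acting item IDs in Table 2 as regular insulin for the bolus cutoff and keeps values strictly below the 99th-percentile cutoffs. Both choices are assumptions, and the helper names are hypothetical.

```python
from typing import Optional

# Table 2: insulin item IDs in INPUTEVENTS_MV and their acting type
INSULIN_ACTING_TYPE = {
    223258: "Short", 223262: "Short",
    223260: "Long",
    223259: "Intermediate", 223257: "Intermediate", 223261: "Intermediate",
}
BOLUS_CUTOFF_U = 18.0            # 99th percentile of regular-insulin bolus doses
INFUSION_CUTOFF_U_PER_HR = 29.8  # 99th percentile of infusion rates


def effective_rate(rate: Optional[float], original_rate: Optional[float]) -> Optional[float]:
    """A missing rate means 'unchanged', so fall back to ORIGINALRATE."""
    return rate if rate is not None else original_rate


def keep_insulin_entry(itemid: int, is_infusion: bool, amount: Optional[float]) -> bool:
    """amount is the bolus dose (U) or the infusion rate (U/hr)."""
    acting = INSULIN_ACTING_TYPE.get(itemid)
    if acting is None or amount is None or amount <= 0:
        return False  # not insulin, missing, or non-positive value
    if is_infusion:
        return amount < INFUSION_CUTOFF_U_PER_HR
    if acting == "Short":
        return amount < BOLUS_CUTOFF_U
    return True  # no percentile cutoff is described for other bolus types


print(keep_insulin_entry(223258, False, 6.0))                       # True
print(keep_insulin_entry(223258, True, effective_rate(None, 35.0)))  # False
```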

This resulted in 145,694 insulin administration events for 9,518 patients. Most curated insulin administrations were short-acting insulin (90.8%); of these, 69.4% were boluses (63.0% subcutaneous and 6.4% intravenous) and the remaining 30.6% were infusions. Once these criteria were applied, the insulin data were merged with the curated subset containing the glucose readings.

Pairing rules

After the curated insulin and glucose entries were merged, each insulin input, whether a bolus or a change in infusion rate, was aligned with a glucose reading. The clinical rationale is that each insulin input should be justified by a preceding glycemic check indicating the need to lower glycemia.

The goal was to link each insulin entry with the nearest glucose reading. To this end, rules were derived by consensus among a group of clinicians based on physiologic and pharmacologic standards. The following rules and assumptions were implemented:

  • Rule 1: A glucose measurement should precede an insulin administration by up to 90 minutes. The basis for this time window is the diabetic ketoacidosis guidelines, which recommend measuring glucose every 60 minutes in patients receiving an insulin infusion [7]. Thirty minutes were added to this interval (90 minutes in total) to account for delays in charting the events. The resulting window is within the recommended monitoring times [8]. This rule accounted for 79% of the insulin-glucose pairs.
  • Rule 2: When an insulin event was not preceded but instead followed by a blood glucose measurement, this glucose reading was paired with the insulin entry if the two were recorded within 90 minutes of each other. In these cases, nursing staff may have delayed charting the events. This rule accounted for 8% of the pairs.
  • Rule 3: When an infusion or bolus was charted between two blood glucose measurements, the higher glucose value was paired with the regular insulin entry, provided the two were recorded within 90 minutes of each other. This rule accounted for 5% of the pairs.
  • Rule 4: When an insulin bolus occurred very close to an infusion, it was assumed that the patient received a bolus and was then started on an infusion. Both insulin entries were paired with the preceding glucose value, or with the subsequent glucose reading if its value was higher than the preceding one and it was recorded within 90 minutes of the insulin dose. This rule accounted for 8% of the pairs.
  • Rule 5: Blood glucose readings below 90 mg/dL were not paired with a subsequent insulin entry, because clinicians would not treat such low glycemia with any insulin dose.
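A simplified Python sketch of Rules 1, 2, 3, and 5 for a single insulin entry follows. It is one possible reading of the rules, not the released pairing notebook: readings below 90 mg/dL are never candidates, only the closest eligible reading within 90 minutes on each side is considered, the later reading is chosen under Rule 3 only when it is higher than the earlier one, insulin type is ignored, and Rule 4 is omitted because "very close" is not quantified above. The names Reading, Pair, and pair_insulin are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence

WINDOW = timedelta(minutes=90)  # observation window used by Rules 1-3
MIN_PAIRABLE_GLC = 90.0         # Rule 5: readings below 90 mg/dL are never paired


@dataclass
class Reading:
    time: datetime
    glc: float  # mg/dL


@dataclass
class Pair:
    reading: Optional[Reading]  # None when the insulin entry stays unpaired
    rule: Optional[int]


def pair_insulin(insulin_time: datetime, readings: Sequence[Reading]) -> Pair:
    """Pair one insulin entry with a glucose reading (simplified Rules 1, 2, 3 and 5)."""
    eligible = [r for r in readings if r.glc >= MIN_PAIRABLE_GLC]
    before = [r for r in eligible if insulin_time - WINDOW <= r.time <= insulin_time]
    after = [r for r in eligible if insulin_time < r.time <= insulin_time + WINDOW]
    prev = max(before, key=lambda r: r.time, default=None)  # closest preceding reading
    nxt = min(after, key=lambda r: r.time, default=None)    # closest following reading
    if prev is not None and nxt is not None and nxt.glc > prev.glc:
        return Pair(nxt, 3)   # Rule 3: bracketed by two readings, keep the higher one
    if prev is not None:
        return Pair(prev, 1)  # Rule 1: reading up to 90 min before the insulin
    if nxt is not None:
        return Pair(nxt, 2)   # Rule 2: reading charted up to 90 min after the insulin
    return Pair(None, None)


t0 = datetime(2010, 1, 1, 8, 0)
readings = [Reading(t0 - timedelta(minutes=30), 180.0), Reading(t0 + timedelta(minutes=20), 220.0)]
print(pair_insulin(t0, readings).rule)  # 3
```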

The paired dataset uploaded in this repository linked 75.5% of insulin entries with a corresponding glucose reading. Data scientists sorted and identified valid entries and patients as defined by clinical experts.

Data Description

The data subsets consist of two time-series files that include all curated entries of glucose readings and insulin inputs. The data comprise 603,764 entries for 9,517 patients across 11,724 ICU admissions. The data files are glucose_insulin_ICU.csv and glucose_insulin_pair.csv, each distributed as a ZIP-compressed CSV file. The file glucose_insulin_ICU.csv (16 columns) contains the non-paired entries, while glucose_insulin_pair.csv (21 columns) contains the entries paired within a 90-minute observation window between a glucose reading and an insulin input. In the latter file, each paired insulin input and glucose reading are listed on the same row, while non-paired glucose readings and insulin inputs are listed on separate rows.
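Because the files are distributed as ZIP-compressed CSV files, they can be read with the Python standard library without extracting them first. The sketch below assumes the archive path and member name are supplied by the user (the archive names are not stated here), that missing values appear as empty cells, and that a paired row can be recognized by a non-empty GLC_AL; read_zipped_csv and row_kind are hypothetical helpers.

```python
import csv
import io
import zipfile
from typing import IO, Dict, List, Union


def read_zipped_csv(archive: Union[str, IO[bytes]], member: str) -> List[Dict[str, str]]:
    """Read one CSV member of a ZIP archive into a list of dicts keyed by column name."""
    with zipfile.ZipFile(archive) as zf, zf.open(member) as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        return list(csv.DictReader(text))


def row_kind(row: Dict[str, str]) -> str:
    """Classify a glucose_insulin_pair.csv row as paired, insulin-only, or glucose-only."""
    has_insulin = bool(row.get("STARTTIME"))
    has_paired_glucose = bool(row.get("GLC_AL"))
    if has_insulin and has_paired_glucose:
        return "paired"
    if has_insulin:
        return "insulin_only"
    return "glucose_only"
```

With the real files, a call such as read_zipped_csv("<path to the downloaded archive>", "glucose_insulin_pair.csv") would return one dictionary per row; the path is left as a placeholder because it depends on where the archive is stored.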

Description of fields

Common to glucose_insulin_ICU.csv and glucose_insulin_pair.csv

  1. SUBJECT_ID: Unique identifier for an individual patient.
  2. HADM_ID: Represents a single patient’s admission to the hospital.
  3. ICUSTAY_ID: Unique identifier for a single patient’s admission to the ICU.
  4. LOS_ICU_days: Length of the ICU stay in days.
  5. first_ICU_stay: True if it is the first admission to the ICU for a hospital admission.
  6. TIMER: Timestamp of the event, taken from STARTTIME for an insulin input or from GLCTIMER for a glucose reading. It is used to order events chronologically within a hospital admission.
  7. STARTTIME: Timestamp that depicts when the administration of an insulin event started or when a new infusion rate was indicated.
  8. INPUT: Dose for a single bolus of insulin in U.
  9. INPUT_HRS: Insulin infusion rate in U/hr.
  10. ENDTIME: Timestamp that specifies when an insulin input stopped, or an infusion rate changed.
  11. INSULINTYPE: Acting type of insulin: short, intermediate, or long.
  12. EVENT: Specifies whether the bolus of insulin was subcutaneous (BOLUS_INYECTION), or intravenous (BOLUS_PUSH), or if the insulin was infused (INFUSION).
  13. INFXSTOP: Indicates whether an insulin infusion was discontinued. A value of 1 marks the discontinuation of an infusion; a value of 0 indicates that the associated infusion started or that its rate was modified. When INFXSTOP = 1, ENDTIME gives the time the infusion was discontinued, and STARTTIME gives the time insulin started to be infused at the rate shown in the INPUT_HRS column. This start time can be either when the infusion started (e.g., if the rate was constant throughout the infusion) or when the rate changed relative to the previous entry. Total infusion time is the difference between the STARTTIME of the first infusion entry with INFXSTOP = 0 and the ENDTIME of the subsequent entry with INFXSTOP = 1 within an ICU stay. Thus, an infusion entry that appears after an INFXSTOP = 1 entry marks the start of an independent infusion (i.e., there was a gap without infusion).
  14. GLCTIMER: Timestamp that depicts when a glycemic check was done.
  15. GLC: Glycemia value in mg/dL.
  16. GLCSOURCE: Reading method for a glycemic check: fingerstick (FINGERSTICK) or lab analyzer (BLOOD).
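Following the INFXSTOP description, independent infusions within one ICU stay can be recovered by opening an infusion at the first entry after a stop and closing it at the ENDTIME of the next INFXSTOP = 1 entry. A minimal Python sketch, assuming the rows belong to a single ICU stay and contain only infusion entries; InfusionRow and infusion_episodes are hypothetical names, and an infusion with no closing INFXSTOP = 1 entry is simply not reported.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class InfusionRow:
    starttime: datetime
    endtime: datetime
    infxstop: int  # 1 = infusion discontinued at endtime; 0 = start or rate change


def infusion_episodes(rows: List[InfusionRow]) -> List[Tuple[datetime, datetime]]:
    """Return (start, stop) for each independent infusion in one ICU stay."""
    episodes: List[Tuple[datetime, datetime]] = []
    episode_start: Optional[datetime] = None
    for row in sorted(rows, key=lambda r: r.starttime):
        if episode_start is None:
            episode_start = row.starttime  # first entry after a stop opens a new infusion
        if row.infxstop == 1:
            episodes.append((episode_start, row.endtime))
            episode_start = None
    return episodes


day = datetime(2010, 1, 1)
rows = [
    InfusionRow(day.replace(hour=8), day.replace(hour=10), 0),
    InfusionRow(day.replace(hour=10), day.replace(hour=12), 1),
    InfusionRow(day.replace(hour=14), day.replace(hour=15), 1),  # separate infusion after a gap
]
for start, stop in infusion_episodes(rows):
    print(start.strftime("%H:%M"), stop.strftime("%H:%M"), stop - start)
# 08:00 12:00 4:00:00
# 14:00 15:00 1:00:00
```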

Specific to glucose_insulin_pair.csv

  1. GLCTIMER_AL: Timestamp of the glycemic check paired with the insulin input. This value should match the GLCTIMER of the paired glucose reading, as determined by the rule applied in this pairing case.
  2. GLC_AL: Glycemia value in mg/dL of the glucose reading paired with a single insulin input. This value should match the GLC of the paired glucose reading, as determined by the rule applied in this pairing case.
  3. GLCSOURCE_AL: Reading method of the glycemic check paired with the insulin input. This value should match the GLCSOURCE of the paired glucose reading, as determined by the rule applied in this pairing case.
  4. RULE: Rule applied to pair a single insulin input with a glucose reading. Refer to Pairing rules for further details.
  5. Repeated: Indicates whether the glucose reading in this entry was paired with a subsequent insulin input charted in this table. These entries help identify and verify which glucose readings were paired. Users may remove these entries if doing so improves readability for their own purposes.
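When rows are loaded as dictionaries keyed by column name, the Repeated and RULE columns can be used as sketched below. The encoding of Repeated is an assumption (the sketch accepts 1, 1.0, or true in any letter case), as is the use of an empty RULE cell to mark unpaired rows; drop_repeated and rule_counts are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, List

REPEATED_TRUE = {"1", "1.0", "true"}  # assumed encodings of a set Repeated flag


def is_repeated(row: Dict[str, str]) -> bool:
    return str(row.get("Repeated", "")).strip().lower() in REPEATED_TRUE


def drop_repeated(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Remove rows flagged as Repeated (glucose readings also shown in a paired row)."""
    return [row for row in rows if not is_repeated(row)]


def rule_counts(rows: List[Dict[str, str]]) -> Counter:
    """Count paired rows per RULE value; rows with an empty RULE are treated as unpaired."""
    return Counter(row["RULE"] for row in rows if row.get("RULE"))


rows = [
    {"RULE": "1", "Repeated": "0"},
    {"RULE": "", "Repeated": "1"},
    {"RULE": "2", "Repeated": ""},
]
print(rule_counts(drop_repeated(rows)))  # Counter({'1': 1, '2': 1})
```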

Usage Notes

The code to build the subsets described in this work from MIMIC tables is available in this project. The curation code is contained in the 1_0_ara_curation_I.ipynb notebook, while 2_0_ara_pairing_II.ipynb contains the instructions for pairing. In the related paper [9] and in the latter notebook, the analysis focused only on boluses of short-acting insulin, because this is the most common form of insulin dosing in the data and because this type of medication is the first-choice recommendation in the guidelines for hospitalized patients [8]. In addition, intermediate- and long-acting insulin administrations may be guided by estimates of basal glucose control (e.g., pre-meal glycemia and the patient's weight) [10].
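Restricting the data to short-acting boluses, as in the pairing notebook and the related paper, amounts to a filter on INSULINTYPE and EVENT. A minimal Python sketch, assuming rows are dictionaries keyed by column name and that INSULINTYPE stores the acting type as a word such as "Short" (compared case-insensitively, since the exact capitalization is an assumption); the EVENT values are those listed under Description of fields, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

BOLUS_EVENTS = {"BOLUS_INYECTION", "BOLUS_PUSH"}  # subcutaneous and intravenous boluses


def is_short_acting_bolus(row: Dict[str, Optional[str]]) -> bool:
    insulin_type = (row.get("INSULINTYPE") or "").strip().lower()
    event = (row.get("EVENT") or "").strip().upper()
    return insulin_type == "short" and event in BOLUS_EVENTS


def short_acting_boluses(rows: Iterable[Dict[str, Optional[str]]]) -> List[Dict[str, Optional[str]]]:
    return [row for row in rows if is_short_acting_bolus(row)]


rows = [
    {"INSULINTYPE": "Short", "EVENT": "BOLUS_INYECTION"},
    {"INSULINTYPE": "Short", "EVENT": "INFUSION"},
    {"INSULINTYPE": "Long", "EVENT": "BOLUS_INYECTION"},
    {"INSULINTYPE": "Short", "EVENT": "BOLUS_PUSH"},
]
print(len(short_acting_boluses(rows)))  # 2
```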

The two Jupyter notebooks can be run either online with Google Colaboratory or locally (users are responsible for installing all required modules and dependencies). In addition, researchers need access to Google Cloud BigQuery, where the MIMIC project is hosted, to run the queries. Researchers who want to run the queries in other database management systems may need to modify the syntax accordingly.

A related paper [9] performed a detailed analysis of the blood glucose readings. This analysis was performed in MATLAB, and the related material is also shared in this project; the Live Script file is named Glucose_Analysis.mlx. A statistical analysis of the pairing rules was also performed in MATLAB; the associated Live Script is named Pairing.mlx. HTML copies of these Live Scripts are provided as an alternative. The functions used in these files are located in the Functions folder. Running the MATLAB material requires version R2019b or later.

The SQL queries to obtain the files admissions.xlsx, SOFA.xlsx, and diabetes.xlsx are detailed in the Glucose_Analysis.mlx file. The files BolusesCUR.xlsx, BolusesCUR_60.xlsx, and BolusesCUR_nr.xlsx that are imported in the Pairing.mlx file are created in the 2_0_ara_pairing_II.ipynb notebook.

The authors hope that this curated dataset will be used to further investigate trends in glucose management in the ICU, for instance, whether medical outcomes such as mortality are associated with insulin administration.

Release Notes

1.0.0 This is the first release of this database.

1.0.1 The associated publication for this dataset was added to the references and cited in the main text (Reference 9). The "Rule 5" subtitle was added to the Methods section to match the information stated in the cited publication. No changes were made to the files or dataset.


The work of ARA was supported by the PhD fellowship PD/BD/114107/2015 from Fundação para a Ciência e a Tecnologia (FCT). It was also supported through IDMEC, under LAETA, project UIDB/50022/2020, and by the European Regional Development Fund (LISBOA-01-0145-FEDER-031474) and FCT through Programa Operacional Regional de Lisboa (PTDC/EME-SIS/31474/2017). The Medical Information Mart for Intensive Care is funded by the National Institutes of Health through the NIBIB grant R01 EB017205. The data in MIMIC-III has been de-identified, and the institutional review boards of the Massachusetts Institute of Technology (No. 0403000206) and Beth Israel Deaconess Medical Center (2001-P-001699/14) both approved the use of the database for research. We also acknowledge the help of Lawrence Baker, Jason H. Maley, Francis DeMichele, João Sousa, Susana Vieira, Stan Finkelstein and Jesse Raffa.

Conflicts of Interest

The authors declare no competing financial interests.


  1. American Diabetes Association. 8. Pharmacologic Approaches to Glycemic Treatment: Standards of Medical Care in Diabetes—2018. Diabetes Care 41, S73–S85 (2018)
  2. American Diabetes Association. 15. Diabetes Care in the Hospital: Standards of Medical Care in Diabetes—2020. Diabetes Care 43, S193–S202 (2020)
  3. Kitabchi, A. E., Umpierrez, G. E., Miles, J. M. & Fisher, J. N. Hyperglycemic Crises in Adult Patients With Diabetes. Diabetes Care 32, 1335–1343 (2009)
  4. Laposata, M. CHAPTER 2: Methods. Laposata’s Laboratory Medicine The Diagnosis of Disease in the Clinical Laboratory (McGraw-Hill Education LLC., 2019).
  5. U.S. Food & Drug Administration. Real-World Evidence [Internet]. [cited 2020 Nov 3]. Available from:
  6. Sherman RE, Anderson SA, Dal Pan GJ, Gray GW, Gross T, Hunter NL, et al. Real-World Evidence — What Is It and What Can It Tell Us? N Engl J Med. 2016 Dec 8;375(23):2293–7
  7. Corrigan-Curay J, Sacks L, Woodcock J. Real-World Evidence and Real-World Data for Evaluating Drug Safety and Effectiveness. JAMA. 2018 Sep 4;320(9):867.
  8. Johnson, A. E. W. et al. MIMIC-III, a freely accessible critical care database. Sci. Data 3, (2016).
  9. Robles Arévalo, A., Maley, J.H., Baker, L. et al. Data-driven curation process for describing the blood glucose management in the intensive care unit. Sci Data 8, 80 (2021).
  10. Baker L, Maley JH, Arévalo A, DeMichele F, Mateo-Collado R, Finkelstein S, et al. Real-world characterization of blood glucose control and insulin use in the intensive care unit. Sci Rep. 2020;10(1):10718.

Parent Projects
Curated Data for Describing Blood Glucose Management in the Intensive Care Unit was derived from the MIMIC-III Clinical Database (v1.4). Please cite it when using this project.

Access Policy:
Only PhysioNet credentialed users who sign the specified DUA can access the files.

License (for files):
PhysioNet Credentialed Health Data License 1.5.0
