Database Credentialed Access
GOSSIS-1-eICU, the eICU-CRD subset of the Global Open Source Severity of Illness Score (GOSSIS-1) dataset
Jesse Raffa, Alistair Johnson, Tom Pollard, Omar Badawi
Published: July 20, 2022. Version: 1.0.0
When using this resource, please cite:
Raffa, J., Johnson, A., Pollard, T., & Badawi, O. (2022). GOSSIS-1-eICU, the eICU-CRD subset of the Global Open Source Severity of Illness Score (GOSSIS-1) dataset (version 1.0.0). PhysioNet. https://doi.org/10.13026/gbmg-a531.
Raffa JD, Johnson AEW, O’Brien Z, Pollard TJ, Mark RG, Celi LA, et al. The Global Open Source Severity of Illness Score (GOSSIS). Crit Care Med. doi:10.1097/CCM.0000000000005518.
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
GOSSIS-1 is a modern, free, open-source in-hospital mortality prediction algorithm for critical care patients, achieving excellent discrimination and calibration across three countries (Australia, New Zealand and the USA). GOSSIS-1 was developed on two large datasets of critical care patients. This project contains the USA subset of patients derived from the eICU Collaborative Research Database (eICU-CRD). The dataset, which we call GOSSIS-1-eICU, consists of 131,051 unique patients from 204 hospitals, drawn from ICU admissions discharged in 2014-15. The code to create the dataset from eICU-CRD and to generate GOSSIS-1 predictions is also available. This project contains: 1) the derived dataset from eICU-CRD, 2) the dataset with required missing data imputed and 3) the GOSSIS-1 in-hospital mortality predictions (probabilities). The patientunitstayid and hospitalid eICU-CRD identifiers are included to allow linking back to eICU-CRD. Training and test sets are identified to allow for direct comparisons of performance.
GOSSIS-1 was developed as the first version of a series of global open-source severity of illness scores by the GOSSIS consortium. The consortium aims to create a database of critical care datasets from ICUs around the globe and to use these datasets to develop a family of open-source scoring systems for assessing the severity of illness of critical care patients internationally. GOSSIS-1 was developed using two well-known datasets: ANZICS-APD, covering Australia and New Zealand, and eICU-CRD, covering the USA. The GOSSIS-1 model achieved high discrimination and calibration in all countries and relevant subsets. This project contains the USA subset of the data used to train the GOSSIS-1 model. The data originate from eICU-CRD, and we call this dataset GOSSIS-1-eICU.
The GOSSIS-1-eICU data were extracted from the eICU-CRD database. The eICU-CRD is a relational database consisting of about 200,000 ICU admissions from over 200 hospitals throughout the USA. Importantly, the GOSSIS-1-eICU data consist of critical care admissions from 2014-15 where the length of ICU stay was >6 hours. Data, including physiologic measurements and vital signs, were collected from the first 24 hours of the ICU stay. Readmissions to the ICU, patients <16 years old, and those with a missing outcome or with no heart rate recorded were excluded. The code used to extract the GOSSIS-1-eICU dataset from eICU-CRD is available on GitHub. Further details about the extraction can be found in the GOSSIS-1 paper and its supplementary materials. Further details about eICU-CRD can be found in its data description and on the eICU-CRD website.
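The inclusion and exclusion rules above can be sketched as a single predicate. This is only an illustration of the stated criteria, not the extraction code from the GOSSIS repository: the `Admission` class and all of its field names are hypothetical and do not correspond to eICU-CRD column names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Admission:
    # Hypothetical fields for illustration only; not the eICU-CRD schema.
    age_years: float
    icu_los_hours: float
    is_readmission: bool
    hospital_death: Optional[int]  # 1 = died, 0 = survived, None = missing
    n_heart_rate_obs: int          # heart rate values recorded in the stay


def is_eligible(adm: Admission) -> bool:
    """Return True if the admission passes the criteria described in the text."""
    if adm.icu_los_hours <= 6:       # ICU stay must be longer than 6 hours
        return False
    if adm.is_readmission:           # ICU readmissions are excluded
        return False
    if adm.age_years < 16:           # patients under 16 are excluded
        return False
    if adm.hospital_death is None:   # missing outcome is excluded
        return False
    if adm.n_heart_rate_obs == 0:    # no heart rate recorded is excluded
        return False
    return True


cohort = [
    Admission(54.0, 30.0, False, 0, 24),
    Admission(15.0, 30.0, False, 0, 24),   # too young
    Admission(70.0, 4.5, False, 1, 5),     # stay too short
]
eligible = [adm for adm in cohort if is_eligible(adm)]
```

The restriction to admissions from 2014-15 is left out of the sketch because it depends on how discharge year is encoded in the source tables.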
This project contains three data files:
- gossis-1-eicu-only.csv.gz
- gossis-1-eicu-only-model-ready.csv.gz
- gossis-1-eicu-predictions.csv.gz
Each dataset includes the patientunitstayid identifier, which allows linking back to eICU-CRD. Each dataset has 131,051 data rows (one per admission), along with a header row.
The first file, gossis-1-eicu-only.csv.gz, is a gzip compressed CSV file containing the features and outcomes from eICU-CRD patients used to train the GOSSIS-1 model. Each row of the CSV file corresponds to one ICU admission. The file includes a header specifying the variable names. A data specification file, variable-definitions.yaml, is also included; it lists the valid values and ranges for each variable, along with a short description, indexed by variable name. The same information is largely contained in Supplementary Tables 1 and 2 of the GOSSIS-1 paper (1). For the diagnosis variables, such as apache_2_diagnosis, a mapping file called apache_diagnosis_map.csv defines each of the codes. This dataset has had minimal data cleaning and contains 216 columns.
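As a minimal sketch of how a gzip compressed CSV with a header row can be read and sanity-checked using only the Python standard library: the helper names `read_gz_csv` and `check_ids` are hypothetical, the demo writes a tiny stand-in file rather than touching the real data, and the uniqueness check assumes one row per patientunitstayid, which the text implies but does not state outright.

```python
import csv
import gzip
import os
import tempfile


def read_gz_csv(path):
    """Read a gzip compressed CSV into a list of dicts keyed by header name."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def check_ids(rows, id_col="patientunitstayid"):
    """Return (number of data rows, whether every identifier is unique)."""
    ids = [row[id_col] for row in rows]
    return len(ids), len(ids) == len(set(ids))


# Demo on a tiny stand-in file; the values are made up.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv.gz")
    with gzip.open(demo_path, mode="wt", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["patientunitstayid", "age"])
        writer.writerow(["1001", "54"])
        writer.writerow(["1002", "71"])
    demo_rows = read_gz_csv(demo_path)

n_rows, ids_unique = check_ids(demo_rows)
```

On the real files, the row count returned by `check_ids` would be expected to match the 131,051 data rows reported above.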
The second file, gossis-1-eicu-only-model-ready.csv.gz, is a gzip compressed CSV file containing the specific features and outcomes from eICU-CRD patients used to train the GOSSIS-1 model after preprocessing and imputation. Importantly, most physiological *_apache variables have been excluded, the Glasgow Coma Scale variables and intubated_apache have been transformed, and all d1_*_min and d1_*_max variables have been transformed into the midpoint (d1_*_avg) and range (d1_*_diff). The partition variable indicates whether the admission was in the training set (70%) or the test set (30%).
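The midpoint and range transformation can be illustrated as follows. The text does not give the formulas, so this sketch assumes the midpoint is the mean of the daily minimum and maximum and the range is their difference; the function name `midpoint_and_range` and the `heartrate` stem are used for illustration only.

```python
def midpoint_and_range(row, stem):
    """Derive d1_<stem>_avg and d1_<stem>_diff from the daily min and max.

    Assumes midpoint = (max + min) / 2 and range = max - min.
    """
    low = row[f"d1_{stem}_min"]
    high = row[f"d1_{stem}_max"]
    return {
        f"d1_{stem}_avg": (high + low) / 2,
        f"d1_{stem}_diff": high - low,
    }


# Made-up values for a single admission.
example = {"d1_heartrate_min": 60.0, "d1_heartrate_max": 100.0}
derived = midpoint_and_range(example, "heartrate")
```

Representing a min/max pair by its midpoint and range keeps the same information while separating the level of a measurement from its variability.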
The last file, gossis-1-eicu-predictions.csv.gz, is a gzip compressed CSV file containing only two columns: patientunitstayid and gossis1_ihm_pred, corresponding to the eICU-CRD identifier and the GOSSIS-1 in-hospital mortality predictions (probabilities). Please note that this dataset contains both training and test set patients. In the 39,318 test set patients, we have reported an AUROC of 0.904 (0.900–0.909), an SMR of 0.992 (0.959–1.024) and a Brier score of 0.055. The R package rGOSSIS1, which generates GOSSIS-1 predictions, is available on GitHub.
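For readers who want to compute the same kinds of metrics on their own predictions, here is a minimal standard-library sketch. It uses the usual definitions (AUROC via the rank-sum formulation with midranks for ties, SMR as observed deaths divided by the sum of predicted probabilities, Brier score as mean squared error); whether the reported figures were computed in exactly this way is an assumption, and the function names are hypothetical.

```python
def auroc(y_true, y_prob):
    """Area under the ROC curve via the Mann-Whitney rank-sum formulation."""
    order = sorted(range(len(y_prob)), key=lambda i: y_prob[i])
    ranks = [0.0] * len(y_prob)
    i = 0
    while i < len(order):
        # Find the run of tied predictions and give each its average rank.
        j = i
        while j + 1 < len(order) and y_prob[order[j + 1]] == y_prob[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def smr(y_true, y_prob):
    """Standardized mortality ratio: observed deaths over expected deaths."""
    return sum(y_true) / sum(y_prob)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Made-up outcomes (1 = died) and predicted probabilities.
outcomes = [0, 0, 1, 1]
predictions = [0.1, 0.4, 0.35, 0.8]
metrics = {
    "auroc": auroc(outcomes, predictions),
    "smr": smr(outcomes, predictions),
    "brier": brier_score(outcomes, predictions),
}
```

To reproduce test-set figures, the rows would first be restricted to test set admissions using the partition variable in the model-ready file.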
gossis-1-eicu-only.csv.gz: This dataset can be generated using code from the GOSSIS GitHub repository. The dataset contains missing data, several patient outcomes, and demographic variables (e.g., ethnicity) that were used to assess model performance in subsets but were not used in the GOSSIS-1 model itself. This dataset is also available to approved users on BigQuery.
gossis-1-eicu-only-model-ready.csv.gz: This dataset is derived from gossis-1-eicu-only.csv.gz by running the impute_data (using algorithm 3) and prepare_fit functions in the rGOSSIS1 package. Extraneous columns that are not used in GOSSIS-1 predictions have been removed. This dataset can be fed into the GOSSIS-1 prediction function (gpredict) or used to fit a new model. This dataset is also available to approved users on BigQuery.
gossis-1-eicu-predictions.csv.gz: This dataset is derived by running the gpredict function on gossis-1-eicu-only-model-ready.csv.gz. This dataset may be suitable for performance comparisons with other models. Alternatively, gossis1_ihm_pred can be used in the same way as the APACHE IVa in-hospital mortality prediction column, predictedhospitalmortality, in the apachepatientresult table of eICU-CRD. This dataset is also available to approved users on BigQuery.
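A side-by-side comparison with APACHE IVa amounts to joining the two sources on patientunitstayid. The sketch below uses in-memory rows with made-up values. It makes two assumptions about the apachepatientresult table that should be checked against the eICU-CRD documentation: that an apacheversion column distinguishes IV from IVa rows, and that a negative predictedhospitalmortality marks an unavailable prediction. The function name `join_predictions` is hypothetical.

```python
def join_predictions(gossis_rows, apache_rows):
    """Pair GOSSIS-1 and APACHE IVa predictions by patientunitstayid.

    Rows are dicts of strings, as produced by csv.DictReader.
    """
    apache_by_id = {}
    for row in apache_rows:
        if row["apacheversion"] != "IVa":  # assumed version column
            continue
        pred = float(row["predictedhospitalmortality"])
        if pred < 0:  # assumed marker for an unavailable prediction
            continue
        apache_by_id[row["patientunitstayid"]] = pred

    paired = []
    for row in gossis_rows:
        stay_id = row["patientunitstayid"]
        if stay_id in apache_by_id:
            paired.append(
                (stay_id, float(row["gossis1_ihm_pred"]), apache_by_id[stay_id])
            )
    return paired


# Made-up rows for illustration.
gossis_demo = [
    {"patientunitstayid": "1", "gossis1_ihm_pred": "0.10"},
    {"patientunitstayid": "2", "gossis1_ihm_pred": "0.50"},
    {"patientunitstayid": "3", "gossis1_ihm_pred": "0.02"},
]
apache_demo = [
    {"patientunitstayid": "1", "apacheversion": "IV", "predictedhospitalmortality": "0.20"},
    {"patientunitstayid": "1", "apacheversion": "IVa", "predictedhospitalmortality": "0.15"},
    {"patientunitstayid": "2", "apacheversion": "IVa", "predictedhospitalmortality": "-1"},
]
paired_demo = join_predictions(gossis_demo, apache_demo)
```

Only admissions with a usable prediction from both models end up in the result, so any comparison of performance is made on the same set of patients.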
Version 1.0.0: This initial release corresponds with the publication of “The Global Open Source Severity of Illness Score (GOSSIS)” in Critical Care Medicine.
This dataset is entirely derived from eICU-CRD. Within eICU-CRD, all tables are deidentified to meet the safe harbor provision of the US Health Insurance Portability and Accountability Act (HIPAA). These provisions include the removal of all protected health information. Hospital and unit identifiers have also been removed to protect the privacy of contributing organizations. The schema was established in collaboration with Privacert (Cambridge, MA), who certified the re-identification risk as meeting safe harbor standards (HIPAA Certification no. 1031219-2).
We wish to thank the GOSSIS consortium, Philips and ANZICS for all their help in developing these datasets and GOSSIS-1.
Conflicts of Interest
The authors have no conflicts of interest to declare.
- Raffa JD, Johnson AEW, O’Brien Z, Pollard TJ, Mark RG, Celi LA, et al. The Global Open Source Severity of Illness Score (GOSSIS). Crit Care Med. doi:10.1097/CCM.0000000000005518.
- GOSSIS: Global Open Source Severity of Illness Score: International Benchmarking for Critical Care [Internet]. [cited 2022 Jun 24]. Available from: https://gossis.mit.edu/
- Stow PJ, Hart GK, Higlett T, George C, Herkes R, McWilliam D, et al. Development and implementation of a high-quality clinical database: the Australian and New Zealand Intensive Care Society Adult Patient Database. J Crit Care. 2006 Jun;21(2):133–41.
- Pollard TJ, Johnson AEW, Raffa JD, Celi LA, Mark RG, Badawi O. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci Data. 2018 Dec;5(1):180178.
- GOSSIS: The Global Open Source Severity of Illness Score [Internet]. MIT Laboratory for Computational Physiology; 2022 [cited 2022 Jun 24]. Available from: https://github.com/MIT-LCP/gossis
- eICU [Internet]. [cited 2022 Jun 23]. Available from: https://eicu-crd.mit.edu/about/eicu/
- Raffa JD. rGOSSIS1 [Internet]. 2020 [cited 2022 Jun 24]. Available from: https://github.com/jraffa/rGOSSIS1
Only credentialed users who sign the DUA can access the files.
License (for files):
PhysioNet Credentialed Health Data License 1.5.0
Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0
Required training:
CITI Data or Specimens Only Research
icu critical care severity of illness global gossis apache mortality prediction benchmarking
- be a credentialed user
- complete required training:
  - CITI Data or Specimens Only Research
- sign the data use agreement for the project