Resources


Database Contributor Review

HiRID, a high time-resolution ICU dataset

Martin Faltys, Marc Zimmermann, Xinrui Lyu, Matthias Hüser, Stephanie Hyland, Gunnar Rätsch, Tobias Merz

The HiRID database contains a large selection of all routinely collected data relating to patient admissions to the Department of Intensive Care Medicine (ICU) of the Bern University Hospital, Switzerland.

icu clinical intensive care high resolution critical care machine learning

Published: Feb. 18, 2021. Version: 1.1.1


Database Credentialed Access

AMR-UTI: Antimicrobial Resistance in Urinary Tract Infections

Michael Oberst, Soorajnath Boominathan, Helen Zhou, Sanjat Kanjilal, David Sontag

AMR-UTI is a freely accessible dataset, derived from electronic health record (EHR) information on over 100,000 urinary tract infections (UTI) treated at Massachusetts General Hospital and Brigham & Women's Hospital in Boston, MA, USA.

antibiotic resistance causal inference policy learning antimicrobial resistance urinary tract infection clinical decision support machine learning

Published: Nov. 4, 2020. Version: 1.0.0


Database Open Access

Electrocardiogram, skin conductance and respiration from spider-fearful individuals watching spider video clips

Frank R Ihmig, Antonio Gogeascoechea, Sarah Schäfer, Johanna Lass-Hennemann, Tanja Michael

Dataset used for development of an algorithm for on-line anxiety level detection from biosignals.

exposure therapy psychophysiology biosignals anxiety disorders wearable sensors anxiety level detection machine learning

Published: June 5, 2020. Version: 1.0.0


Database Credentialed Access

MIMIC-III Clinical Database

Alistair Johnson, Tom Pollard, Roger Mark

MIMIC-III is a large, freely available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. The databas…

clinical intensive care critical care natural language processing machine learning

Published: Sept. 4, 2016. Version: 1.4


Database Open Access

Apnea-ECG Database

Seventy ECG signals with expert-labelled apnea annotations and machine-generated QRS annotations.

apnea sleep multiparameter challenge ecg

Published: Feb. 10, 2000. Version: 1.0.0


Database Restricted Access

EchoNext: A Dataset for Detecting Echocardiogram-Confirmed Structural Heart Disease from ECGs

Pierre Elias, Joshua Finer

EchoNext is a curated dataset of electrocardiograms (ECGs) paired with echocardiogram-confirmed structural heart disease labels, designed to support the development and validation of machine learning models.

heart failure clinical decision support artificial intelligence health equity ecg machine learning deep learning electrocardiogram aortic stenosis cardiovascular screening valvular heart disease digital health ai model deployment left ventricular dysfunction ai in healthcare population health transthoracic echocardiogram structural heart disease

Published: Sept. 16, 2025. Version: 1.1.0


Database Credentialed Access

MIMIC-IV-Ext Triage Instruction Corpus

Qingyang Shen, Quan Guo

MIMIC-IV-Ext Triage Instruction Corpus includes 9,629 ED triage cases organized by the five-level ESI, enabling LLMs to improve triage accuracy. It provides CSV data, generation prompts, expert validation samples, and SQL QC scripts.

nlp clinical decision support large language models machine learning emergency severity index emergency triage

Published: March 4, 2025. Version: 1.0.0


Database Open Access

Synthetic Mention Corpora for Disease Entity Recognition and Normalization

Kuleen Sasse, John David Osborne

We present the Synthetic Mention Corpora for Disease Entity Recognition and Normalization, containing 128,000 disease mentions from the UMLS disorder group, generated by an LLM. This corpus aims to improve these tasks in biomedical and clinical texts.

nlp named entity recognition machine learning data augmentation entity normalization

Published: Feb. 3, 2025. Version: 1.0.0


Database Open Access

CGMacros: a scientific dataset for personalized nutrition and diet monitoring

Ricardo Gutierrez-Osuna, David Kerr, Bobak Mortazavi, Anurag Das

CGMacros contains information from two continuous glucose monitors (CGM), food macronutrients, food photographs, physical activity, and anonymized participant demographics, anthropometric measurements and health parameters.

diabetes continuous glucose monitors machine learning obesity postprandial glucose response food macronutrients metabolic models food photographs personalized nutrition

Published: Jan. 28, 2025. Version: 1.0.0


Database Contributor Review

COVID Data for Shared Learning (CDSL): A comprehensive, multimodal COVID-19 dataset from HM Hospitales

Álvaro Ritoré, Andreea M Oprescu, Alberto Estirado Bronchalo, Miguel Ángel Armengol de la Hoz

COVID Data for Shared Learning (CDSL) is a multimodal database comprising de-identified structured health data and radiological images from 4,479 patients with COVID-19, intended as a comprehensive toolkit for developing predictive models.

covid-19 multimodal database radiological images open data healthcare data machine learning and ai

Published: Oct. 25, 2024. Version: 1.0.0