Resources


Database Open Access

VTaC: A Benchmark Dataset of Ventricular Tachycardia Alarms from ICU Monitors

Li-wei Lehman, Benjamin Moody, Lucas McCullum, Hasan Saeed, Harsh Deep, Diane Perry, Tristan Struja, Qiao Li, Gari Clifford, Roger Mark

VTaC is an annotated ventricular tachycardia (VT) arrhythmia alarm database containing over 5,000 waveform recordings with VT alarms from ICU monitors, with each alarm labeled as either true or false by at least two human expert annotators.

arrhythmia machine learning icu false alarms benchmark dataset ventricular tachycardia

Published: Oct. 1, 2024. Version: 1.0


Database Credentialed Access

Synthetic Acute Hypotension and Sepsis Datasets Based on MIMIC-III and Published as Part of the Health Gym Project

Nicholas Kuo, Simon Finfer, Louisa Jorm, Sebastiano Barbieri

This repository hosts the original Health Gym synthetic datasets for acute hypotension and sepsis.

sepsis acute hypotension synthetic dataset generative modelling wasserstein generative adversarial network reinforcement learning machine learning

Published: Feb. 23, 2022. Version: 1.0.0


Database Contributor Review

ER-REASON: A Benchmark Dataset for LLM-Based Clinical Reasoning in the Emergency Room

Mel Molina, Nikita Mehandru, Niloufar Golchini, Ahmed Alaa

The ER-REASON dataset is a longitudinal collection of 25,174 de-identified clinical notes for 3,437 patients admitted to the emergency room (ER) at a large academic medical center between March 1, 2022, and March 31, 2024.

Published: Oct. 23, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-ECHO-Ext-MIMICEchoQA: A Benchmark Dataset for Echocardiogram-Based Visual Question Answering

Rahul Thapa, Andrew Li, Qingyang Wu, Bryan He, Yuki Sahashi, Christina Binder-Rodriguez, Angela Zhang, David Ouyang, James Zou

We present MIMICEchoQA, a benchmark dataset for echocardiogram-based question answering, built from the publicly available MIMIC-IV-ECHO database.

Published: Oct. 7, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-Ext-22MCTS: A 22 Millions-Event Temporal Clinical Time-Series Dataset with Relative Timestamp

Jing Wang, Xing Niu, Tong Zhang, Jie Shen, Juyong Kim, Jeremy Weiss

MIMIC-IV-Ext-22MCTS is a time-series dataset of clinical events with concrete temporal information. It consists of 22,588,586 clinical events and their associated timestamps from 267,284 discharge summaries in MIMIC-IV-Note.

mimic clinical event annotation time series temporal annotation

Published: Sept. 29, 2025. Version: 1.0.0


Database Credentialed Access

RadGraph-XL: A Large-Scale Expert-Annotated Dataset for Entity and Relation Extraction from Radiology Reports

Jean-Benoit Delbrouck

RadGraph-XL is a large, expert-annotated dataset of 2,300 radiology reports covering multiple modalities and anatomies. It enables accurate extraction of clinical entities and relations for downstream medical AI tasks.

Published: Sept. 12, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-Ext-Instr: A Dataset of 450K+ EHR-Grounded Instruction-Following Examples

Zhenbang Wu, Anant Dadu, Mike Nalls, Faraz Faghri, Jimeng Sun

This dataset contains 450K open-ended instruction-following examples generated using GPT-3.5 based on the MIMIC-IV EHR database.

large language models medical question answering instruction tuning

Published: Sept. 9, 2025. Version: 1.0.0


Database Credentialed Access

Bridge2AI-Voice: An ethically-sourced, diverse voice dataset linked to health information

Yael Bensoussan, Alexandros Sigaras, Anais Rameau, Olivier Elemento, Maria Powell, David Dorr, Philip Payne, Vardit Ravitsky, Jean-Christophe Bélisle-Pipon, Alistair Johnson, Ruth Bahr, Stephanie Watts, Donald Bolser, Jennifer Siu, Jordan Lerner-Ellis, Frank Rudzicz, Micah Boyer, Samantha Salvi Cruz, Yassmeen Abdel-Aty, Toufeeq Ahmed Syed, James Anibal, Stephen Aradi, Ana Sophia Martinez, Shaheen Awan, Steven Bedrick, Alexander Bernier, Isaac Bevers, Rahul Brito, Selina Casalino, John Costello, Iris De Santiago, Enrique Diaz-Ocampo, Mohamed Ebraheem, Ellie Eiseman, Mahmoud Elmahdy, Emily Evangelista, Kenneth Fletcher, Hortense Gallois, Alexander Gelbard, Anna Goldenberg, Karim Hanna, William Hersh, Lochana Jayachandran, Kaley Jenney, Kathy Jenkins, Stacy Jo, Ayush Kalia, Andrea Krussel, Elisa Lapadula, Chloe Loewith, Radhika Mahajan, Vrishni Maharaj, Siyu Miao, Matthew Mifsud, Marian Mikhael, Elijah Moothedan, Yosef Nafii, Tempestt Neal, Karlee Newberry, Evan Ng, Christopher Nickel, Megan Urbano, Trevor Pharr, Matthew Pontell, Claire Premi-Bortolotto, JM Rahman, Sarah Rohde, Laurie Russell, Suketu Shah, Ahmed Shawkat, Elizabeth Silberholz, Duncan Sutherland, Venkata Swarna Mukhi, Jeffrey Tang, Jamie Toghranegar, Kimberly Vinson, Claire Wilson, Madeleine Zanin, Xijie Zeng, Theresa Zesiewicz, Robin Zhao, Pantelis Zisimopoulos, Satrajit Ghosh

A dataset of features from voice recordings and metadata to enable the development, benchmarking, and validation of clinically applicable machine-learning models for diagnosing a wide range of health conditions.

voice bridge2ai

Published: Aug. 18, 2025. Version: 2.0.1


Database Credentialed Access

Annotated Social Determinants of Health Dataset for Adverse Pregnancy Outcomes

Nidhi Soley, MaKhaila Bentil, Jash Shah, Masoud Rouhizadeh, Casey Taylor

This project provides a manually annotated dataset of social determinants of health—social support, occupation, and substance use—linked to pregnancy outcomes, extracted from MIMIC-III and MIMIC-IV discharge summary notes.

Published: Aug. 4, 2025. Version: 1.0.0


Database Restricted Access

Swiss-Mammo: A physician-written, synthetic dataset of German mammography reports

Daniel Reichenpfader, Sandro von Däniken, Harald Marcel Bonel

Swiss-Mammo is a physician-written, synthetic dataset of 28 German mammography reports. The dataset is stratified by BI-RADS category and is available in German and English.

radiology mammography structured reporting bi-rads

Published: June 24, 2025. Version: 1.0.1