Resources


Database Credentialed Access

Maternal fat ultrasound measurement and nutritional assessment during pregnancy: A dataset centered in gestational outcomes

Alexandre da Silva Rocha, Juliana Rombaldi Bernardi, Alice Schoffel, et al.

Dataset collected as part of a prospective study in which abdominal maternal fat tissue measurements were compared with outcomes during hospitalization for labor and delivery.

pregnancy ultrasound abdominal

Published: Dec. 4, 2020. Version: 1.0.0


Database Restricted Access

Kinematic dataset of actors expressing emotions

Mingming Zhang, Lu Yu, Keye Zhang, et al.

1402 kinematic recordings of twenty-two semi-professional actors expressing emotions such as happiness, sadness, anger, fear, disgust, and surprise.

body movement emotion motion capture kinematic data

Published: July 7, 2020. Version: 2.1.0


Database Open Access

VTaC: A Benchmark Dataset of Ventricular Tachycardia Alarms from ICU Monitors

Li-wei Lehman, Benjamin Moody, Lucas McCullum, et al.

VTaC is an annotated ventricular tachycardia (VT) arrhythmia alarm database containing over 5,000 waveform recordings with VT alarms from ICU monitors, with each alarm labeled as either true or false by at least two human expert annotators.

arrhythmia machine learning icu false alarms benchmark dataset ventricular tachycardia

Published: Oct. 1, 2024. Version: 1.0


Database Credentialed Access

Synthetic Acute Hypotension and Sepsis Datasets Based on MIMIC-III and Published as Part of the Health Gym Project

Nicholas Kuo, Simon Finfer, Louisa Jorm, et al.

This repository hosts the original Health Gym datasets of Acute Hypotension and Sepsis.

sepsis acute hypotension synthetic dataset generative modelling wasserstein generative adversarial network reinforcement learning machine learning

Published: Feb. 23, 2022. Version: 1.0.0


Database Credentialed Access

Bridge2AI-Voice Pediatric Dataset

Yael Bensoussan, Alexandros Sigaras, Anais Rameau, et al.

A dataset of questionnaire responses, spectrograms, and other information for pediatric participants, collected for the Bridge2AI Voice as a Biomarker of Health project.

voice bridge2ai

Published: Dec. 17, 2025. Version: 1.0.0


Database Contributor Review

ER-REASON: A Benchmark Dataset for LLM-Based Clinical Reasoning in the Emergency Room

Mel Molina, Nikita Mehandru, Niloufar Golchini, et al.

The ER-REASON dataset is a longitudinal collection of 25,174 de-identified clinical notes for 3,437 patients admitted to the emergency room (ER) at a large academic medical center between March 1, 2022, and March 31, 2024.

Published: Oct. 23, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-ECHO-Ext-MIMICEchoQA: A Benchmark Dataset for Echocardiogram-Based Visual Question Answering

Rahul Thapa, Andrew Li, Qingyang Wu, et al.

We present MIMICEchoQA, a benchmark dataset for echocardiogram-based question answering, built from the publicly available MIMIC-IV-ECHO database.

Published: Oct. 7, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-Ext-22MCTS: A 22 Millions-Event Temporal Clinical Time-Series Dataset with Relative Timestamp

Jing Wang, Xing Niu, Tong Zhang, et al.

A time-series dataset of clinical events with concrete temporal information. It consists of 22,588,586 clinical events and their associated timestamps, drawn from 267,284 discharge summaries in MIMIC-IV-Note.

mimic clinical event annotation time series temporal annotation

Published: Sept. 29, 2025. Version: 1.0.0


Database Credentialed Access

RadGraph-XL: A Large-Scale Expert-Annotated Dataset for Entity and Relation Extraction from Radiology Reports

Jean-Benoit Delbrouck

RadGraph-XL is a large, expert-annotated dataset of 2,300 radiology reports covering multiple modalities and anatomies. It enables accurate extraction of clinical entities and relations for downstream medical AI tasks.

Published: Sept. 12, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-Ext-Instr: A Dataset of 450K+ EHR-Grounded Instruction-Following Examples

Zhenbang Wu, Anant Dadu, Mike Nalls, et al.

This dataset contains 450K open-ended instruction-following examples generated using GPT-3.5 based on the MIMIC-IV EHR database.

large language models medical question answering instruction tuning

Published: Sept. 9, 2025. Version: 1.0.0