Resources


Database Credentialed Access

RadGraph-XL: A Large-Scale Expert-Annotated Dataset for Entity and Relation Extraction from Radiology Reports

Jean-Benoit Delbrouck

RadGraph-XL is a large, expert-annotated dataset of 2,300 radiology reports covering multiple modalities and anatomies. It enables accurate extraction of clinical entities and relations for downstream medical AI tasks.

Published: Sept. 12, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-Ext-Instr: A Dataset of 450K+ EHR-Grounded Instruction-Following Examples

Zhenbang Wu, Anant Dadu, Mike Nalls, et al.

This dataset contains 450K open-ended instruction-following examples generated using GPT-3.5 based on the MIMIC-IV EHR database.

large language models medical question answering instruction tuning

Published: Sept. 9, 2025. Version: 1.0.0


Database Credentialed Access

Annotated Social Determinants of Health Dataset for Adverse Pregnancy Outcomes

Nidhi Soley, MaKhaila Bentil, Jash Shah, et al.

This project provides a manually annotated dataset of social determinants of health—social support, occupation, and substance use—linked to pregnancy outcomes, extracted from MIMIC-III and MIMIC-IV discharge summary notes.

Published: Aug. 4, 2025. Version: 1.0.0


Database Restricted Access

Swiss-Mammo: A physician-written, synthetic dataset of German mammography reports

Daniel Reichenpfader, Sandro von Däniken, Harald Marcel Bonel

A physician-written, synthetic dataset of 28 German mammography reports. The dataset is stratified by BI-RADS category and is available in German and English.

radiology mammography structured reporting bi-rads

Published: June 24, 2025. Version: 1.0.1


Database Restricted Access

DREAMT: Dataset for Real-time sleep stage EstimAtion using Multisensor wearable Technology

Ke Wang, Jiamu Yang, Ayush Shetty, et al.

We present high-resolution, multichannel wearable-device data, along with clinically labeled sleep stages and recorded polysomnography (PSG) data, from 100 patients with sleep disorders, including sleep apnea.

wearable sleep disorders biomedical time series classification

Published: April 30, 2025. Version: 2.1.0


Database Credentialed Access

EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records

Yeonsu Kwon, Jiho Kim, Gyubok Lee, et al.


Published: March 19, 2025. Version: 1.0.1


Database Open Access

A Multimodal Dataset for Investigating Working Memory in Presence of Music

Saman Khazaei, Srinidhi Parshi, Samiul Alam, et al.

A multimodal dataset containing fNIRS data along with a wide range of physiological signals, such as EDA, HR, and PPG, recorded over the course of n-back experiments in the presence of music.

Published: Feb. 26, 2025. Version: 1.0.0


Database Restricted Access

Visual Question Answering evaluation dataset for MIMIC CXR

Timo Kohlberger, Charles Lau, Tom Pollard, et al.

This dataset provides 224 visual question-answer (VQA) pairs for 40 test-set cases and 111 VQA pairs for 23 validation-set cases of the MIMIC-CXR dataset.

Published: Jan. 28, 2025. Version: 1.0.0


Database Open Access

CGMacros: a scientific dataset for personalized nutrition and diet monitoring

Ricardo Gutierrez-Osuna, David Kerr, Bobak Mortazavi, et al.

CGMacros contains information from two continuous glucose monitors (CGMs), food macronutrients, food photographs, physical activity, and anonymized participant demographics, anthropometric measurements, and health parameters.

diabetes machine learning continuous glucose monitors obesity postprandial glucose response food macronutrients metabolic models food photographs personalized nutrition

Published: Jan. 28, 2025. Version: 1.0.0


Database Credentialed Access

SCRIPT X2B8 Dataset: per-day clinical features to model successful next-day extubation

Sam Fenske, Alec Peltekian, Mengjia Kang, et al.

This dataset contains electronic health record (EHR) data from ICU patients receiving mechanical ventilation, aggregated on a daily basis, along with annotations of intubation, extubation, tracheostomy days, and cases of failed extubation. Data can b

Published: Jan. 28, 2025. Version: 1.0.0