Resources


Database Credentialed Access

MIMIC-IV-Ext-CLIF: MIMIC-IV in the Common Longitudinal ICU data Format (CLIF)

Zewei Liao, Shan Guleria, Kevin Smith, et al.

Transforming the MIMIC-IV 3.1 database into the Common Longitudinal ICU data Format (CLIF)

critical care mimic clif the common longitudinal icu data format

Published: March 23, 2026. Version: 1.1.0


Database Credentialed Access

MIMIC-IV-Ext-PE: Pulmonary Embolism Labels for CT Pulmonary Angiography Radiology Reports

Barbara Lam, Omid Jafari, Peiqi Wang, et al.

CTPA (computed tomography pulmonary angiogram) radiology reports from MIMIC-IV with pulmonary embolism (PE) adjudication

Published: March 23, 2026. Version: 1.0.0


Database Open Access

Respiratory and Pulse Oximetry Waveforms from Healthy Adults During Simulated Apnoea Events

Jordan Hill, Ella Frances Sophia Guy, Jaimey Anne Clifton, et al.

This dataset contains airway pressure, flow and pulse oximetry waveforms from 20 healthy adults during simulated apnoea events, including arterial and venous PPG signals for developing and validating OSA detection and oxygenation models.

pulse oximetry respiratory obstructive sleep apnea

Published: March 4, 2026. Version: 1.0.0


Database Open Access

Neurophysiological Dataset of Stress Resilience During Human-Computer Interaction

Shotabdi Roy, Joseph Nuamah

This dataset contains multimodal neurophysiological and physiological recordings collected from participants performing cognitively demanding tasks to study the temporal dynamics of stress resilience during human-computer interaction.

Published: Feb. 27, 2026. Version: 1.0.0


Challenge Credentialed Access

ArchEHR-QA: A Dataset for Addressing Patient's Information Needs related to Clinical Course of Hospitalization

Sarvesh Soni, Dina Demner-Fushman

A dataset for grounded question answering (QA) from electronic health records (EHRs).

question answering electronic health record patient portals clinicians

Published: Jan. 1, 2026. Version: 1.3


Database Credentialed Access

EchoGraph-annotated ECHO-NOTE2NUM examples

Chieh-Ju Chao, Mohammad Asadi

EchoGraph is a model that automatically extracts and structures clinical information from echocardiogram reports. The Annotated ECHO-NOTE2NUM Dataset contains MIMIC-III echo reports augmented with EchoGraph annotations to support future research.

Published: Dec. 3, 2025. Version: 1.0.0


Database Contributor Review

InReDD-Dataset-PAN924

Caio Uehara Martins, Camila Tirapelli, Hugo Gaêta-Araujo, et al.

InReDD-Dataset-V1 is a collection of 924 anonymised panoramic dental radiographs curated by the Interdisciplinary Research Group in Digital Dentistry (InReDD) at the University of São Paulo.

Published: Nov. 22, 2025. Version: 1.0.0


Database Credentialed Access

MedVAL-Bench: Expert-Annotated Medical Text Validation Benchmark

Asad Aali, Vasiliki Bikia, Maya Varma, et al.

MedVAL-Bench is the first large-scale physician-validated benchmark for medical text validation, spanning 6 diverse medical tasks and containing 840 language model-generated outputs annotated by 12 physicians with error assessments and risk grades.

Published: Nov. 14, 2025. Version: 1.0.1


Database Credentialed Access

PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions

Daeun Kyung, Hyunseung Chung, Seongsu Bae, et al.

PatientSim is a patient simulator that generates realistic and diverse personas for clinical scenarios, enabling robust training and evaluation of doctor-patient interactions in multi-turn dialogues.

electronic health records multi-turn dialogue llm simulation doctor-patient consultation

Published: Oct. 18, 2025. Version: 1.0.0


Database Restricted Access

TN-Mammo: A Multi-view Mammography Dataset for Breast Density Classification

Binh Nguyen, Cat Le, Loc Vu, et al.

We release the first version of TN-Mammo (June 2024), a mammogram dataset of 676 cases with breast density labels, providing high-quality data to support machine learning and early breast cancer detection.

Published: Oct. 4, 2025. Version: 1.0.0