Resources


Database Credentialed Access

MIMIC-IV-Ext clinical decision support for referral, triage and diagnosis

Farieda Gaber, Altuna Akalin

This MIMIC-IV extended dataset is designed to evaluate and improve LLMs' ability to assist with triage, specialist referral, and diagnosis, using critical patient information such as history of present illness,vitals signs and other relevant data.

Published: Oct. 8, 2025. Version: 1.0.2


Database Credentialed Access

MIMIC-III-Ext-VeriFact-BHC: Labeled Propositions From Brief Hospital Course Summaries for Long-form Clinical Text Evaluation

Philip Chung, Akshay Swaminathan, Alex Goodell, et al.

A clinician-labeled dataset for fact-checking long-form clinical text against patient EHRs. The dataset contains LLM-written and human-written Brief Hospital Course summaries decomposed to atomic claim and sentence propositions with annotations.

artificial intelligence natural language processing clinical notes electronic health records large language models brief hospital course long-form text chart review text reranking atomic claim hybrid retrieval clinical informatics clinical medicine fact verification retrieval-augmented generation logical atomism text embedding formal logic llm-as-a-judge llm evaluation

Published: April 9, 2025. Version: 1.0.0


Challenge Credentialed Access

ArchEHR-QA: A Dataset for Addressing Patient's Information Needs related to Clinical Course of Hospitalization

Sarvesh Soni, Dina Demner-Fushman

A dataset for grounded question answering (QA) from electronic health records (EHRs).

question answering electronic health record patient portals clinicians

Published: Jan. 1, 2026. Version: 1.3


Challenge Credentialed Access

ArchEHR-QA: A Dataset for Addressing Patient's Information Needs related to Clinical Course of Hospitalization

Sarvesh Soni, Dina Demner-Fushman

A dataset for grounded question answering (QA) from electronic health records (EHRs).

question answering electronic health record patient portals clinicians

Published: Jan. 1, 2026. Version: 1.3


Database Credentialed Access

EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries

Sunjun Kweon, Jiyoun Kim, Heeyoung Kwak, et al.

An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries

Published: June 26, 2024. Version: 1.0.1


Database Contributor Review

ER-REASON: A Benchmark Dataset for LLM-Based Clinical Reasoning in the Emergency Room

Mel Molina, Nikita Mehandru, Niloufar Golchini, et al.

The ER-REASON dataset is a longitudinal collection of 25,174 de-identified clinical notes for 3,437 patients admitted to the emergency room (ER) at a large academic medical center between March 1, 2022, and March 31, 2024.

Published: Oct. 23, 2025. Version: 1.0.0


Database Credentialed Access

PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions

Daeun Kyung, Hyunseung Chung, Seongsu Bae, et al.

PatientSim is a patient simulator that simulates realistic and diverse personas for clinical scenarios, enabling robust training and evaluation of doctor-patient interactions in multi-turn dialogues.

electronic health records multi-turn dialogue llm simulation doctor-patient consultation

Published: Oct. 18, 2025. Version: 1.0.0


Database Credentialed Access

SCRIPT CarpeDiem Dataset: demographics, outcomes, and per-day clinical parameters for critically ill patients with suspected pneumonia

Nikolay Markov, Catherine A Gao, Thomas Stoeger, et al.

SCRIPT seeks to delineate the host/pathogen interactions during pneumonia using multiomic analysis of bronchoalveolar lavage fluid joined with clinical data and physician adjudication.

Published: Aug. 4, 2025. Version: 1.8.0


Database Credentialed Access

C-REACT: Contextualized Race and Ethnicity Annotations for Clinical Text

Oliver Bear Don't Walk IV, Adrienne Pichon, Harry Reyes Nieva, et al.

Two sets of gold-standard annotations for race and ethnicity information from clinical notes in MIMIC-III. Contains race and ethnicity label assignments and related information such as country of origin and spoken language.

clinical notes patient country information race and ethnicity patient language information

Published: Oct. 21, 2024. Version: 1.0.0


Database Credentialed Access

ENCoDE, mEasuring skiN Color to correct pulse Oximetry DisparitiEs: skin tone and clinical data from a prospective trial on acute care patients.

Sicheng Hao, Katelyn Dempsey, João Matos, et al.

A prospective collected EHR-linked skin tone measurements database in OMOP format with emphasis on pulse oximetry disparities.

Published: Aug. 22, 2024. Version: 1.0.0