PhysioNet Index

Database Credentialed Access

Medical Expert Annotations of Unsupported Facts in Doctor-Written and LLM-Generated Patient Summaries

Stefan Hegselmann, Shannon Shen, Florian Gierse, et al.

Annotations for unsupported facts in 100 original MIMIC patient summaries (discharge instructions) and hallucinations in 100 Large Language Model (LLM) generated patient summaries labeled by two medical experts.

Published: April 30, 2025. Version: 1.0.1

Database Restricted Access

MIMIC-Eye: Integrating MIMIC Datasets with REFLACX and Eye Gaze for Multimodal Deep Learning Applications

Chihcheng Hsieh, Chun Ouyang, Jacinto C Nascimento, et al.

MIMIC-Eye: Integrating MIMIC Datasets with REFLACX and Eye Gaze for Multimodal Deep Learning Applications

Published: March 23, 2023. Version: 1.0.0

Challenge Credentialed Access

Discharge Me: BioNLP ACL'24 Shared Task on Streamlining Discharge Documentation

Justin Xu

Data for the "Discharge Me!" Shared Task on Streamlining Discharge Documentation for BioNLP ACL'24

generation bionlp acl discharge summary

Published: April 12, 2024. Version: 1.3

Database Credentialed Access

Northwestern ICU (NWICU) database

Dana Moukheiber, William Temps, Bhadrappa Molgi, et al.

A freely available COVID-rich ICU database comprising de-identified health-related data from Northwestern Memorial Health Center (NHMC).

Published: Nov. 19, 2024. Version: 0.1.0

Database Credentialed Access

BOLD, a blood-gas and oximetry linked dataset

João Matos, Tristan Struja, Jack Gallifant, et al.

An open-source pulse oximetry and arterial blood gas dataset, derived from MIMIC-III, MIMIC-IV, and eICU-CRD

pulse oximetry intensive care unit electronic health records health equity

Published: Nov. 8, 2023. Version: 1.0

Database Credentialed Access

PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions

Daeun Kyung, Hyunseung Chung, Seongsu Bae, et al.

PatientSim is a patient simulator that simulates realistic and diverse personas for clinical scenarios, enabling robust training and evaluation of doctor-patient interactions in multi-turn dialogues.

electronic health records multi-turn dialogue llm simulation doctor-patient consultation

Published: Oct. 18, 2025. Version: 1.0.0

Database Credentialed Access

Annotated Social Determinants of Health Dataset for Adverse Pregnancy Outcomes

Nidhi Soley, MaKhaila Bentil, Jash Shah, et al.

This project provides a manually annotated dataset of social determinants of health—social support, occupation, and substance use—linked to pregnancy outcomes, extracted from MIMIC-III and MIMIC-IV discharge summary notes.

Published: Aug. 4, 2025. Version: 1.0.0

Model Credentialed Access

Shareable Artificial Intelligence to Extract Cancer Outcomes from Electronic Health Records for Precision Oncology Research

Kenneth Kehl, Pavel Trukhanov, Christopher Fong, et al.

The DFCI-imaging-student and DFCI-medonc-student AI models for extracting cancer outcomes from imaging reports and medical oncologist notes from electronic health records.

Published: Oct. 24, 2024. Version: 1.0.0

Database Credentialed Access

EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images

Seongsu Bae, Daeun Kyung, Jaehee Ryu, et al.

We present EHRXQA, the first multi-modal EHR QA dataset combining structured patient records with aligned chest X-ray images. EHRXQA contains a comprehensive set of QA pairs covering image-related, table-related, and image+table-related questions.

question answering electronic health records evaluation chest x-ray multi-modal question answering ehr question answering semantic parsing benchmark visual question answering deep learning machine learning

Published: July 23, 2024. Version: 1.0.0

Database Credentialed Access

Insulin4RL: Real-Time Insulin Infusions For Offline Reinforcement Learning

Thomas Frost, Steve Harris

Openly available research dataset intended for offline reinforcement learning (ORL) using natively irregular healthcare data. The dataset is intended to encourage further research into ORL methods using naturally sporadic decision intervals.

insulin intensive care semi-markov decision process diabetes blood glucose offline reinforcement learning machine learning

Published: June 15, 2026. Version: 1.0.0

Search

Resources

Medical Expert Annotations of Unsupported Facts in Doctor-Written and LLM-Generated Patient Summaries

MIMIC-Eye: Integrating MIMIC Datasets with REFLACX and Eye Gaze for Multimodal Deep Learning Applications

Discharge Me: BioNLP ACL'24 Shared Task on Streamlining Discharge Documentation

Northwestern ICU (NWICU) database

BOLD, a blood-gas and oximetry linked dataset

PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions

Annotated Social Determinants of Health Dataset for Adverse Pregnancy Outcomes

Shareable Artificial Intelligence to Extract Cancer Outcomes from Electronic Health Records for Precision Oncology Research

EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images

Insulin4RL: Real-Time Insulin Infusions For Offline Reinforcement Learning