Resources
Database Credentialed Access
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images
Seongsu Bae, Daeun Kyung, Jaehee Ryu, Eunbyeol Cho, Gyubok Lee, Sunjun Kweon, Jungwoo Oh, Lei JI, Eric Chang, Tackeun Kim, Edward Choi
question answering, machine learning, evaluation, chest x-ray, multi-modal question answering, ehr question answering, semantic parsing, benchmark, electronic health records, deep learning, visual question answering
Published: July 23, 2024. Version: 1.0.0
Database Credentialed Access
CXR-Align: A Benchmark for CXR-Report Alignment with Negations
Hanbin Ko
Published: Aug. 21, 2025. Version: 1.0.0
Database Credentialed Access
EHR-DS-QA: A Synthetic QA Dataset Derived from Medical Discharge Summaries for Enhanced Medical Information Retrieval Systems
Konstantin Kotschenreuther
mimic-iv, clinical question-answering, medical discharge summaries, large language models
Published: Jan. 11, 2024. Version: 1.0.0
Database Credentialed Access
MedVAL-Bench: Expert-Annotated Medical Text Validation Benchmark
Asad Aali, Vasiliki Bikia, Maya Varma, Nicole Chiou, Sophie Ostmeier, Arnav Singhvi, Magdalini Paschali, Ashwin Kumar, Andrew Johnston, Karimar Amador Martinez, Eduardo Perez Guerrero, Paola Cruz Rivera, Sergios Gatidis, Christian Bluethgen, Eduardo Pontes Reis, Eddy Zandee van Rilland, Poonam Hosamani, Kevin Keet, Minjoung Go, Evelyn Ling, Curtis Langlotz, Roxana Daneshjou, Jason Hom, Sanmi Koyejo, Emily Alsentzer, Akshay Chaudhari
Published: Nov. 3, 2025. Version: 1.0.0
Database Credentialed Access
MIMIC-IV-Ext Clinical Decision Making: A MIMIC-IV Derived Dataset for Evaluation of Large Language Models on the Task of Clinical Decision Making for Abdominal Pathologies
Paul Hager, Friederike Jungmann, Daniel Rueckert
clinical decision making, abdominal pathologies, treatment plan, emergency room, diagnosis, large language models
Published: July 8, 2024. Version: 1.1
Database Credentialed Access
MIMIC-III Clinical Database
Alistair Johnson, Tom Pollard, Roger Mark
MIMIC-III is a large, freely available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. The databas…
clinical, intensive care, critical care, natural language processing, machine learning
Published: Sept. 4, 2016. Version: 1.4
Database Credentialed Access
Paediatric Intensive Care database
Haomin Li, Xian Zeng, Gang Yu
intensive care, pediatrics, critical care, natural language processing
Published: Nov. 12, 2020. Version: 1.1.0
Database Credentialed Access
NCH Sleep DataBank: A Large Collection of Real-world Pediatric Sleep Studies with Longitudinal Clinical Data
Harlin Lee, Boyue Li, Yungui Huang, Yuejie Chi, Simon Lin
eeg, ehr, pediatrics, polysomnography, clinical decision support, sleep study, ecg, sleep disorders, electronic health records
Published: Oct. 27, 2021. Version: 3.1.0
Database Credentialed Access
MIMIC-CXR Database
Alistair Johnson, Tom Pollard, Roger Mark, Seth Berkowitz, Steven Horng
computer vision, chest x-rays, natural language processing, machine learning, radiology, mimic
Published: July 23, 2024. Version: 2.1.0
Database Credentialed Access
MIMIC-Ext-DrugDetection
Fabrice Harel-Canada, Nanyun Peng, David Goodman, Ruby Romero, Allan Nguyen, Brandon Moghanian, Anabel Salimian
ehr, mimic-iv, substance use, clinical notes, methamphetamine, multi-label, cocaine, drug detection, polysubstance use, prescription opioid misuse, cannabis, benzodiazepine misuse, injection drug use, heroin, mimic-iii
Published: Sept. 25, 2025. Version: 1.0.0