Resources


Database Credentialed Access

DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries

Jayetri Bardhan, Anthony Colas, Kirk Roberts, Daisy Zhe Wang

DrugEHRQA is a QA dataset containing question-answer pairs drawn from MIMIC-III tables and discharge summaries.

question-answer qa

Published: April 12, 2022. Version: 1.0.0


Database Credentialed Access

EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records

Yeonsu Kwon, Jiho Kim, Gyubok Lee, Seongsu Bae, Daeun Kyung, Wonchul Cha, Tom Pollard, Alistair Johnson, Edward Choi

Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records

Published: March 19, 2025. Version: 1.0.1


Database Credentialed Access

EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images

Seongsu Bae, Daeun Kyung, Jaehee Ryu, Eunbyeol Cho, Gyubok Lee, Sunjun Kweon, Jungwoo Oh, Lei JI, Eric Chang, Tackeun Kim, Edward Choi

We present EHRXQA, the first multi-modal EHR QA dataset combining structured patient records with aligned chest X-ray images. EHRXQA contains a comprehensive set of QA pairs covering image-related, table-related, and image+table-related questions.

question answering chest x-ray benchmark evaluation multi-modal question answering ehr question answering semantic parsing machine learning deep learning electronic health records visual question answering

Published: July 23, 2024. Version: 1.0.0


Model Credentialed Access

Shareable Artificial Intelligence to Extract Cancer Outcomes from Electronic Health Records for Precision Oncology Research

Kenneth Kehl, Pavel Trukhanov, Christopher Fong, Justin Jee, Karl Pichotta, Morgan Paul, Chelsea Nichols, Michele Waters, Nikolaus Schultz, Deborah Schrag

The DFCI-imaging-student and DFCI-medonc-student AI models for extracting cancer outcomes from imaging reports and medical oncologist notes from electronic health records.

Published: Oct. 24, 2024. Version: 1.0.0


Challenge Credentialed Access

ArchEHR-QA: BioNLP at ACL 2025 Shared Task on Grounded Electronic Health Record Question Answering

Sarvesh Soni, Dina Demner-Fushman

A dataset for grounded question answering (QA) from electronic health records (EHRs).

electronic health record question answering clinicians patient portals

Published: April 11, 2025. Version: 1.2


Database Credentialed Access

AIPatient KG: MIMIC-III and CORAL Electronic Health Records based Patient Knowledge Graph

Lizhou Fan, Huizi Yu

This project integrates MIMIC-III and CORAL electronic health records into knowledge graphs to improve medical analysis and enhance decision-making capabilities. Resources include two knowledge graph snapshots and two question-and-answering datasets.

Published: April 15, 2025. Version: 1.0.0


Database Contributor Review

CARMEN-I: A resource of anonymized electronic health records in Spanish and Catalan for training and testing NLP tools

Eulalia Farre Maduell, Salvador Lima-Lopez, Santiago Andres Frid, Artur Conesa, Elisa Asensio, Antonio Lopez-Rueda, Helena Arino, Elena Calvo, Maria Jesús Bertran, Maria Angeles Marcos, Montserrat Nofre Maiz, Laura Tañá Velasco, Antonia Marti, Ricardo Farreres, Xavier Pastor, Xavier Borrat Frigola, Martin Krallinger

CARMEN-I is a Spanish corpus of 2,000 clinical records from Hospital Clínic, Barcelona. It covers COVID-19 patients and comorbidities, and serves as a resource for training clinical NLP models and for researchers applying NLP to clinical documents.

de-identification clinical ner anonymization

Published: April 20, 2024. Version: 1.0.1


Database Credentialed Access

MIMIC-III-Ext-VeriFact-BHC: Labeled Propositions From Brief Hospital Course Summaries for Long-form Clinical Text Evaluation

Philip Chung, Akshay Swaminathan, Alex Goodell, Yeasul Kim, Momsen Reincke, Lichy Han, Ben Deverett, Mohammad Amin Sadeghi, Abdel badih El Ariss, Marc Ghanem, David Seong, Andrew Lee, Caitlin Coombes, Brad Bradshaw, Mahir Sufian, Hyo Jung Hong, Teresa Nguyen, Mohammad Rasouli, Komal Kamra, Mark Burbridge, James McAvoy, Roya Saffary, Stephen Parnell Ma, Dev Dash, James Xie, Ellen Wang, Cliff Schmiesing, Nigam Shah, Nima Aghaeepour

A clinician-labeled dataset for fact-checking long-form clinical text against patient EHRs. The dataset contains LLM-written and human-written Brief Hospital Course summaries decomposed into atomic claim and sentence propositions with annotations.

artificial intelligence clinical notes natural language processing large language models brief hospital course electronic health records long-form text chart review text reranking atomic claim hybrid retrieval clinical informatics clinical medicine fact verification retrieval-augmented generation logical atomism text embedding formal logic llm-as-a-judge llm evaluation

Published: April 9, 2025. Version: 1.0.0