Resources


Database Credentialed Access

Medical-CXR-VQA dataset: A Large-Scale LLM-Enhanced Medical Dataset for Visual Question Answering on Chest X-Ray Images

Xinyue Hu, Lin Gu, Kazuma Kobayashi, et al.

Medical-CXR-VQA provides a large-scale LLM-enhanced dataset for visual question answering in medical chest x-ray images.

Published: Jan. 21, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-Ext-CXR-QBA: A Structured, Tagged, and Localized Visual Question Answering Dataset with Question-Box-Answer Triplets and Scene Graphs for Chest X-ray Images

Philip Müller, Friederike Jungmann, Georgios Kaissis, et al.

We present a large-scale CXR VQA dataset derived from MIMIC-CXR with 42M QA pairs, featuring hierarchical answers, bounding boxes, and structured tags. We generated the QA pairs using LLM-based extraction from radiology reports and localization models.

chest x-rays vqa localization scene graphs

Published: July 22, 2025. Version: 1.0.0


Database Credentialed Access

Medical-Diff-VQA: A Large-Scale Medical Dataset for Difference Visual Question Answering on Chest X-Ray Images

Xinyue Hu, Lin Gu, Qiyuan An, et al.

MIMIC-Diff-VQA provides a large-scale dataset for difference visual question answering on medical chest x-ray images.

difference visual question answering difference vqa vqa chest x-ray visual question answering

Published: Feb. 3, 2025. Version: 1.0.1


Database Credentialed Access

RadQA: A Question Answering Dataset to Improve Comprehension of Radiology Reports

Sarvesh Soni, Kirk Roberts

RadQA is an electronic health record question answering dataset containing clinical questions that can be answered using the Findings and Impressions sections of radiology reports.

machine reading comprehension radiology reports question answering clinical notes electronic health records

Published: Dec. 9, 2022. Version: 1.0.0


Database Credentialed Access

DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries

Jayetri Bardhan, Anthony Colas, Kirk Roberts, et al.

DrugEHRQA is a QA dataset containing question-answer pairs from MIMIC-III tables and discharge summaries.

question-answer qa

Published: April 12, 2022. Version: 1.0.0


Database Credentialed Access

Annotated Question-Answer Pairs for Clinical Notes in the MIMIC-III Database

Xiang Yue, Xinliang Frederick Zhang, Huan Sun

Annotated Question Answering Pairs for Clinical Notes in the MIMIC-III Database

clinical question answering clinical nlp clinical reading comprehension

Published: Jan. 15, 2021. Version: 1.0.0


Database Open Access

Q-Pain: A Question Answering Dataset to Measure Social Bias in Pain Management

Cécile Logé, Emily Ross, David Yaw Amoah Dadey, et al.

Q-Pain is a medical QA dataset designed to enable the substitution of multiple different racial and gender "profiles" for patients and to evaluate whether bias is present when deciding whether to prescribe pain medication.

Published: June 11, 2021. Version: 1.0.0


Database Credentialed Access

EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images

Seongsu Bae, Daeun Kyung, Jaehee Ryu, et al.

We present EHRXQA, the first multi-modal EHR QA dataset combining structured patient records with aligned chest X-ray images. EHRXQA contains a comprehensive set of QA pairs covering image-related, table-related, and image+table-related questions.

question answering machine learning electronic health records evaluation chest x-ray multi-modal question answering ehr question answering semantic parsing deep learning benchmark visual question answering

Published: July 23, 2024. Version: 1.0.0


Database Credentialed Access

MeDiSumQA: Patient-Oriented Question-Answer Generation from Discharge Letters

Amin Dada, Osman Alperen Koras, Marie Bauer, et al.

MeDiSumQA is a dataset of patient-oriented QA pairs from MIMIC-IV discharge summaries, designed to evaluate LLMs in generating safe, patient-friendly medical responses for clinical QA and healthcare communication.

Published: May 5, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-Ext-MIMIC-CXR-VQA: A Complex, Diverse, And Large-Scale Visual Question Answering Dataset for Chest X-ray Images

Seongsu Bae, Daeun Kyung, Jaehee Ryu, et al.

We introduce MIMIC-Ext-MIMIC-CXR-VQA, a complex, diverse, and large-scale dataset designed for Visual Question Answering (VQA) tasks within the medical domain, focusing primarily on chest radiographs.

question answering machine learning electronic health records evaluation chest x-ray radiology deep learning benchmark multimodal visual question answering

Published: July 19, 2024. Version: 1.0.0