Resources


Database Credentialed Access

MIMIC-Ext-MIMIC-CXR-VQA: A Complex, Diverse, And Large-Scale Visual Question Answering Dataset for Chest X-ray Images

Seongsu Bae, Daeun Kyung, Jaehee Ryu, Eunbyeol Cho, Gyubok Lee, Sunjun Kweon, Jungwoo Oh, Lei Ji, Eric Chang, Tackeun Kim, Edward Choi

We introduce MIMIC-Ext-MIMIC-CXR-VQA, a complex, diverse, and large-scale dataset designed for Visual Question Answering (VQA) tasks within the medical domain, focusing primarily on chest radiographs.

question answering chest x-ray electronic health records radiology machine learning multimodal deep learning evaluation visual question answering benchmark

Published: July 19, 2024. Version: 1.0.0


Database Credentialed Access

RadGraph-XL: A Large-Scale Expert-Annotated Dataset for Entity and Relation Extraction from Radiology Reports

Jean-Benoit Delbrouck

RadGraph-XL is a large, expert-annotated dataset of 2,300 radiology reports covering multiple modalities and anatomies. It enables accurate extraction of clinical entities and relations for downstream medical AI tasks.

Published: Sept. 12, 2025. Version: 1.0.0


Database Restricted Access

Swiss-Mammo: A physician-written, synthetic dataset of German mammography reports

Daniel Reichenpfader, Sandro von Däniken, Harald Marcel Bonel

Swiss-Mammo: A physician-written, synthetic dataset of 28 German mammography reports. The dataset is stratified based on BI-RADS categories and available in German and English.

radiology mammography structured reporting bi-rads

Published: June 24, 2025. Version: 1.0.1


Database Credentialed Access

BRAX, a Brazilian labeled chest X-ray dataset

Eduardo Pontes Reis, Joselisa Paiva, Maria Carolina Bueno da Silva, Guilherme Alberto Sousa Ribeiro, Victor Fornasiero Paiva, Lucas Bulgarelli, Henrique Lee, Paulo Victor dos Santos, Vanessa Brito, Lucas Amaral, Gabriel Beraldo, Jorge Nebhan Haidar Filho, Gustavo Teles, Gilberto Szarf, Tom Pollard, Alistair Johnson, Leo Anthony Celi, Edson Amaro

BRAX contains 24,959 chest radiography exams and 40,967 images acquired in a large general Brazilian hospital. All images have been read by trained radiologists and 14 labels were derived from Brazilian Portuguese reports using NLP.

chest x-ray dataset artificial intelligence

Published: June 17, 2022. Version: 1.1.0


Database Restricted Access

Pulmonary Edema Severity Grades Based on MIMIC-CXR

Ruizhi Liao, Geeticka Chauhan, Polina Golland, Seth Berkowitz, Steven Horng

Pulmonary edema metadata and labels for MIMIC-CXR

Published: Feb. 9, 2021. Version: 1.0.1


Database Credentialed Access

CXR-PRO: MIMIC-CXR with Prior References Omitted

Vignav Ramesh, Nathan Chi, Pranav Rajpurkar

CXR-PRO is an adaptation of the MIMIC-CXR dataset (consisting of chest radiographs and their associated free-text radiology reports) with references to non-existent priors removed.

generation free-text radiology reports references to priors retrieval large language models

Published: Nov. 23, 2022. Version: 1.0.0


Database Open Access

Radiology Report Generation Models Evaluation Dataset For Chest X-rays (RadEvalX)

Amos Rubin Calamida, Farhad Nooralahzadeh, Morteza Rohanian, Mizuho Nishio, Koji Fujimoto, Michael Krauthammer

RadEvalX is a publicly available dataset developed similarly to the ReXVal dataset. RadEvalX focuses on radiologist evaluations of errors found in automatically generated radiology reports.

Published: June 18, 2024. Version: 1.0.0


Database Credentialed Access

Chest X-ray Dataset with Lung Segmentation

Wimukthi Indeewara, Mahela Hennayake, Kasun Rathnayake, Thanuja Ambegoda, Dulani Meedeniya

CXLSeg dataset: Chest X-ray with Lung Segmentation, a comparatively large dataset of segmented chest X-ray radiographs based on the MIMIC-CXR dataset. It contains segmentation results for 243,324 frontal-view images and their corresponding masks.

segmentation medical reports u-net chest radiographs mimic-cxr chest x-ray

Published: Feb. 8, 2023. Version: 1.0.0


Database Credentialed Access

LLaVA-Rad MIMIC-CXR Annotations

Juan Manuel Zambrano Chaves, Shih-Cheng Huang, Yanbo Xu, Hanwen Xu, Naoto Usuyama, Sheng Zhang, Fei Wang, Yujia Xie, Mahmoud Khademi, Ziyi Yang, Hany Awadalla, Julia Gong, Houdong Hu, Jianwei Yang, Chunyuan Li, Jianfeng Gao, Yu Gu, Cliff Wong, Mu-Hsin Wei, Tristan Naumann, Muhao Chen, Matthew Lungren, Akshay Chaudhari, Serena Yeung, Curtis Langlotz, Sheng Wang, Hoifung Poon

This dataset provides GPT-4-extracted sections of radiology reports from MIMIC-CXR, complementing rule-based section extractions with additional reports that contain findings, and removing references to priors from the findings.

Published: Jan. 24, 2025. Version: 1.0.0


Database Credentialed Access

Chest ImaGenome Dataset

Joy Wu, Nkechinyere Agu, Ismini Lourentzou, Arjun Sharma, Joseph Paguio, Jasper Seth Yao, Edward Christopher Dee, William Mitchell, Satyananda Kashyap, Andrea Giovannini, Leo Anthony Celi, Tanveer Syeda-Mahmood, Mehdi Moradi

The Chest ImaGenome dataset is a scene graph dataset with additional chronological comparison relations for chest X-rays. It is automatically derived from the MIMIC-CXR dataset. A manually annotated gold standard is also available for 500 patients.

scene graph visual dialogue object detection semantic reasoning bounding box knowledge graph explainability reasoning relation extraction chest disease progression cxr chest x-ray radiology machine learning multimodal deep learning visual question answering

Published: July 13, 2021. Version: 1.0.0