Resources


Database Credentialed Access

Chest ImaGenome Dataset

Joy Wu, Nkechinyere Agu, Ismini Lourentzou, Arjun Sharma, Joseph Paguio, Jasper Seth Yao, Edward Christopher Dee, William Mitchell, Satyananda Kashyap, Andrea Giovannini, Leo Anthony Celi, Tanveer Syeda-Mahmood, Mehdi Moradi

The Chest ImaGenome dataset is a scene graph dataset with additional chronological comparison relations for chest X-rays. It is automatically derived from the MIMIC-CXR dataset. A manually annotated gold standard is also available for 500 patients.

multimodal radiology chest x-ray machine learning scene graph visual question answering visual dialogue object detection disease progression semantic reasoning bounding box relation extraction knowledge graph explainability reasoning chest cxr deep learning

Published: July 13, 2021. Version: 1.0.0


Database Contributor Review

BRATECA (Brazilian Tertiary Care Dataset): a Clinical Information Dataset for the Portuguese Language

Henrique Dias, Ana Helena Dias Pereira dos Ulbrich

Brazilian clinical dataset containing over 70,000 admissions from 10 hospitals in two Brazilian states.

prescriptions exams tertiary care natural language processing clinical notes

Published: July 14, 2022. Version: 1.1


Database Credentialed Access

MIMIC-IV-Note: Deidentified free-text clinical notes

Alistair Johnson, Tom Pollard, Steven Horng, Leo Anthony Celi, Roger Mark

Deidentified free-text clinical notes for patients in the MIMIC-IV Clinical Database.

mimic deidentification critical care electronic health record natural language processing clinical notes

Published: Jan. 6, 2023. Version: 2.2


Database Credentialed Access

MIMIC-CXR Database

Alistair Johnson, Tom Pollard, Roger Mark, Seth Berkowitz, Steven Horng

Chest radiographs in DICOM format with associated free-text reports.

mimic computer vision radiology chest x-rays machine learning natural language processing

Published: Sept. 19, 2019. Version: 2.0.0


Database Credentialed Access

RuMedNLI: A Russian Natural Language Inference Dataset For The Clinical Domain

Pavel Blinov, Aleksandr Nesterov, Galina Zubkova, Arina Reshetnikova, Vladimir Kokh, Chaitanya Shivade

RuMedNLI is the full Russian-language counterpart of the MedNLI dataset.

natural language inference recognizing textual entailment russian language

Published: April 1, 2022. Version: 1.0.0


Database Credentialed Access

CXR-PRO: MIMIC-CXR with Prior References Omitted

Vignav Ramesh, Nathan Chi, Pranav Rajpurkar

CXR-PRO is an adaptation of the MIMIC-CXR dataset (consisting of chest radiographs and their associated free-text radiology reports) with references to non-existent priors removed.

generation large language models free-text radiology reports references to priors retrieval

Published: Nov. 23, 2022. Version: 1.0.0


Database Credentialed Access

RadGraph: Extracting Clinical Entities and Relations from Radiology Reports

Saahil Jain, Ashwin Agrawal, Adriel Saporta, Steven QH Truong, Du Nguyen Duong, Tan Bui, Pierre Chambon, Matthew Lungren, Andrew Ng, Curtis Langlotz, Pranav Rajpurkar

RadGraph is a dataset of entities and relations in full-text chest X-ray radiology reports, annotated using a novel information extraction (IE) schema designed to capture clinically relevant information in a radiology report.

radiology entity and relation extraction graph multi-modal natural language processing

Published: June 3, 2021. Version: 1.0.0


Database Credentialed Access

FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark

Mingjie Li, Wenjia Cai, Rui Liu, Yuetian Weng, Xiaoyun Zhao, Cong Wang, Xin Chen, Zhong Liu, Caineng Pan, Mengke Li, Yingfeng Zheng, Yizhi Liu, Flora Salim, Karin Verspoor, Xiaodan Liang, Xiaojun Chang

Benchmark dataset for report generation based on fundus fluorescein angiography images and reports.

fundus fluorescein angiography explainable and reliable evaluation vision and language medical report generation

Published: Sept. 21, 2021. Version: 1.0.0


Database Credentialed Access

RadQA: A Question Answering Dataset to Improve Comprehension of Radiology Reports

Sarvesh Soni, Kirk Roberts

RadQA is an electronic health record question answering dataset containing clinical questions that can be answered using the Findings and Impressions sections of radiology reports.

electronic health records question answering radiology reports machine reading comprehension clinical notes

Published: Dec. 9, 2022. Version: 1.0.0


Database Credentialed Access

MIMIC-CXR-JPG - chest radiographs with structured labels

Alistair Johnson, Matt Lungren, Yifan Peng, Zhiyong Lu, Roger Mark, Seth Berkowitz, Steven Horng

Chest X-rays in JPG format with structured labels derived from the associated radiology reports.

mimic computer vision radiology chest x-ray deep learning

Published: Nov. 14, 2019. Version: 2.0.0