Resources


Database Credentialed Access

Chest ImaGenome Dataset

Joy Wu, Nkechinyere Agu, Ismini Lourentzou, Arjun Sharma, Joseph Paguio, Jasper Seth Yao, Edward Christopher Dee, William Mitchell, Satyananda Kashyap, Andrea Giovannini, Leo Anthony Celi, Tanveer Syeda-Mahmood, Mehdi Moradi

The Chest ImaGenome dataset is a scene graph dataset with additional chronological comparison relations for chest X-rays. It is automatically derived from the MIMIC-CXR dataset. A manually annotated gold standard is also available for 500 patients.

multimodal radiology chest x-ray machine learning scene graph visual question answering visual dialogue object detection disease progression semantic reasoning bounding box relation extraction knowledge graph cxr explainability reasoning deep learning chest

Published: July 13, 2021. Version: 1.0.0


Database Credentialed Access

BRATECA (Brazilian Tertiary Care Dataset): a Clinical Information Dataset for the Portuguese Language

Henrique Dias, Ana Helena Dias Pereira dos Ulbrich

Brazilian clinical dataset containing over 70,000 admissions from 10 hospitals in two Brazilian states.

exams natural language processing tertiary care prescriptions clinical notes

Published: May 13, 2022. Version: 1.0


Database Credentialed Access

MIMIC-CXR Database

Alistair Johnson, Tom Pollard, Roger Mark, Seth Berkowitz, Steven Horng

Chest radiographs in DICOM format with associated free-text reports.

mimic computer vision chest x-rays radiology machine learning natural language processing

Published: Sept. 19, 2019. Version: 2.0.0


Database Credentialed Access

RuMedNLI: A Russian Natural Language Inference Dataset For The Clinical Domain

Pavel Blinov, Aleksandr Nesterov, Galina Zubkova, Arina Reshetnikova, Vladimir Kokh, Chaitanya Shivade

RuMedNLI is the full Russian-language counterpart of the MedNLI dataset.

natural language inference recognizing textual entailment russian language

Published: April 1, 2022. Version: 1.0.0


Database Credentialed Access

RadGraph: Extracting Clinical Entities and Relations from Radiology Reports

Saahil Jain, Ashwin Agrawal, Adriel Saporta, Steven QH Truong, Du Nguyen Duong, Tan Bui, Pierre Chambon, Matthew Lungren, Andrew Ng, Curtis Langlotz, Pranav Rajpurkar

RadGraph is a dataset of entities and relations in full-text chest X-ray radiology reports, obtained using a novel information extraction (IE) schema designed to capture the clinically relevant information in a radiology report.

radiology entity and relation extraction graph multi-modal natural language processing

Published: June 3, 2021. Version: 1.0.0


Database Credentialed Access

FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark

Mingjie Li, Wenjia Cai, Rui Liu, Yuetian Weng, Xiaoyun Zhao, Cong Wang, Xin Chen, Zhong Liu, Caineng Pan, Mengke Li, Yingfeng Zheng, Yizhi Liu, Flora Salim, Karin Verspoor, Xiaodan Liang, Xiaojun Chang

Benchmark dataset for report generation based on fundus fluorescein angiography images and reports.

fundus fluorescein angiography explainable and reliable evaluation vision and language medical report generation

Published: Sept. 21, 2021. Version: 1.0.0


Database Credentialed Access

MIMIC-CXR-JPG - chest radiographs with structured labels

Alistair Johnson, Matt Lungren, Yifan Peng, Zhiyong Lu, Roger Mark, Seth Berkowitz, Steven Horng

Chest x-rays in JPG format with structured labels derived from the associated radiology report.

mimic computer vision radiology chest x-ray deep learning

Published: Nov. 14, 2019. Version: 2.0.0


Database Credentialed Access

MedNLI for Shared Task at ACL BioNLP 2019

Chaitanya Shivade

Data for the MedNLI Shared Task at the ACL BioNLP 2019 Workshop on Biomedical Language Processing.

mimic natural language inference recognizing textual entailment

Published: Nov. 28, 2019. Version: 1.0.1


Database Credentialed Access

MS-CXR: Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing

Benedikt Boecking, Naoto Usuyama, Shruthi Bannur, Daniel Coelho de Castro, Anton Schwaighofer, Stephanie Hyland, Maria Teodora Wetscherek, Tristan Naumann, Aditya Nori, Javier Alvarez Valle, Hoifung Poon, Ozan Oktay

MS-CXR is a new dataset containing 1,162 chest X-ray bounding box labels paired with radiology text descriptions, annotated and verified by two board-certified radiologists.

chest x-ray vision-language processing

Published: May 16, 2022. Version: 0.1


Database Credentialed Access

MedNLI - A Natural Language Inference Dataset For The Clinical Domain

Chaitanya Shivade

This is a resource for training machine learning models for language inference in the medical domain.

natural language inference recognizing textual entailment

Published: Oct. 1, 2019. Version: 1.0.0