Resources


Database Restricted Access

LATTE-CXR: Locally Aligned TexT and imagE, Explainable dataset for Chest X-Rays

Elham Ghelichkhan, Tolga Tasdizen

This dataset includes bounding box-statement pairs for chest X-ray images, derived from radiologists’ eye-tracking data (for explainability) and annotations, for local visual-language models.

eye-tracking chest x-ray dataset automatically generated dataset caption-guided object detection image captioning with region-level description grounded radiology report generation phrase grounding xai multi-modal learning local visual-language models localization

Published: Feb. 4, 2025. Version: 1.0.0


Database Contributor Review

COVID Data for Shared Learning (CDSL): A comprehensive, multimodal COVID-19 dataset from HM Hospitales

Álvaro Ritoré, Andreea M Oprescu, Alberto Estirado Bronchalo, Miguel Ángel Armengol de la Hoz

COVID Data for Shared Learning (CDSL) is a multimodal database comprising de-identified structured health data and radiological images from 4,479 patients with COVID-19, as a comprehensive toolkit for developing predictive models.

covid-19 multimodal database radiological images open data healthcare data machine learning and ai

Published: Oct. 25, 2024. Version: 1.0.0


Database Credentialed Access

MIMIC-CXR Database

Alistair Johnson, Tom Pollard, Roger Mark, Seth Berkowitz, Steven Horng

Chest radiographs in DICOM format with associated free-text reports.

computer vision chest x-rays natural language processing radiology mimic machine learning

Published: July 23, 2024. Version: 2.1.0


Database Contributor Review

CARMEN-I: A resource of anonymized electronic health records in Spanish and Catalan for training and testing NLP tools

Eulalia Farre Maduell, Salvador Lima-Lopez, Santiago Andres Frid, Artur Conesa, Elisa Asensio, Antonio Lopez-Rueda, Helena Arino, Elena Calvo, Maria Jesús Bertran, Maria Angeles Marcos, Montserrat Nofre Maiz, Laura Tañá Velasco, Antonia Marti, Ricardo Farreres, Xavier Pastor, Xavier Borrat Frigola, Martin Krallinger

CARMEN-I is a Spanish corpus of 2,000 clinical records from Hospital Clínic, Barcelona. It covers COVID-19 patients and comorbidities, serving as a resource for training clinical NLP models and researchers in NLP applied to clinical documents.

de-identification clinical ner anonymization

Published: April 20, 2024. Version: 1.0.1


Database Credentialed Access

CHIFIR: Cytology and Histopathology Invasive Fungal Infection Reports

Vlada Rozova, Anna Khanina, Jasmine Teng, Joanne Teh, Leon Worth, Monica Slavin, karin thursky, Karin Verspoor

A corpus of cytology and histopathology reports annotated for terminology relevant to fungal infections. Ideal for validation of named entity recognition and relation extraction methods.

nlp clinical documentation information extraction invasive fungal infections

Published: Feb. 20, 2024. Version: 1.0.2


Database Credentialed Access

MS-CXR-T: Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing

Shruthi Bannur, Stephanie Hyland, Qianchu Liu, Fernando Pérez-García, Max Ilse, Daniel Coelho de Castro, Benedikt Boecking, Harshita Sharma, Kenza Bouzid, Anton Schwaighofer, Maria Teodora Wetscherek, Hannah Richardson, Tristan Naumann, Javier Alvarez Valle, Ozan Oktay

The MS-CXR-T is a multimodal benchmark that enhances the MIMIC-CXR v2 dataset by including expert-verified annotations. Its goal is to evaluate biomedical visual-language processing models in terms of temporal semantics extracted from image and text.

disease progression cxr vision-language processing chest x-ray radiology multimodal

Published: March 17, 2023. Version: 1.0.0


Database Credentialed Access

Chest X-ray Dataset with Lung Segmentation

Wimukthi Indeewara, Mahela Hennayake, Kasun Rathnayake, Thanuja Ambegoda, Dulani Meedeniya

CXLSeg dataset: Chest X-ray with Lung Segmentation, a comparatively large dataset of segmented Chest X-ray radiographs based on the MIMIC-CXR dataset. This contains segmentation results of 243,324 frontal view images and corresponding masks.

segmentation medical reports u-net chest radiographs mimic-cxr chest x-ray

Published: Feb. 8, 2023. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-Note: Deidentified free-text clinical notes

Alistair Johnson, Tom Pollard, Steven Horng, Leo Anthony Celi, Roger Mark

Deidentified free-text clinical notes for patients in the MIMIC-IV Clinical Database.

deidentification critical care electronic health record natural language processing clinical notes mimic

Published: Jan. 6, 2023. Version: 2.2


Database Credentialed Access

Chest X-ray segmentation images based on MIMIC-CXR

Li-Ching Chen, Po-Chih Kuo, Ryan Wang, Judy Gichoya, Leo Anthony Celi

A chest x-rays segmentation dataset derived from MIMIC-CXR based on deep learning algorithm and human examination.

segmentation chest x-rays cxr

Published: Aug. 18, 2022. Version: 1.0.0


Database Credentialed Access

BRAX, a Brazilian labeled chest X-ray dataset

Eduardo Pontes Reis, Joselisa Paiva, Maria Carolina Bueno da Silva, Guilherme Alberto Sousa Ribeiro, Victor Fornasiero Paiva, Lucas Bulgarelli, Henrique Lee, Paulo Victor dos Santos, vanessa brito, Lucas Amaral, Gabriel Beraldo, Jorge Nebhan Haidar Filho, Gustavo Teles, Gilberto Szarf, Tom Pollard, Alistair Johnson, Leo Anthony Celi, Edson Amaro

BRAX contains 24,959 chest radiography exams and 40,967 images acquired in a large general Brazilian hospital. All images have been read by trained radiologists and 14 labels were derived from Brazilian Portuguese reports using NLP.

chest x-ray dataset artificial intelligence

Published: June 17, 2022. Version: 1.1.0