Resources


Database Credentialed Access

BRAX, a Brazilian labeled chest X-ray dataset

Eduardo Pontes Reis, Joselisa Paiva, Maria Carolina Bueno da Silva, et al.

BRAX contains 24,959 chest radiography exams and 40,967 images acquired in a large general Brazilian hospital. All images have been read by trained radiologists and 14 labels were derived from Brazilian Portuguese reports using NLP.

chest x-ray dataset artificial intelligence

Published: June 17, 2022. Version: 1.1.0


Database Restricted Access

VinDr-PCXR: An open, large-scale pediatric chest X-ray dataset for interpretation of common thoracic diseases

Hieu Huy Pham, Tien Thanh Tran, Ha Quy Nguyen

An open, large-scale pediatric chest X-ray dataset that contains both lesion-level labels and image-level labels for multiple findings and diseases for interpretation of common thoracic diseases.

Published: March 21, 2022. Version: 1.0.0


Database Restricted Access

VinDr-Mammo: A large-scale benchmark dataset for computer-aided detection and diagnosis in full-field digital mammography

Hieu Huy Pham, Hieu Nguyen Trung, Ha Quy Nguyen

A large-scale benchmark dataset for computer-aided detection and diagnosis in mammography

Published: March 21, 2022. Version: 1.0.0


Database Credentialed Access

VinDr-CXR: An open dataset of chest X-rays with radiologist annotations

Ha Quy Nguyen, Hieu Huy Pham, le tuan linh, et al.

VinDr-CXR: An open dataset of chest X-rays with radiologist's annotations

lesion detection chest x-ray interpretation disease classification computer vision deep learning

Published: June 22, 2021. Version: 1.0.0


Database Restricted Access

Smartphone-Captured Chest X-Ray Photographs

Po-Chih Kuo, ChengChe Tsai, Diego M Lopez, et al.

Smartphone-captured CXR images including photographs taken from MIMIC-CXR and CheXpert, photographs taken by resident doctors, and photographs taken with different devices.

smartphone photograph cxr

Published: Sept. 27, 2020. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-ECHO: Echocardiogram Matched Subset

Brian Gow, Tom Pollard, Nathaniel Greenbaum, et al.

The MIMIC-IV-ECHO module contains structured measurements from over 200,000 echocardiograms and more than 500,000 echocardiogram DICOM files. Patients overlap with those in the MIMIC-IV Clinical Database.

Published: March 10, 2026. Version: 1.0


Database Credentialed Access

Bridge2AI-Voice Pediatric Dataset

Yael Bensoussan, Alexandros Sigaras, Anais Rameau, et al.

A dataset of questionnaire responses, spectrograms, and other information for pediatric participants collected for the Bridge2AI voice as a biomarker of health project.

voice bridge2ai

Published: Dec. 17, 2025. Version: 1.0.0


Database Credentialed Access

Bridge2AI-Voice: An ethically-sourced, diverse voice dataset linked to health information

Yael Bensoussan, Alexandros Sigaras, Anais Rameau, et al.

A dataset of features from voice recordings and metadata to enable the development, benchmarking, and validation of clinically applicable machine-learning models for diagnosing a wide range of health conditions.

voice bridge2ai

Published: Dec. 16, 2025. Version: 3.0.0


Database Credentialed Access

MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context

Zishan Gu, Jiayuan Chen, Fenglin Liu, et al.

MedVH provides a visual hallucination evaluation benchmark for large language models in the medical context. It formulates tests using chest X-ray images, including multi-choice question answering and long-text generation tasks.

Published: Dec. 10, 2025. Version: 1.0.1


Database Contributor Review

ER-REASON: A Benchmark Dataset for LLM-Based Clinical Reasoning in the Emergency Room

Mel Molina, Nikita Mehandru, Niloufar Golchini, et al.

The ER-REASON dataset is a longitudinal collection of 25,174 de-identified clinical notes for 3,437 patients admitted to the emergency room (ER) at a large academic medical center between March 1, 2022, and March 31, 2024.

Published: Oct. 23, 2025. Version: 1.0.0