Resources


Database Restricted Access

LATTE-CXR: Locally Aligned TexT and imagE, Explainable dataset for Chest X-Rays

Elham Ghelichkhan, Tolga Tasdizen

This dataset includes bounding box-statement pairs for chest X-ray images, derived from radiologists’ eye-tracking data (for explainability) and annotations, for local visual-language models.

eye-tracking chest x-ray dataset automatically generated dataset caption-guided object detection localization image captioning with region-level description grounded radiology report generation phrase grounding xai multi-modal learning local visual-language models

Published: Feb. 4, 2025. Version: 1.0.0


Database Open Access

Synthetic Mention Corpora for Disease Entity Recognition and Normalization

Kuleen Sasse, John David Osborne

We present the Synthetic Mention Corpora for Disease Entity Recognition and Normalization, containing 128,000 disease mentions from the UMLS disorder group, generated by an LLM. The corpus is intended to improve disease entity recognition and normalization in biomedical and clinical texts.

nlp named entity recognition machine learning data augmentation entity normalization

Published: Feb. 3, 2025. Version: 1.0.0


Database Restricted Access

MIMIC-IV-Ext-DiReCT

Bowen Wang, Jiuyang Chang, Yiming Qian

A diagnostic reasoning dataset designed to evaluate the performance of large language models in aligning with human doctors when making diagnoses from clinical notes.

Published: Jan. 21, 2025. Version: 1.0.0


Database Restricted Access

A database of hand kinematics, high-density sEMG of forearm and wrist for motion intent recognition

Zeming Zhao, Weichao Guo, Zeyu Zhou

A database of hand kinematics, high-density sEMG of forearm and wrist.

Published: Jan. 17, 2025. Version: 1.0.0


Database Credentialed Access

MS-CXR: Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing

Benedikt Boecking, Naoto Usuyama, Shruthi Bannur, Daniel Coelho de Castro, Anton Schwaighofer, Stephanie Hyland, Harshita Sharma, Maria Teodora Wetscherek, Tristan Naumann, Aditya Nori, Javier Alvarez Valle, Hoifung Poon, Ozan Oktay

MS-CXR is a new dataset containing 1162 chest X-ray bounding box labels paired with radiology text descriptions, annotated and verified by two board-certified radiologists.

vision-language processing chest x-ray localization phrase grounding

Published: Nov. 15, 2024. Version: 1.1.0


Database Contributor Review

A multimodal dental dataset facilitating machine learning research and clinic services

Wenjing Liu, Yunyou Huang, Suqin Tang

A new dental dataset covering 169 patients, three commonly used dental image models, and images of various health conditions of the oral cavity.

Published: Oct. 11, 2024. Version: 1.1.0


Database Open Access

VTaC: A Benchmark Dataset of Ventricular Tachycardia Alarms from ICU Monitors

Li-wei Lehman, Benjamin Moody, Lucas McCullum, Hasan Saeed, Harsh Deep, Diane Perry, Tristan Struja, Qiao Li, Gari Clifford, Roger Mark

VTaC is an annotated ventricular tachycardia (VT) arrhythmia alarm database containing over 5,000 waveform recordings with VT alarms from ICU monitors, with each alarm labeled as either true or false by at least two human expert annotators.

arrhythmia icu false alarms benchmark dataset ventricular tachycardia machine learning

Published: Oct. 1, 2024. Version: 1.0


Database Credentialed Access

MIMIC-IV-Ext-MDS-ED: Multimodal Decision Support in the Emergency Department - a Benchmark Dataset for Diagnoses and Deterioration Prediction in Emergency Medicine

Juan Miguel Lopez Alcaraz, Nils Strodthoff

MIMIC-IV-Ext-MDS-ED is a dataset for benchmarking multimodal decision support in the emergency department. It features multimodal input (including ECG waveforms) and a comprehensive set of prediction targets (diagnoses and deterioration prediction).

emergency department ecg benchmark diagnoses prediction deterioration prediction multimodal

Published: Sept. 12, 2024. Version: 1.0.0


Database Restricted Access

Multimodal Physiological Indices During Surgery Under Anesthesia

Sandya Subramanian, Bryan Tseng, Riccardo Barbieri, Emery Brown

Multimodal physiological indices collected during surgery while patients were under anesthesia.

anesthesia nociception

Published: Aug. 23, 2024. Version: 1.0


Database Credentialed Access

EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images

Seongsu Bae, Daeun Kyung, Jaehee Ryu, Eunbyeol Cho, Gyubok Lee, Sunjun Kweon, Jungwoo Oh, Lei JI, Eric Chang, Tackeun Kim, Edward Choi

We present EHRXQA, the first multi-modal EHR QA dataset combining structured patient records with aligned chest X-ray images. EHRXQA contains a comprehensive set of QA pairs covering image-related, table-related, and image+table-related questions.

question answering benchmark evaluation multi-modal question answering deep learning ehr question answering semantic parsing chest x-ray electronic health records machine learning visual question answering

Published: July 23, 2024. Version: 1.0.0