Resources


Database Credentialed Access

Structured Viewing Classification Annotations From the MIMIC-IV-ECHO Dataset (ECHOVIEW)

Sampath Rapuri, Sofia Sapeta Dias, Maria Salomé Carvalho, et al.

ECHOVIEW provides structured viewing class annotations for 29,196 transthoracic echocardiograms derived from MIMIC-IV-ECHO using a pretrained CNN. Manual clinician review shows substantial agreement (κ=0.69) with these annotations.

Published: March 17, 2026. Version: 0.1


Database Open Access

Hillel Yaffe Glaucoma Dataset (HYGD): A Gold-Standard Annotated Fundus Dataset for Glaucoma Detection

Or Abramovich, Hadas Pizem, Jonathan Fhima, et al.

HYGD is a rigorously annotated fundus image dataset with gold-standard clinical labels designed to improve and benchmark deep learning models for accurate glaucoma detection.

ophthalmology retina glaucoma dfi gon fundus gold-standard

Published: March 16, 2026. Version: 1.1.0


Database Credentialed Access

MIMIC-III-Ext-Notes

Darren Liu, Monique Bouvier, Delgersuren Bold, et al.

We evaluated general large language models' performance in clinical information extraction on MIMIC-III notes.

Published: Feb. 27, 2026. Version: 1.0.0


Challenge Credentialed Access

SNOMED CT Entity Linking Challenge

Will Hardman, Mark Banks, Rory Davidson, et al.

272 discharge notes from the MIMIC-IV-Note dataset annotated with SNOMED CT concepts.

snomed entity linking clinical annotation

Published: Feb. 17, 2026. Version: 1.2.1


Database Credentialed Access

Predictors of Hospital Onset Infection: A Matched Retrospective Cohort Dataset

Ziming Wei, Luke Sagers, Caroline McKenna, et al.

NPA-CP is a freely accessible dataset derived from electronic health record (EHR) information at MGB between 2015 and 2024. The dataset includes 11 different pathogens and can be used to predict hospital-onset infections for these pathogens.

electronic health records infection control clinical machine learning infectious diseases hospital onset infection colonization pressure

Published: Nov. 4, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-Ext clinical decision support for referral, triage and diagnosis

Farieda Gaber, Altuna Akalin

This MIMIC-IV extended dataset is designed to evaluate and improve LLMs' ability to assist with triage, specialist referral, and diagnosis, using critical patient information such as history of present illness,vitals signs and other relevant data.

Published: Oct. 8, 2025. Version: 1.0.2


Database Open Access

MIMIC-IV demo data in the Medical Event Data Standard (MEDS)

Robin Philippus van de Water, Ethan Steinberg, Michael Wornow, et al.

MIMIC-IV Clinical Database Demo in MEDS (Medical Event Data Standard) format.

ehr critical care electronic health record machine learning mimic meds medical event data standard

Published: Sept. 29, 2025. Version: 0.0.1


Database Credentialed Access

MIMIC-Ext-DrugDetection

Fabrice Harel-Canada, Nanyun Peng, David Goodman, et al.

This project offers a multilabel annotated dataset of clinical note sentences from MIMIC-III/IV for substance use detection. It supports NLP research for identifying various co-occurring drug use mentions in patient records.

ehr mimic-iv substance use clinical notes mimic-iii methamphetamine multi-label cocaine drug detection polysubstance use prescription opioid misuse cannabis benzodiazepine misuse injection drug use heroin

Published: Sept. 25, 2025. Version: 1.0.0


Database Credentialed Access

Annotated Social Determinants of Health Dataset for Adverse Pregnancy Outcomes

Nidhi Soley, MaKhaila Bentil, Jash Shah, et al.

This project provides a manually annotated dataset of social determinants of health—social support, occupation, and substance use—linked to pregnancy outcomes, extracted from MIMIC-III and MIMIC-IV discharge summary notes.

Published: Aug. 4, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-Ext-CXR-QBA: A Structured, Tagged, and Localized Visual Question Answering Dataset with Question-Box-Answer Triplets and Scene Graphs for Chest X-ray Images

Philip Müller, Friederike Jungmann, Georgios Kaissis, et al.

We present a large-scale CXR VQA dataset derived from MIMIC-CXR with 42M QA pairs, featuring hierarchical answers, bounding boxes, and structured tags. We generated QA-pairs using LLM-based extraction from radiology reports and localization models.

chest x-rays vqa localization scene graphs

Published: July 22, 2025. Version: 1.0.0