Resources


Database Credentialed Access

CHIFIR: Cytology and Histopathology Invasive Fungal Infection Reports

Vlada Rozova, Anna Khanina, Jasmine Teng, et al.

A corpus of cytology and histopathology reports annotated for terminology relevant to fungal infections. Ideal for validation of named entity recognition and relation extraction methods.

nlp clinical documentation information extraction invasive fungal infections

Published: Feb. 20, 2024. Version: 1.0.2


Database Credentialed Access

C-REACT: Contextualized Race and Ethnicity Annotations for Clinical Text

Oliver Bear Don't Walk IV, Adrienne Pichon, Harry Reyes Nieva, et al.

Two sets of gold-standard annotations for race and ethnicity information from clinical notes in MIMIC-III. Contains race and ethnicity label assignments and related information such as country of origin and spoken language.

clinical notes patient country information race and ethnicity patient language information

Published: Oct. 21, 2024. Version: 1.0.0


Database Credentialed Access

EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records

Yeonsu Kwon, Jiho Kim, Gyubok Lee, et al.

A dataset for checking consistency between unstructured notes and structured tables in electronic health records.

Published: March 19, 2025. Version: 1.0.1


Database Restricted Access

MIMIC-IV-Ext-Apixaban-Trial-Criteria-Questions

Elizabeth Woo, Michael Craig Burkhart, Emily Alsentzer, et al.

We created 23 questions resembling eligibility criteria from the apixaban clinical trial and evaluated them on a random sample of 100 patient notes from MIMIC-IV. We release the resulting 2,300 question-answer pairs (23 questions for each of the 100 notes) as a dataset here.

clinical q and a evaluation set clinical trial eligibility

Published: April 30, 2025. Version: 1.0.0


Database Restricted Access

MIMIC-IV-Ext-DiReCT

Bowen Wang, Jiuyang Chang, Yiming Qian

A diagnostic reasoning dataset designed to evaluate the performance of large language models in aligning with human doctors when making diagnoses from clinical notes.

Published: Jan. 21, 2025. Version: 1.0.0


Database Restricted Access

MIMIC-III-Ext-Synthetic-Clinical-Trial-Questions

Elizabeth Woo, Michael Craig Burkhart, Emily Alsentzer, et al.

In our recent study, we used Llama-3.1-70B-Instruct to generate synthetic training examples resembling clinical trial eligibility criteria. We manually reviewed 1,000 of these examples and release them here.

large language models synthetic data distillation clinical trial eligibility

Published: April 22, 2025. Version: 1.0.0


Database Credentialed Access

MedNLI - A Natural Language Inference Dataset For The Clinical Domain

Chaitanya Shivade

This is a resource for training machine learning models for language inference in the medical domain.

natural language inference recognizing textual entailment

Published: Oct. 1, 2019. Version: 1.0.0


Database Contributor Review

ER-REASON: A Benchmark Dataset for LLM-Based Clinical Reasoning in the Emergency Room

Mel Molina, Nikita Mehandru, Niloufar Golchini, et al.

The ER-REASON dataset is a longitudinal collection of 25,174 de-identified clinical notes for 3,437 patients admitted to the emergency room (ER) at a large academic medical center between March 1, 2022, and March 31, 2024.

Published: Oct. 23, 2025. Version: 1.0.0


Database Credentialed Access

MedDec: Medical Decisions for Discharge Summaries in the MIMIC-III Database

Mohamed Elgaar, Jiali Cheng, Nidhi Vakil, et al.

Annotations of ten types of medical decisions from discharge summaries in the MIMIC-III database.

natural language processing medical decisions span classification discharge summary mimic

Published: Oct. 16, 2024. Version: 1.0.0


Database Credentialed Access

Nosocomial Risk Datasets from MIMIC-III

Travis Goodwin

Text-based Longitudinal Data for Predicting Nosocomial Disease Risk as used by CANTRIP.

pressure injury risk prediction acute kidney injury anemia forecasting natural language processing deep learning

Published: Sept. 15, 2022. Version: 1.0