Database Credentialed Access
MIMIC-III-Ext-Notes
Darren Liu, Monique Bouvier, Delgersuren Bold, Craig Jabaley, Michael Young, Wenhui Zhang, Mark Wainwright, Eric Rosenthal, Laurie Dimisko, Soojin Park, Gilles Clermont, Xiao Hu
Published: Feb. 27, 2026. Version: 1.0.0
When using this resource, please cite:
Liu, D., Bouvier, M., Bold, D., Jabaley, C., Young, M., Zhang, W., Wainwright, M., Rosenthal, E., Dimisko, L., Park, S., Clermont, G., & Hu, X. (2026). MIMIC-III-Ext-Notes (version 1.0.0). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/9tfx-yx07
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.
Abstract
Unstructured clinical documentation, such as progress notes, contains rich contextual information critical for clinical decision-making but remains underutilized in computational research due to the limited availability of annotated datasets. The MIMIC-III-Ext-Notes dataset was developed to address this gap by providing a resource for evaluating large language models (LLMs) and other natural language processing (NLP) systems in extracting and contextualizing clinical information.
The dataset includes 150 clinical notes randomly sampled from the MIMIC-III Clinical Database, from which 2,288 clinical concepts were identified using MetaMap and annotated by clinicians for detection accuracy, encounter relevance, and negation status. The resulting dataset enables the evaluation of models’ abilities to recognize, interpret, and reason about symptom mentions and disease concepts within realistic clinical narratives. By incorporating both concept-level and contextual annotations, MIMIC-III-Ext-Notes provides a valuable benchmark for developing, testing, and validating NLP and LLM frameworks designed for clinical text understanding and decision support applications.
Background
Clinical decision-making represents a critical component of patient care and is informed by a diverse array of data sources, including vital signs, diagnostic test results, patient history, and physical examination findings. In clinical practice, clinicians identify and treat disease states by integrating these complex data, such as the timing of a symptom’s onset or the presence or absence of specific clinical biomarkers. Unstructured clinical documentation, such as progress notes, may summarize these data and/or be the primary record of others, such as physical examination findings. Furthermore, unstructured clinical documentation is an important source of information about clinician thought processes and reasoning. These details are crucial for accurate diagnosis and treatment planning at the bedside and when evaluating or applying machine learning approaches to clinical problems.
In recent years, general large language models (LLMs) have exhibited human-level natural language processing capabilities. These models can generate high-quality outputs through text-based prompting alone, substantially lowering the barrier to performing sophisticated text analyses. Within healthcare, the potential applications of LLMs are vast and transformative [1-4]. However, there are relatively few annotated unstructured datasets that capture the contextual nuances of extracted clinical information, which poses significant challenges for their real-world implementation.
To advance these efforts, we created a dataset derived from clinical notes in the MIMIC-III database to evaluate LLM workflows for identifying patient symptoms and their associated contextual information. Specifically, this dataset was designed to assess the ability of LLMs to not only recognize symptom mentions but also to interpret their surrounding clinical context, including temporality and negation status. By incorporating these contextual elements, the dataset provides a more comprehensive framework for evaluating LLMs in realistic clinical scenarios.
Methods
We randomly sampled 150 clinical notes coded as nursing notes from the MIMIC-III Clinical Database [5]. Clinical concepts were initially extracted from these notes using MetaMap [6], a widely used biomedical text processing tool. Specifically, we targeted three semantic groups: diseases or syndromes, signs or symptoms, and mental or behavioral dysfunctions. These categories were selected because such information is frequently underrepresented or absent in structured electronic health record (EHR) data, yet they are critical for understanding patient conditions and clinical decision-making.
MetaMap identified 2,288 clinical concepts that belong to the three selected semantic types from the 150 clinical notes. Each concept was then annotated by at least two clinicians regarding the following three labeling questions:
- Is the concept correctly detected?
- Is the correctly detected concept being dealt with in the current encounter?
- Does a correctly detected condition need to be negated (i.e., has the condition already been treated)?
After the annotation process, all concepts with inter-annotator disagreement were systematically adjudicated through a consensus adjudication process. A physician and a nursing researcher reviewed each disputed concept and deliberated to determine the final adjudication, ensuring the accuracy, consistency, and reliability of the finalized annotations.
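Before adjudication, agreement between the two annotators can be quantified with Cohen's kappa. The released files contain only the final adjudicated labels, so the sketch below uses two invented annotator columns to illustrate the calculation; it is not derived from the actual annotation export.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two raters over the same items (nominal labels)."""
    assert len(a) == len(b)
    n = len(a)
    # Observed agreement: fraction of items where the raters match
    po = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from each rater's marginal label frequencies
    cats = set(a) | set(b)
    pe = sum((a.count(c) / n) * (b.count(c) / n) for c in cats)
    return (po - pe) / (1 - pe)

# Hypothetical per-concept answers to one labeling question
ann1 = ["yes", "yes", "no", "yes", "no", "no"]
ann2 = ["yes", "no", "no", "yes", "no", "yes"]
print(round(cohens_kappa(ann1, ann2), 3))  # prints 0.333
```

In this toy example the raters agree on 4 of 6 items (0.667 observed agreement) against 0.5 expected by chance, giving kappa of 0.333.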
Data Description
The dataset is organized in comma-separated values (CSV) format; all files are linked by the unique identifier row_id, which corresponds to each clinical note.
notes.csv – This file includes 150 clinical notes sampled from the MIMIC-III Clinical Database. Each row represents a single note and includes the following fields:
- text (string): Full text of the clinical note
- row_id (number): Unique identifier for each note
- hadm_id (number): Identifier for the hospital admission
- subject_id (number): Identifier for the patient
labels.csv – This file contains 2,288 clinical concepts extracted from the 150 clinical notes. Each row corresponds to an annotated concept and includes:
- row_id (number): Identifier linking the concept to its source note
- trigger_word (string): The text span that triggered concept identification
- concept (string): The preferred name for the clinical concept
- semtypes ("sosy" | "dsyn" | "mobd" | null): The semantic type of the concept
- start, end (number): Character-level start and end indices of the trigger word in the note text
- detection, encounter, negation (string): Annotation labels corresponding to concept presence, encounter relevance, and negation status:
  - detection: contains only "yes" or "no" values.
  - encounter: contains "yes", "no", or "-", with "-" indicating that the detection label for this row is "no".
  - negation: contains "yes", "no", "unsure", or "-", with "-" indicating that the detection label for this row is "no".
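Two properties of this schema lend themselves to a quick consistency check: "-" values in encounter and negation should appear exactly when detection is "no", and start/end should index the trigger word inside the note text. The sketch below uses invented toy rows following the schema (the real files would be loaded with pd.read_csv); whether the real indices are 0-based and end-exclusive, or follow MetaMap's own positional convention, is not stated here, so the span check may need an offset adjustment.

```python
import pandas as pd

# Toy rows mimicking the notes.csv / labels.csv schema (invented values,
# not taken from the real dataset)
notes = pd.DataFrame({
    "row_id": [1],
    "text": ["pt denies chest pain overnight"],
})
labels = pd.DataFrame({
    "row_id": [1, 1],
    "trigger_word": ["chest pain", "pain"],
    "concept": ["Chest Pain", "Pain"],
    "semtypes": ["sosy", "sosy"],
    "start": [10, 16],
    "end": [20, 20],
    "detection": ["yes", "no"],
    "encounter": ["no", "-"],
    "negation": ["yes", "-"],
})

# Rule: "-" in encounter/negation appears exactly when detection == "no"
rejected = labels["detection"] == "no"
assert (labels.loc[rejected, "encounter"] == "-").all()
assert (labels.loc[rejected, "negation"] == "-").all()
assert (labels.loc[~rejected, "encounter"] != "-").all()

# Rule: text[start:end] reproduces the trigger word (assuming 0-based,
# end-exclusive indices; the released files may use another convention)
merged = labels.merge(notes, on="row_id", how="left")
spans = merged.apply(lambda r: r["text"][r["start"]:r["end"]], axis=1)
print((spans == merged["trigger_word"]).sum(), "of", len(merged), "spans match")
```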
Usage Notes
The MIMIC-III-Ext-Notes dataset was developed to evaluate LLM and natural language processing (NLP) workflows for extracting and interpreting clinical information from unstructured text. The dataset provides a benchmark for assessing how well LLMs can identify clinical concepts and their contextual information, such as temporality and negation, within real-world clinical documentation.
Intended Uses:
This dataset is designed for use in studies involving clinical NLP, contextual information extraction, and automated reasoning in healthcare. Researchers can employ it to:
- Evaluate the performance of LLMs and NLP pipelines in detecting and classifying clinical concepts from unstructured text.
- Assess contextual understanding tasks such as negation detection, encounter relevance, and temporal interpretation at the note level.
- Develop and benchmark annotation workflows, adjudication frameworks, or reproducible evaluation pipelines for clinical NLP.
- Support methodological studies on explainability, error analysis, and human–model comparison in clinical concept extraction.
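For the first of these uses, scoring a model's concept detections against the gold detection labels reduces to standard precision/recall/F1. The sketch below uses invented gold and predicted labels purely to illustrate the arithmetic; an actual evaluation would align model outputs to labels.csv rows first.

```python
import pandas as pd

# Hypothetical gold detection labels and model predictions for five concepts
gold = pd.Series(["yes", "yes", "no", "yes", "no"], name="detection")
pred = pd.Series(["yes", "no", "no", "yes", "yes"], name="predicted")

# Confusion counts, treating "yes" as the positive class
tp = ((gold == "yes") & (pred == "yes")).sum()
fp = ((gold == "no") & (pred == "yes")).sum()
fn = ((gold == "yes") & (pred == "no")).sum()

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints precision=0.67 recall=0.67 f1=0.67
```

The same pattern applies to the encounter and negation labels, restricted to rows where detection is "yes" (the "-" rows carry no judgment to score).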
Although the dataset focuses on unstructured clinical notes, it can be extended by integrating additional data modalities (e.g., structured EHR elements such as vital signs or medication lists) to test multimodal reasoning tasks. Likewise, researchers may adapt this dataset for disease-specific studies by selecting subsets of notes or concepts that meet predefined clinical inclusion criteria.
Limitations
Users should be aware of several important limitations of the dataset:
- Annotations are restricted to three semantic groups—diseases or syndromes (dsyn), signs or symptoms (sosy), and mental or behavioral dysfunctions (mobd)—as identified by MetaMap. Other clinically relevant concept types (e.g., procedures or medications) are not included.
- All annotations are limited to single-note context and do not capture longitudinal patient trajectories or condition evolution across multiple encounters.
- The dataset represents a sample of 150 notes, and results derived from it may not generalize to broader clinical corpora without further validation.
- Annotation labels reflect the specific guidelines and adjudication process used in this project and should not be interpreted as definitive clinical ground truth.
Overall, MIMIC-III-Ext-Notes serves as a foundational resource for evaluating model performance in contextual clinical understanding and for developing reproducible, explainable frameworks for unstructured medical text analysis.
Example code for loading the datasets:
import pandas as pd
# Load data
notes = pd.read_csv("notes.csv")
labels = pd.read_csv("labels.csv")
# Link concepts to their source notes using row_id
merged = labels.merge(notes, on="row_id", how="left")
# Inspect concepts associated with a specific note
note_concepts = merged[merged["row_id"] == 12345]
print(note_concepts[["trigger_word", "concept", "semtypes", "detection", "encounter", "negation"]])
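A common follow-on step is narrowing the annotated concepts to those that are clinically "active": correctly detected, relevant to the current encounter, and not marked for negation. The sketch below uses invented rows following the labels.csv schema, and reads negation == "yes" as "the condition should be negated", per the annotation question above; that interpretation is an assumption worth verifying against your use case.

```python
import pandas as pd

# Toy annotated concepts (invented values following the labels.csv schema)
labels = pd.DataFrame({
    "row_id": [1, 1, 1, 2],
    "concept": ["Chest Pain", "Sepsis", "Fever", "Anxiety"],
    "detection": ["yes", "yes", "no", "yes"],
    "encounter": ["yes", "yes", "-", "no"],
    "negation": ["no", "yes", "-", "no"],
})

# Keep concepts that are correctly detected, addressed in the current
# encounter, and not flagged for negation
active = labels[
    (labels["detection"] == "yes")
    & (labels["encounter"] == "yes")
    & (labels["negation"] == "no")
]
print(active["concept"].tolist())  # prints ['Chest Pain']
```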
Release Notes
Version 1.0.0: Initial release of the dataset.
Ethics
The MIMIC-III-Ext-Notes dataset is derived entirely from the publicly available MIMIC-III Clinical Database (version 1.4) hosted on PhysioNet. As such, no new patient data were collected. The original MIMIC-III dataset was created under institutional review board (IRB) approval and includes deidentified health records in compliance with the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provisions.
All annotation and data processing were performed on deidentified text and did not involve any reidentification attempts or access to identifiable patient information. The dataset was developed in accordance with ethical principles for research using human data, emphasizing transparency, reproducibility, and privacy preservation.
Users of this dataset are expected to comply with the MIMIC-III Data Use Agreement (DUA) and applicable institutional review policies governing secondary use of deidentified clinical data. Any downstream use of this dataset should explicitly acknowledge its deidentified nature and ensure that no attempts are made to link data back to individual patients or clinicians.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Dai H, Liu Z, Liao W, Huang X, Cao Y, Wu Z, Zhao L, Xu S, Liu W, Liu N. AugGPT: Leveraging ChatGPT for text data augmentation. arXiv preprint arXiv:2302.13007; 2023.
- Qiu J, Li L, Sun J, Peng J, Shi P, Zhang R, Dong Y, Lam K, Lo FPW, Xiao B. Large AI models in health informatics: applications, challenges, and the future. IEEE J Biomed Health Inform. 2023.
- Yoon JH, Pinsky MR, Clermont G. Artificial intelligence in critical care medicine. Crit Care. 2022.
- Wang G, Yang G, Du Z, Fan L, Li X. ClinicalGPT: Large language models fine-tuned with diverse medical data and comprehensive evaluation. arXiv preprint arXiv:2306.09968; 2023.
- Johnson AEW, Pollard TJ, Mark R. MIMIC-III clinical database (version 1.4). PhysioNet. 2016. RRID:SCR_007345. doi:10.13026/C2XW26.
- Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA Symp. 2001.
Access
Access Policy:
Only credentialed users who sign the DUA can access the files.
License (for files):
PhysioNet Credentialed Health Data License 1.5.0
Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0
Required training:
CITI Data or Specimens Only Research
Discovery
DOI (version 1.0.0):
https://doi.org/10.13026/9tfx-yx07
DOI (latest version):
https://doi.org/10.13026/2kv6-1s83