Database Credentialed Access
MIMIC-III-Ext-Notes
Darren Liu, Monique Bouvier, Delgersuren Bold, Craig Jabaley, Michael Young, Wenhui Zhang, Mark Wainwright, Eric Rosenthal, Laurie Dimisko, Soojin Park, Gilles Clermont, Xiao Hu
Published: Feb. 27, 2026. Version: 1.0.0
When using this resource, please cite:
Liu, D., Bouvier, M., Bold, D., Jabaley, C., Young, M., Zhang, W., Wainwright, M., Rosenthal, E., Dimisko, L., Park, S., Clermont, G., & Hu, X. (2026). MIMIC-III-Ext-Notes (version 1.0.0). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/9tfx-yx07
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.
Abstract
Unstructured clinical documentation, such as progress notes, contains rich contextual information critical for clinical decision-making but remains underutilized in computational research due to the limited availability of annotated datasets. The MIMIC-III-Ext-Notes dataset was developed to address this gap by providing a resource for evaluating large language models (LLMs) and other natural language processing (NLP) systems in extracting and contextualizing clinical information.
The dataset includes 150 clinical notes randomly sampled from the MIMIC-III Clinical Database, from which 2,288 clinical concepts were identified using MetaMap and annotated by clinicians for detection accuracy, encounter relevance, and negation status. The resulting dataset enables the evaluation of models’ abilities to recognize, interpret, and reason about symptom mentions and disease concepts within realistic clinical narratives. By incorporating both concept-level and contextual annotations, MIMIC-III-Ext-Notes provides a valuable benchmark for developing, testing, and validating NLP and LLM frameworks designed for clinical text understanding and decision support applications.
Background
Clinical decision-making represents a critical component of patient care and is informed by a diverse array of data sources, including vital signs, diagnostic test results, patient history, and physical examination findings. In clinical practice, clinicians identify and treat disease states by integrating these complex data, such as the timing of a symptom’s onset or the presence or absence of specific clinical biomarkers. Unstructured clinical documentation, such as progress notes, may summarize these data and/or be the primary record of others, such as physical examination findings. Furthermore, unstructured clinical documentation is an important source of information about clinician thought processes and reasoning. These details are crucial for accurate diagnosis and treatment planning at the bedside and when evaluating or applying machine learning approaches to clinical problems.
In recent years, general large language models (LLMs) have exhibited human-level natural language processing capabilities. These models can generate high-quality outputs through text-based prompting alone, substantially lowering the barrier to performing sophisticated text analyses. Within healthcare, the potential applications of LLMs are vast and transformative [1-4]. However, there are relatively few annotated unstructured datasets that capture the contextual nuances of extracted clinical information, which poses significant challenges for their real-world implementation.
To advance these efforts, we created a dataset derived from clinical notes in the MIMIC-III database to evaluate LLM workflows for identifying patient symptoms and their associated contextual information. Specifically, this dataset was designed to assess the ability of LLMs to not only recognize symptom mentions but also to interpret their surrounding clinical context, including temporality and negation status. By incorporating these contextual elements, the dataset provides a more comprehensive framework for evaluating LLMs in realistic clinical scenarios.
Methods
We randomly sampled 150 clinical notes coded as nursing notes from the MIMIC-III Clinical Database [5]. Clinical concepts were initially extracted from these notes using MetaMap [6], a widely used biomedical text processing tool. Specifically, we targeted three semantic groups: diseases or syndromes, signs or symptoms, and mental or behavioral dysfunctions. These categories were selected because such information is frequently underrepresented or absent in structured electronic health record (EHR) data, yet they are critical for understanding patient conditions and clinical decision-making.
MetaMap identified 2,288 clinical concepts that belong to the three selected semantic types from the 150 clinical notes. Each concept was then annotated by at least two clinicians regarding the following three labeling questions:
- Is the concept correctly detected?
- Is the correctly detected concept being dealt with in the current encounter?
- Does a correctly detected condition need to be negated (i.e., has the condition already been treated)?
After the annotation process, all concepts with inter-annotator disagreement were systematically adjudicated through a consensus adjudication process. A physician and a nursing researcher reviewed each disputed concept and deliberated to determine the final adjudication, ensuring the accuracy, consistency, and reliability of the finalized annotations.
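Before adjudication, agreement between the two annotators can be quantified with Cohen's kappa. The released files contain only the final adjudicated labels, so the sketch below uses two invented annotator columns to illustrate the calculation; it is not derived from the actual annotation export.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two raters over the same items (nominal labels)."""
    assert len(a) == len(b)
    n = len(a)
    # Observed agreement: fraction of items where the raters match
    po = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from each rater's marginal label frequencies
    cats = set(a) | set(b)
    pe = sum((a.count(c) / n) * (b.count(c) / n) for c in cats)
    return (po - pe) / (1 - pe)

# Hypothetical per-concept answers to one labeling question
ann1 = ["yes", "yes", "no", "yes", "no", "no"]
ann2 = ["yes", "no", "no", "yes", "no", "yes"]
print(round(cohens_kappa(ann1, ann2), 3))  # prints 0.333
```

In this toy example the raters agree on 4 of 6 items (0.667 observed agreement) against 0.5 expected by chance, giving kappa of 0.333.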
Data Description
The dataset is organized in comma-separated values (CSV) format; all files are linked by the unique identifier row_id, which corresponds to each clinical note.
notes.csv – This file includes 150 clinical notes sampled from the MIMIC-III Clinical Database. Each row represents a single note and includes the following fields:
- text (string): Full text of the clinical note
- row_id (number): Unique identifier for each note
- hadm_id (number): Identifier for the hospital admission
- subject_id (number): Identifier for the patient
labels.csv – This file contains 2,288 clinical concepts extracted from the 150 clinical notes. Each row corresponds to an annotated concept and includes:
- row_id (number): Identifier linking the concept to its source note
- trigger_word (string): The text span that triggered concept identification
- concept (string): The preferred name for the clinical concept
- semtypes ("sosy" | "dsyn" | "mobd" | null): The semantic type of the concept
- start, end (number): Character-level start and end indices of the trigger word in the note text
- detection, encounter, negation (string): Annotation labels corresponding to concept presence, encounter relevance, and negation status:
  - detection: contains only "yes" or "no" values.
  - encounter: contains "yes", "no", or "-", with "-" indicating that the detection label for this row is "no".
  - negation: contains "yes", "no", "unsure", or "-", with "-" indicating that the detection label for this row is "no".
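Two properties of this schema lend themselves to a quick consistency check: "-" values in encounter and negation should appear exactly when detection is "no", and start/end should index the trigger word inside the note text. The sketch below uses invented toy rows following the schema (the real files would be loaded with pd.read_csv); whether the real indices are 0-based and end-exclusive, or follow MetaMap's own positional convention, is not stated here, so the span check may need an offset adjustment.

```python
import pandas as pd

# Toy rows mimicking the notes.csv / labels.csv schema (invented values,
# not taken from the real dataset)
notes = pd.DataFrame({
    "row_id": [1],
    "text": ["pt denies chest pain overnight"],
})
labels = pd.DataFrame({
    "row_id": [1, 1],
    "trigger_word": ["chest pain", "pain"],
    "concept": ["Chest Pain", "Pain"],
    "semtypes": ["sosy", "sosy"],
    "start": [10, 16],
    "end": [20, 20],
    "detection": ["yes", "no"],
    "encounter": ["no", "-"],
    "negation": ["yes", "-"],
})

# Rule: "-" in encounter/negation appears exactly when detection == "no"
rejected = labels["detection"] == "no"
assert (labels.loc[rejected, "encounter"] == "-").all()
assert (labels.loc[rejected, "negation"] == "-").all()
assert (labels.loc[~rejected, "encounter"] != "-").all()

# Rule: text[start:end] reproduces the trigger word (assuming 0-based,
# end-exclusive indices; the released files may use another convention)
merged = labels.merge(notes, on="row_id", how="left")
spans = merged.apply(lambda r: r["text"][r["start"]:r["end"]], axis=1)
print((spans == merged["trigger_word"]).sum(), "of", len(merged), "spans match")
```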
Usage Notes
The MIMIC-III-Ext-Notes dataset was developed to evaluate LLM and natural language processing (NLP) workflows for extracting and interpreting clinical information from unstructured text. The dataset provides a benchmark for assessing how well LLMs can identify clinical concepts and their contextual information, such as temporality and negation, within real-world clinical documentation.
Intended Uses:
This dataset is designed for use in studies involving clinical NLP, contextual information extraction, and automated reasoning in healthcare. Researchers can employ it to:
- Evaluate the performance of LLMs and NLP pipelines in detecting and classifying clinical concepts from unstructured text.
- Assess contextual understanding tasks such as negation detection, encounter relevance, and temporal interpretation at the note level.
- Develop and benchmark annotation workflows, adjudication frameworks, or reproducible evaluation pipelines for clinical NLP.
- Support methodological studies on explainability, error analysis, and human–model comparison in clinical concept extraction.
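For the first of these uses, scoring a model's concept detections against the gold detection labels reduces to standard precision/recall/F1. The sketch below uses invented gold and predicted labels purely to illustrate the arithmetic; an actual evaluation would align model outputs to labels.csv rows first.

```python
import pandas as pd

# Hypothetical gold detection labels and model predictions for five concepts
gold = pd.Series(["yes", "yes", "no", "yes", "no"], name="detection")
pred = pd.Series(["yes", "no", "no", "yes", "yes"], name="predicted")

# Confusion counts, treating "yes" as the positive class
tp = ((gold == "yes") & (pred == "yes")).sum()
fp = ((gold == "no") & (pred == "yes")).sum()
fn = ((gold == "yes") & (pred == "no")).sum()

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints precision=0.67 recall=0.67 f1=0.67
```

The same pattern applies to the encounter and negation labels, restricted to rows where detection is "yes" (the "-" rows carry no judgment to score).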
Although the dataset focuses on unstructured clinical notes, it can be extended by integrating additional data modalities (e.g., structured EHR elements such as vital signs or medication lists) to test multimodal reasoning tasks. Likewise, researchers may adapt this dataset for disease-specific studies by selecting subsets of notes or concepts that meet predefined clinical inclusion criteria.
Limitations
Users should be aware of several important limitations of the dataset:
- Annotations are restricted to three semantic groups—diseases or syndromes (dsyn), signs or symptoms (sosy), and mental or behavioral dysfunctions (mobd)—as identified by MetaMap. Other clinically relevant concept types (e.g., procedures or medications) are not included.
- All annotations are limited to single-note context and do not capture longitudinal patient trajectories or condition evolution across multiple encounters.
- The dataset represents a sample of 150 notes, and results derived from it may not generalize to broader clinical corpora without further validation.
- Annotation labels reflect the specific guidelines and adjudication process used in this project and should not be interpreted as definitive clinical ground truth.
Overall, MIMIC-III-Ext-Notes serves as a foundational resource for evaluating model performance in contextual clinical understanding and for developing reproducible, explainable frameworks for unstructured medical text analysis.
Example code for loading the datasets:
import pandas as pd
# Load data
notes = pd.read_csv("notes.csv")
labels = pd.read_csv("labels.csv")
# Link concepts to their source notes using row_id
merged = labels.merge(notes, on="row_id", how="left")
# Inspect concepts associated with a specific note
note_concepts = merged[merged["row_id"] == 12345]
print(note_concepts[["trigger_word", "concept", "semtypes", "detection", "encounter", "negation"]])
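A common follow-on step is narrowing the annotated concepts to those that are clinically "active": correctly detected, relevant to the current encounter, and not marked for negation. The sketch below uses invented rows following the labels.csv schema, and reads negation == "yes" as "the condition should be negated", per the annotation question above; that interpretation is an assumption worth verifying against your use case.

```python
import pandas as pd

# Toy annotated concepts (invented values following the labels.csv schema)
labels = pd.DataFrame({
    "row_id": [1, 1, 1, 2],
    "concept": ["Chest Pain", "Sepsis", "Fever", "Anxiety"],
    "detection": ["yes", "yes", "no", "yes"],
    "encounter": ["yes", "yes", "-", "no"],
    "negation": ["no", "yes", "-", "no"],
})

# Keep concepts that are correctly detected, addressed in the current
# encounter, and not flagged for negation
active = labels[
    (labels["detection"] == "yes")
    & (labels["encounter"] == "yes")
    & (labels["negation"] == "no")
]
print(active["concept"].tolist())  # prints ['Chest Pain']
```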
Release Notes
Version 1.0.0: Initial release of the dataset.
Ethics
The MIMIC-III-Ext-Notes dataset is derived entirely from the publicly available MIMIC-III Clinical Database (version 1.4) hosted on PhysioNet. As such, no new patient data were collected. The original MIMIC-III dataset was created under institutional review board (IRB) approval and includes deidentified health records in compliance with the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provisions.
All annotation and data processing were performed on deidentified text and did not involve any reidentification attempts or access to identifiable patient information. The dataset was developed in accordance with ethical principles for research using human data, emphasizing transparency, reproducibility, and privacy preservation.
Users of this dataset are expected to comply with the MIMIC-III Data Use Agreement (DUA) and applicable institutional review policies governing secondary use of deidentified clinical data. Any downstream use of this dataset should explicitly acknowledge its deidentified nature and ensure that no attempts are made to link data back to individual patients or clinicians.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Dai H, Liu Z, Liao W, Huang X, Cao Y, Wu Z, Zhao L, Xu S, Liu W, Liu N. AugGPT: Leveraging ChatGPT for text data augmentation. arXiv preprint arXiv:2302.13007; 2023.
- Qiu J, Li L, Sun J, Peng J, Shi P, Zhang R, Dong Y, Lam K, Lo FPW, Xiao B. Large AI models in health informatics: applications, challenges, and the future. IEEE J Biomed Health Inform. 2023.
- Yoon JH, Pinsky MR, Clermont G. Artificial intelligence in critical care medicine. Crit Care. 2022.
- Wang G, Yang G, Du Z, Fan L, Li X. ClinicalGPT: Large language models fine-tuned with diverse medical data and comprehensive evaluation. arXiv preprint arXiv:2306.09968; 2023.
- Johnson AEW, Pollard TJ, Mark R. MIMIC-III clinical database (version 1.4). PhysioNet. 2016. RRID:SCR_007345. doi:10.13026/C2XW26.
- Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA Symp. 2001.
Access
Access Policy:
Only credentialed users who sign the DUA can access the files.
License (for files):
PhysioNet Credentialed Health Data License 1.5.0
Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0
Required training:
CITI Data or Specimens Only Research
Discovery
DOI (version 1.0.0):
https://doi.org/10.13026/9tfx-yx07
DOI (latest version):
https://doi.org/10.13026/2kv6-1s83