Database Credentialed Access

National Institutes of Health Stroke Scale (NIHSS) Annotations for the MIMIC-III Database

Jiayang Wang, Xiaoshuo Huang, Lin Yang, Jiao Li

Published: Jan. 25, 2021. Version: 1.0.0

When using this resource, please cite:
Wang, J., Huang, X., Yang, L., & Li, J. (2021). National Institutes of Health Stroke Scale (NIHSS) Annotations for the MIMIC-III Database (version 1.0.0). PhysioNet.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

The National Institutes of Health Stroke Scale (NIHSS) is a 15-item neurological examination scale for stroke. It quantifies the physical manifestations of neurological deficits and provides crucial support for clinical decision making and early-stage emergency triage. NIHSS scores recorded in the free text of Electronic Health Records (EHRs) are often unstandardized, and the way they are expressed depends heavily on the habits of individual clinicians. This limits the reusability of the data.

Robust algorithms for extracting NIHSS scores from EHR free text are therefore valuable. We developed a dataset for NIHSS score identification, a task defined as the extraction of scale items and their corresponding scores from discharge summaries. The annotated NIHSS corpus was built from discharge summaries of stroke cases in the Medical Information Mart for Intensive Care III (MIMIC-III) database.

Each discharge summary was manually annotated for NIHSS scores by two annotators with backgrounds in medical informatics. The annotations cover all scale items (e.g. “4. Facial Palsy”), the corresponding scores (“measurement”), and the relation between them (“has value”). The dataset is intended to support academic and industrial research in medical natural language processing (NLP).

Background

Electronic Health Record (EHR) data contain large amounts of treatment information, much of which is stored as unstructured free text rather than in structured fields. These data have great potential to provide insight into medical treatment and to facilitate retrospective studies. The National Institutes of Health Stroke Scale (NIHSS) quantifies the physical manifestations of neurological deficits and provides crucial support for clinical decision making and early-stage emergency triage; it is often recorded within free-text fields.

The NIHSS comprises 15 items: “1a. Level of Consciousness (LOC)”, “1b. LOC Questions”, “1c. LOC Commands”, “2. Best Gaze”, “3. Visual”, “4. Facial Palsy”, “5a. Left Arm”, “5b. Right Arm”, “6a. Left Leg”, “6b. Right Leg”, “7. Limb Ataxia”, “8. Sensory”, “9. Best Language”, “10. Dysarthria”, and “11. Extinction and Inattention”. Each item is scored according to the patient’s performance on a specific examination task. The scale is widely used for the initial assessment of stroke severity, for assessing treatment response, and for bedside monitoring. NIHSS scores are commonly recorded in EHR data and have been used in many stroke-related studies [1-3].
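The item list above can be encoded directly. The following is a minimal sketch, assuming the standard published NIHSS score ranges (not listed in the text above): each item has a maximum score, and the total is the sum of the item scores, so a fully scored scale ranges from 0 to 42. The dictionary keys reuse the item labels above; `NIHSS_MAX_SCORE` and `nihss_total` are hypothetical names, not part of this dataset or the authors' code, and items recorded as untestable are not modeled.

```python
from typing import Dict

# Maximum score per item under the standard NIHSS; a fully scored scale totals 0-42.
NIHSS_MAX_SCORE: Dict[str, int] = {
    "1a. Level of Consciousness (LOC)": 3,
    "1b. LOC Questions": 2,
    "1c. LOC Commands": 2,
    "2. Best Gaze": 2,
    "3. Visual": 3,
    "4. Facial Palsy": 3,
    "5a. Left Arm": 4,
    "5b. Right Arm": 4,
    "6a. Left Leg": 4,
    "6b. Right Leg": 4,
    "7. Limb Ataxia": 2,
    "8. Sensory": 2,
    "9. Best Language": 3,
    "10. Dysarthria": 2,
    "11. Extinction and Inattention": 2,
}


def nihss_total(scores: Dict[str, int]) -> int:
    """Validate item scores and return their sum (a partial total if items are missing)."""
    for item, score in scores.items():
        if item not in NIHSS_MAX_SCORE:
            raise KeyError(f"unknown NIHSS item: {item!r}")
        if not 0 <= score <= NIHSS_MAX_SCORE[item]:
            raise ValueError(f"{item!r} must be between 0 and {NIHSS_MAX_SCORE[item]}, got {score}")
    return sum(scores.values())
```

For example, `nihss_total({"4. Facial Palsy": 2, "10. Dysarthria": 1})` returns 3, while an out-of-range facial palsy score of 4 raises `ValueError`. Range checks like this are a cheap sanity filter for scores extracted from free text.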

MIMIC-III is a freely accessible critical care database containing records of 58,976 hospital admissions from 2001 to 2012 [4]. The discharge summaries in the NOTEEVENTS table document the hospital stay from admission to discharge, including examinations, medications, procedures, and other treatment records. Unlike the structured tables in MIMIC-III, discharge summaries are stored as free text, linked to patients and admissions by their identifiers. The summaries include detailed notes on medical history, physical examinations, ECG / X-ray / MRI findings, therapeutic regimens, discharge medications, and other unstructured information, including NIHSS scores. We sought to develop an approach for extracting NIHSS scores from these unstructured notes.
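As a rough sketch of where the source text comes from, discharge summaries can be pulled out of NOTEEVENTS by their CATEGORY value and grouped by admission. This example assumes the MIMIC-III CSV release with the columns HADM_ID, CATEGORY, ISERROR and TEXT; the function names are hypothetical, and this is not necessarily how the authors selected their notes.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List


def discharge_summaries_by_admission(rows: Iterable[dict]) -> Dict[str, List[str]]:
    """Group NOTEEVENTS discharge-summary texts by HADM_ID."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        if row["CATEGORY"] != "Discharge summary":
            continue
        if not row["HADM_ID"] or row.get("ISERROR") == "1":
            continue  # skip notes without an admission ID or flagged as errors
        grouped[row["HADM_ID"]].append(row["TEXT"])
    return dict(grouped)


def load_discharge_summaries(path: str) -> Dict[str, List[str]]:
    # newline="" lets the csv module handle note texts that span several lines
    with open(path, newline="", encoding="utf-8") as f:
        return discharge_summaries_by_admission(csv.DictReader(f))
```

A list is kept per admission because an admission may have more than one discharge-summary note; how those are merged is left to the reader.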

Methods

Following the approaches of Woodfield et al. [5] and Mitchell et al. [6], we set our research scope to cover the four major types of stroke: ischemic stroke, hemorrhagic stroke, subarachnoid hemorrhage (SAH), and intracerebral hemorrhage (ICH), corresponding to ICD-9 codes 430, 431, 432, 433, 434, and 436 and their subtypes. In total, 3660 stroke cases with valid discharge summaries were selected for annotation.
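The case selection can be illustrated with a prefix match on the ICD-9 codes named above. This minimal sketch assumes the MIMIC-III DIAGNOSES_ICD table, where ICD9_CODE values are stored without a decimal point (e.g. “43491”); `is_stroke_code` and `stroke_admissions` are hypothetical helpers, and the subsequent check for a valid discharge summary is not shown.

```python
from typing import Iterable, Set

# Three-digit ICD-9 categories listed in the text; subtypes share the prefix.
STROKE_ICD9_PREFIXES = {"430", "431", "432", "433", "434", "436"}


def is_stroke_code(icd9_code: str) -> bool:
    """True if the code falls under one of the selected stroke categories."""
    code = icd9_code.strip().replace(".", "")  # accept both "434.91" and "43491"
    return code[:3] in STROKE_ICD9_PREFIXES  # V and E codes never match a numeric prefix


def stroke_admissions(diagnoses: Iterable[dict]) -> Set[str]:
    """Collect HADM_IDs that carry at least one stroke diagnosis code."""
    return {row["HADM_ID"] for row in diagnoses if is_stroke_code(row["ICD9_CODE"])}
```

Matching on the three-digit category keeps every subtype (for example 434.91) without listing each one, which mirrors the “and their subtypes” wording above.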

Based on the structure of the NIHSS and common recording conventions, we developed a preliminary annotation guideline. Two expert annotators with backgrounds in medical informatics were recruited for the study. They first independently annotated the same 100 discharge summaries using brat, a widely used web-based text annotation tool that supports simultaneous work by multiple annotators [7]. Its graphical user interface lets annotators label entities (NIHSS items) and relations (item-score pairs) by selecting and dragging, and it records the corresponding positions and relations. After the first round of annotation, the annotators discussed their discrepancies and the guideline was adjusted accordingly; third parties were consulted in cases of disagreement. This process was repeated until the guideline stabilized. At that point, inter-annotator agreement measured by Cohen’s kappa [8] reached 0.901, which falls in the “almost perfect” band (0.81–1.00) of Landis and Koch [9].
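For readers who want to compute the same agreement statistic on their own annotations, Cohen’s kappa compares the observed agreement p_o with the agreement expected by chance p_e: κ = (p_o − p_e) / (1 − p_e). The sketch below assumes two equal-length lists of categorical labels (for example, per-token BIO tags); the text does not state which unit the authors compared, and the function names are hypothetical. The Landis and Koch bands are included for interpretation.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty list of items")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal label frequencies.
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout; kappa is undefined, treated as 1 here
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch_band(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch (1977) descriptive band."""
    if kappa < 0:
        return "poor"
    for upper, band in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return band
    return "almost perfect"
```

For instance, `cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])` gives 0.5 (observed 0.75, expected 0.5), and `landis_koch_band(0.901)` returns "almost perfect".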

The two annotators then completed the annotation following the final guideline. The annotations were saved as brat standoff (“.ann”) files. The original discharge summary text was then combined with these annotation files to create a dictionary containing HADM_ID (hospital admission ID), token (words split from the discharge summary), tags (Begin-Inside-Outside tags), relations (entity-entity relations), entities (NIHSS items identified by the annotators), and code (whether each token belongs to an entity).
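The “.ann” files follow brat’s standoff format: text-bound annotations (T lines) carry a type and character offsets, and relations (R lines) reference two entity IDs. A minimal parser is sketched below, assuming tab-separated fields; it handles discontinuous spans written as “start end;start end” and ignores other annotation kinds such as notes. The type names in the test example (NIHSS, Measurement, Has_value) are hypothetical; check the released files for the actual labels.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Entity:
    ann_id: str
    label: str
    spans: List[Tuple[int, int]]  # character offsets, end exclusive
    text: str


@dataclass
class Relation:
    ann_id: str
    label: str
    arg1: str  # entity ID, e.g. "T1"
    arg2: str


def parse_ann(content: str) -> Tuple[Dict[str, Entity], List[Relation]]:
    """Parse T (entity) and R (relation) lines of a brat .ann file."""
    entities: Dict[str, Entity] = {}
    relations: List[Relation] = []
    for line in content.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            label, span_str = fields[1].split(" ", 1)
            spans = []
            for fragment in span_str.split(";"):  # discontinuous spans
                start, end = fragment.split()
                spans.append((int(start), int(end)))
            entities[ann_id] = Entity(ann_id, label, spans, fields[2])
        elif ann_id.startswith("R"):
            label, arg1, arg2 = fields[1].split()
            relations.append(Relation(ann_id, label, arg1.split(":", 1)[1], arg2.split(":", 1)[1]))
        # Other lines (attributes, events, "#" notes) are skipped in this sketch.
    return entities, relations
```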

Data Description

Our corpus contains data for 312 stroke patients, with 2929 NIHSS items, 2774 measurements, and 2733 item-score relations. The corpus was split into a training set, NER_RE_Train.txt (220 discharge summaries), and a test set, NER_RE_Test.txt (92 discharge summaries), which can be used for NIHSS entity recognition and relation extraction. RE_End2End_Test.txt is an end-to-end relation test set generated from NER_RE_Test.txt; it contains randomly generated entity-score relations that can be used to validate predicted relations.
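Relations predicted by an end-to-end system are commonly scored with exact-match precision, recall, and F1 against the gold relations. The sketch below is one such scorer, assuming each relation is represented as a hashable tuple such as (HADM_ID, item, score); that representation, and whether the authors evaluated this way, are assumptions.

```python
from typing import Hashable, Set, Tuple


def relation_prf(gold: Set[Hashable], predicted: Set[Hashable]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over sets of relation tuples."""
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With gold relations {("100001", "4. Facial Palsy", 2), ("100001", "10. Dysarthria", 1)} and predictions {("100001", "4. Facial Palsy", 2), ("100001", "7. Limb Ataxia", 0)}, one of two predictions is correct, so precision, recall, and F1 are all 0.5.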

Data example

Corpus dictionary keys are defined as follows:

  • HADM_ID: 6-digit hospital admission ID from the MIMIC-III database, distinct for each discharge summary; it can be used to link the record to other MIMIC-III tables.
  • Token: the discharge summary text split into a list of word tokens.
  • Tags: Begin-Inside-Outside (BIO) tags assigned to each token.
  • Relations: relations between entities, each with a relation sequence number and relation type.
  • Entities: NIHSS entities identified by the annotators, each with an entity sequence number and its position in the token list.
  • Code: entity labels aligned with the token list; tokens that belong to an entity are labeled with that entity’s sequence number.
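The Tags key can be understood as BIO labels derived from character-level annotations. The sketch below converts character spans (end offset exclusive, as in brat) into token-level BIO tags. The regex tokenizer and the labels NIHSS and Measurement are hypothetical, so its tokens will not necessarily match the Token lists in the released files.

```python
import re
from typing import List, Sequence, Tuple

# (start_char, end_char, label); end is exclusive, as in brat offsets.
Span = Tuple[int, int, str]


def tokenize_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Split text into word and punctuation tokens, keeping character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def bio_tags(text: str, spans: Sequence[Span]) -> Tuple[List[str], List[str]]:
    """Return (tokens, tags); tokens only partly covered by a span stay "O"."""
    tokens = tokenize_with_offsets(text)
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        first = True
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            if tok_start >= start and tok_end <= end:
                tags[i] = ("B-" if first else "I-") + label
                first = False
    return [tok for tok, _, _ in tokens], tags
```

For the text "Facial palsy: 2" with spans (0, 12, "NIHSS") and (14, 15, "Measurement"), the tokens are ["Facial", "palsy", ":", "2"] and the tags are ["B-NIHSS", "I-NIHSS", "O", "B-Measurement"].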

Descriptive statistics of the corpus

The table below summarizes the corpus and the number of annotations for each scale item and for the “Has value” relation.

Corpus overview: Stroke cases; Average sentences per case; Average tokens per case

Scale items: 1a. LOC; 1b. LOC Questions; 1c. LOC Commands; 2. Best Gaze; 3. Visual; 4. Facial Palsy; 5. Motor Arm; 5a. Left Arm; 5b. Right Arm; 6. Motor Leg; 6a. Left Leg; 6b. Right Leg; 7. Limb Ataxia; 8. Sensory; 9. Best Language; 10. Dysarthria; 11. Extinction and Inattention

Relation: Has value


Usage Notes

NIHSS scores carry important clinical information, particularly for diseases such as stroke. Before algorithms that use these scores (for example, for outcome prediction) can be developed, approaches for extracting the relevant information from free-text notes are needed. We share our annotated data in the hope of helping researchers interested in identifying scale scores (NIHSS or other scales) in unstructured EHR data.
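As a starting point before training a model, a rule-based baseline can pair an item name with a score that immediately follows it. The sketch below covers only a handful of items, with hypothetical surface forms; real notes use many more variants (abbreviations, item numbers, scores in parentheses), which is what the annotated corpus is meant to capture. It is a swapped-in regex baseline, not the authors’ method.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical surface forms for a few NIHSS items (regex fragments, longest first).
ITEM_FORMS: Dict[str, List[str]] = {
    "2. Best Gaze": ["best gaze", "gaze"],
    "3. Visual": ["visual fields?", "visual"],
    "4. Facial Palsy": ["facial palsy", "facial"],
    "7. Limb Ataxia": ["limb ataxia", "ataxia"],
    "8. Sensory": ["sensory"],
    "9. Best Language": ["best language", "language"],
    "10. Dysarthria": ["dysarthria"],
}

# One pattern per item: item name, optional ":", "=" or "-", then a single-digit score.
ITEM_PATTERNS = {
    item: re.compile(rf"\b(?:{'|'.join(forms)})\b\s*[:=\-]?\s*(\d)\b", re.IGNORECASE)
    for item, forms in ITEM_FORMS.items()
}


def extract_item_scores(text: str) -> List[Tuple[str, int]]:
    """Return (item, score) pairs in the order they appear in the text."""
    hits = []
    for item, pattern in ITEM_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), item, int(match.group(1))))
    hits.sort()
    return [(item, score) for _, item, score in hits]
```

On "facial palsy 2, dysarthria: 1, limb ataxia 0." this returns [("4. Facial Palsy", 2), ("10. Dysarthria", 1), ("7. Limb Ataxia", 0)]. Such a baseline is high-precision but low-recall, which makes it a useful floor when comparing learned extractors on this corpus.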

This corpus was generated as part of a project exploring quantified evidence for stroke. The code we applied to this corpus is available on GitHub [10]. We encourage researchers to adapt this work to related NLP tasks.

Acknowledgements

This research is supported by the Beijing Natural Science Foundation (Grant No. Z200016), Chinese Academy of Medical Sciences (Grant No. 2017PT63010, 2018PT33024, 2018-I2M-AI-016). We appreciate the efforts of the MIT Laboratory for Computational Physiology and collaborating research groups for sharing MIMIC-III with the research community. We would also like to thank clinicians from the China National Clinical Research Center for Neurological Disease, Beijing Tiantan Hospital, Capital Medical University for their clinical support.

Conflicts of Interest

The authors have no conflicts of interest to report.

References

  1. Yuan, H., An, J., Zhang, Q., Zhang, X., Sun, M., Fan, T., Cheng, Y., Wei, M., Tse, G., Waintraub, X., Li, Y., Day, J. D., Gao, F., Luo, G., & Li, G. (2020). Rates and Anticoagulation Treatment of Known Atrial Fibrillation in Patients with Acute Ischemic Stroke: A Real-World Study. Advances in Therapy, 37(10), 4370–4380.
  2. Betts, K. A., Hurley, D., Song, J., Sajeev, G., Guo, J., Du, E. X., Paschoalin, M., & Wu, E. Q. (2017). Real-World Outcomes of Acute Ischemic Stroke Treatment with Intravenous Recombinant Tissue Plasminogen Activator. Journal of Stroke and Cerebrovascular Diseases, 26(9), 1996–2003.
  3. Abzhandadze, T., Reinholdsson, M., & Sunnerhagen, K. (2020). NIHSS is not enough for cognitive screening in acute stroke: A cross-sectional, retrospective study. Scientific Reports, 10. doi:10.1038/s41598-019-57316-8
  4. Johnson, A. E., Pollard, T. J., Shen, L., Lehman, L. W., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L. A., & Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data, 3, 160035.
  5. Woodfield, R., Grant, I., UK Biobank Stroke Outcomes Group, UK Biobank Follow-Up and Outcomes Working Group, & Sudlow, C. L. (2015). Accuracy of Electronic Health Record Data for Identifying Stroke Cases in Large-Scale Epidemiological Studies: A Systematic Review from the UK Biobank Stroke Outcomes Group. PLoS ONE, 10(10), e0140533.
  6. Mitchell, J., Collen, J., Petteys, S., & Holley, A. (2011). A simple reminder system improves venous thromboembolism prophylaxis rates and reduces thrombotic events for hospitalized patients. Journal of Thrombosis and Haemostasis, 10, 236–243. doi:10.1111/j.1538-7836.2011.04599.x
  7. Stenetorp, P., Pyysalo, S., Topić, G., Ohta, T., Ananiadou, S., & Tsujii, J. (2012). brat: a Web-based Tool for NLP-Assisted Text Annotation. Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, 102–107.
  8. Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement, 20(1), 37–46. doi:10.1177/001316446002000104
  9. Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.
  10. Code for extracting NIHSS scores from MIMIC-III. GitHub. [Accessed: 19 January 2021]

Parent Projects
National Institutes of Health Stroke Scale (NIHSS) Annotations for the MIMIC-III Database was derived from the MIMIC-III Clinical Database. Please cite it when using this project.

Access Policy:
Only credentialed users who sign the DUA can access the files.

License (for files):
PhysioNet Credentialed Health Data License 1.5.0

Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0

Required training:
CITI Data or Specimens Only Research
