Challenge Contributor Review

BioNLP Workshop 2023 Shared Task 1A: Problem List Summarization

Yanjun Gao Dmitriy Dligach Timothy Miller Majid Afshar

Published: April 13, 2023. Version: 1.1.0


When using this resource, please cite:
Gao, Y., Dligach, D., Miller, T., & Afshar, M. (2023). BioNLP Workshop 2023 Shared Task 1A: Problem List Summarization (version 1.1.0). PhysioNet. https://doi.org/10.13026/1crd-qk14.

Additionally, please cite the original publication:

Gao, Y., Dligach, D., Miller, T., Xu, D., Churpek, M. M., & Afshar, M. (2022, October). Summarizing Patients’ Problems from Hospital Progress Notes Using Pre-trained Sequence-to-Sequence Models. In Proceedings of the 29th International Conference on Computational Linguistics (pp. 2979-2991).

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

Automatically summarizing patients’ main problems from the daily care notes in the electronic health record can help mitigate information and cognitive overload for clinicians and provide augmented intelligence via computerized diagnostic decision support at the bedside. The task of Problem List Summarization aims to generate a list of diagnoses and problems in a patient’s daily care plan using input from the provider’s progress notes during hospitalization.

This task aims to promote NLP model development for downstream applications in diagnostic decision support systems that could improve efficiency and reduce diagnostic errors in hospital care. The training set contains 768 hospital daily progress notes and 2783 diagnoses, and a new set of 237 daily progress notes was recently annotated as the test set. The annotation methods and annotation quality have previously been reported.

The dataset supports a more complex summarization task to generate a list of relevant diagnoses/problems given the information in the Subjective, Objective, and Assessment sections of the note. Only diagnoses/problems that are available in the progress note were labeled for the task.


Objective

This shared task aims to attract future research efforts in building NLP models for real-world decision support applications, where a system generating relevant and accurate diagnoses will assist the healthcare providers’ decision-making process and improve the quality of care for patients. 

The task and baseline methods are described in our COLING paper on Summarizing Patients’ Problems from Hospital Progress Notes Using Pre-trained Sequence-to-Sequence Models [2]. The annotation methods and annotation quality have previously been reported [3].


Participation

To access the Challenge dataset, participants should first register for the shared task through the BioNLP Workshop 2023 website [4].

Important Dates for BioNLP Workshop Shared Task 1A

  • Registration opens: January 13th, 2023
  • Release of training and validation data: January 13th, 2023
  • Release of test data: April 13th, 2023
  • System submission deadline: April 20th, 2023
  • System papers due date: April 28th, 2023
  • Notification of acceptance: June 1st, 2023
  • Camera-ready system papers due: June 6th, 2023
  • BioNLP Workshop Date: July 13th or 14th, 2023

Data Description

All progress notes were sourced from MIMIC-III, a publicly available dataset of de-identified EHR data from approximately 60,000 hospital ICU admissions at Beth Israel Deaconess Medical Center in Boston, Massachusetts [5, 6]. The goal of the annotation was to label lists of relevant problems/diagnoses from the Plan subsections. For each Plan subsection, the annotators marked the text span for the Problem, separating the diagnoses/problems from the treatment or action plans.

The progress notes drawn from MIMIC-III covered a total of 84 note types (DESCRIPTION header), including the following: Physician Resident Note, Intensivist Note (SICU, CVICU, TICU), PGY1 Progress Note, PGY1/Attending Daily Progress Note MICU, MICU Resident/Attending Daily Progress Note. Other note types, such as Nursing Progress Note and SocialWorker Progress Note, were excluded because they are not commonly structured in the SOAP format.

This project provides a CSV file containing 765 progress notes (with 3 duplicates removed). The CSV file has five columns: FILE ID, Subjective Sections, Objective Sections, Assessment, and Summary (Ground Truth).
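As a rough sketch of how the training CSV might be read, the Python snippet below uses the standard-library csv module and checks that the expected columns are present. The exact header spellings (in particular "Summary" for the ground-truth column) are assumptions based on the column list above, not taken from the released file, so adjust them to match the actual header row. The function name read_notes is hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Sequence

# Header names are ASSUMED to match the column list in the description;
# verify them against the first row of the real file.
INPUT_COLUMNS = ("FILE ID", "Subjective Sections", "Objective Sections", "Assessment")
TRAIN_COLUMNS = INPUT_COLUMNS + ("Summary",)


def read_notes(
    lines: Iterable[str], required: Sequence[str] = TRAIN_COLUMNS
) -> List[Dict[str, str]]:
    """Parse CSV text into one dict per progress note, keyed by column name."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    # Fail early if the header row does not contain what we expect.
    missing = [name for name in required if name not in header]
    if missing:
        raise ValueError(f"CSV is missing expected columns: {missing}")
    return list(reader)
```

With a file on disk, open it with newline="" so that quoted fields containing line breaks, which free-text note sections may include, are parsed intact. For example, with a hypothetical file name: `with open("train.csv", newline="") as f: notes = read_notes(f)`. For a file without ground-truth summaries, pass `required=INPUT_COLUMNS`.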

We also provide an example dataloader script that we used for baseline experiments. For more detail, see our associated paper on Summarizing Patients’ Problems from Hospital Progress Notes Using Pre-trained Sequence-to-Sequence Models [2]. In the script, we set input options to be Assessment (A Only), S+A (Subjective and Assessment), and All (Subjective, Objective and Assessment).
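The three input options can be illustrated with a small Python sketch. This is not the provided dataloader script: the function name build_input, the option keys "A", "S+A", and "All", the column names, and the order in which sections are concatenated are all assumptions made for illustration. The section marker tokens are the ones mentioned in the release notes for the earlier T5 experiment.

```python
from typing import Dict, List, Tuple

# Each option maps to (marker token, CSV column) pairs.
# Column names and concatenation order are ASSUMPTIONS for illustration.
INPUT_OPTIONS: Dict[str, List[Tuple[str, str]]] = {
    "A": [("<Assessment>", "Assessment")],
    "S+A": [
        ("<Subjective>", "Subjective Sections"),
        ("<Assessment>", "Assessment"),
    ],
    "All": [
        ("<Subjective>", "Subjective Sections"),
        ("<Objective>", "Objective Sections"),
        ("<Assessment>", "Assessment"),
    ],
}


def build_input(row: Dict[str, str], option: str = "All") -> str:
    """Concatenate the selected note sections, each prefixed by its marker token."""
    if option not in INPUT_OPTIONS:
        raise ValueError(f"unknown input option: {option!r}")
    parts = []
    for token, column in INPUT_OPTIONS[option]:
        text = (row.get(column) or "").strip()
        if text:  # skip sections that are empty or absent in this note
            parts.append(f"{token} {text}")
    return " ".join(parts)
```

For a made-up row such as `{"Subjective Sections": "reports dyspnea", "Objective Sections": "HR 110", "Assessment": "65M with sepsis"}`, calling `build_input(row, "S+A")` yields a single string containing the Subjective text followed by the Assessment text, each preceded by its marker token.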


Evaluation

We use ROUGE-L as the main evaluation metric [7, 8]. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics used for evaluating automatic summarization in NLP tasks. The metrics compare a computationally generated summary against a reference summary.

ROUGE-L is a metric based on the Longest Common Subsequence (LCS) between a generated summary and a reference summary. That is, the metric captures the longest sequence of words that appears in the same order, though not necessarily contiguously, in both the generated and reference summaries.
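To make the LCS idea concrete, here is a minimal Python sketch of a sentence-level ROUGE-L score. It assumes lowercased whitespace tokenization and reports the balanced F-measure alongside precision and recall; the scoring script used for the shared task may tokenize, stem, or weight precision and recall differently, so treat this as an illustration rather than the official scorer.

```python
from typing import List, NamedTuple


class RougeL(NamedTuple):
    precision: float
    recall: float
    f1: float


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    # Row-by-row dynamic programme: prev[j] is the LCS length of the
    # tokens of `a` processed so far and the first j tokens of `b`.
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> RougeL:
    """ROUGE-L precision, recall, and F1 (tokenization is an assumption)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return RougeL(0.0, 0.0, 0.0)
    lcs = lcs_length(cand, ref)
    precision = lcs / len(cand)  # share of generated tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    f1 = 0.0 if lcs == 0 else 2 * precision * recall / (precision + recall)
    return RougeL(precision, recall, f1)
```

For the made-up pair "respiratory failure due to sepsis" (generated) and "acute respiratory failure sepsis" (reference), the LCS is "respiratory failure sepsis" with length 3, giving precision 3/5, recall 3/4, and F1 of 2/3.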


Release Notes

Version 1.1.0: This is the initial release of the test set for BioNLP Workshop 2023 Shared Task 1A: Problem List Summarization. In addition to the training set and example data loader script, we provide a test set of 237 newly annotated progress notes with the following input fields: FILE ID, Subjective Sections, Objective Sections, and Assessment. Ground truth summaries for the test set will be released after the workshop.

Version 1.0.0: This is the initial release for the BioNLP Workshop 2023 Shared Task 1A: Problem List Summarization. In addition to the dataset, we provide an example script for loading the dataset. In our previous experiment with T5, we used special tokens "<Assessment>", "<Subjective>" and "<Objective>" to indicate the input sections.


Ethics

The data used in this research came from a fully de-identified dataset (containing no protected health information) that we received permission to use under a PhysioNet Credentialed Health Data Use Agreement (v1.5.0). The study was determined to be exempt from human subjects research. All experiments followed the PhysioNet Credentialed Health Data License Agreement. Medical charting by providers in the electronic health record is at risk for multiple types of bias.

Our research focused on building a system to overcome the cognitive biases in medical decision-making by providers. However, statistical and social biases need to be addressed before integrating our work into any clinical decision support system for clinical trials or healthcare delivery. In particular, implicit bias towards vulnerable populations and stigmatizing language in certain medical conditions like substance use disorders are genuine concerns that can transfer into language model training. Therefore, it should be assumed that our corpus of notes for this task will carry social bias features that can affect fairness and equity during model training.

Before the deployment of any pre-trained language model, it is the responsibility of the scientists and health system to audit the model for fairness and equity in its performance across disparate health groups. Fairness and equity audits alongside model explanations are needed to ensure an ethical model trustworthy to all stakeholders, especially patients and providers.


Acknowledgements

We thank PhysioNet for their help in hosting the dataset. We thank the BioNLP workshop organizing committee for their guidance and help in setting up this shared task. We would like to acknowledge the hard work by our medical student annotators, Ryan Laffin and Samuel Tesch, who were supported by the University of Wisconsin Summer Shapiro Grant Program.


Conflicts of Interest

No competing interests are declared.


References

  1. Gao, Y., Caskey, J., Miller, T., Sharma, B., Churpek, M., Dligach, D., & Afshar, M. (2022). Tasks 1 and 3 from Progress Note Understanding Suite of Tasks: SOAP Note Tagging and Problem List Summarization (version 1.0.0). PhysioNet. https://doi.org/10.13026/wks0-w041.
  2. Yanjun Gao, Dmitriy Dligach, Timothy Miller, Dongfang Xu, Matthew M. Churpek, and Majid Afshar. 2022. Summarizing Patients’ Problems from Hospital Progress Notes Using Pre-trained Sequence-to-Sequence Models. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2979–2991, Gyeongju, Republic of Korea. International Committee on Computational Linguistics. https://aclanthology.org/2022.coling-1.264/
  3. Yanjun Gao, Dmitriy Dligach, Timothy Miller, Samuel Tesch, Ryan Laffin, Matthew M. Churpek, and Majid Afshar. 2022. Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5484–5493, Marseille, France. European Language Resources Association.
  4. Website for BioNLP 2023 and Shared Tasks @ ACL 2023. https://aclweb.org/aclwiki/BioNLP_Workshop [Accessed: 18 Jan 2023]
  5. Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). PhysioNet. https://doi.org/10.13026/C2XW26
  6. Johnson, A. E. W., Pollard, T. J., Shen, L., Lehman, L. H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L. A., & Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data, 3, 160035.
  7. Lin, Chin-Yew. 2004. ROUGE: a Package for Automatic Evaluation of Summaries. In Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004), Barcelona, Spain, July 25 - 26, 2004.
  8. Lin, Chin-Yew and Franz Josef Och. 2004. Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004), Barcelona, Spain, July 21 - 26, 2004.

Parent Projects
BioNLP Workshop 2023 Shared Task 1A: Problem List Summarization was derived from: Please cite them when using this project.
Access

Access Policy:
Only credentialed users who sign the DUA can access the files. In addition, users must have individual studies reviewed by the contributor.

License (for files):
PhysioNet Contributor Review Health Data License 1.5.0

Data Use Agreement:
PhysioNet Contributor Review Health Data Use Agreement 1.5.0

Required training:
CITI Data or Specimens Only Research
