Challenge Contributor Review
BioNLP Workshop 2023 Shared Task 1A: Problem List Summarization
Published: Jan. 19, 2023. Version: 1.0.0
We are excited to announce the launch of a shared task on problem list summarization at the BioNLP Workshop 2023. Participants are asked to generate a list of diagnoses and problems in a patient’s daily care plan using input from the provider’s progress notes during hospitalization. The task provides 768 progress notes for training and 300 progress notes for evaluation. The shared task aims to attract future research efforts in building NLP models for real-world decision support applications, where a system generating relevant and accurate diagnoses can assist healthcare providers’ decision-making and improve the quality of care for patients.
Participants will be tasked with developing NLP systems for EHR summarization. Participants who design novel systems and achieve competitive performance in the shared task, running from January to April 2023, will be invited to present their results at the BioNLP Workshop, which will be held in Toronto, Canada and co-located with ACL 2023. The challenge is open to anyone interested in clinical NLP and medical AI. We encourage individuals, teams, and organizations to participate.
To register for the challenge, please visit: https://forms.gle/geTXN6Z1pyfC55Fn8. More information about the challenge, including the official rules and guidelines, can be found at: https://physionet.org/content/bionlp-workshop-2023-task-1a/. You are also welcome to join our Google discussion group for the latest updates: https://groups.google.com/g/bionlp2023problemsumm
When using this resource, please cite:
Gao, Y., Miller, T., Afshar, M., & Dligach, D. (2023). BioNLP Workshop 2023 Shared Task 1A: Problem List Summarization (version 1.0.0). PhysioNet. https://doi.org/10.13026/1z6g-ex18.
Gao, Y., Dligach, D., Miller, T., Xu, D., Churpek, M. M., & Afshar, M. (2022, October). Summarizing Patients’ Problems from Hospital Progress Notes Using Pre-trained Sequence-to-Sequence Models. In Proceedings of the 29th International Conference on Computational Linguistics (pp. 2979-2991).
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
Automatically summarizing patients’ main problems from the daily care notes in the electronic health record can help mitigate information and cognitive overload for clinicians and provide augmented intelligence via computerized diagnostic decision support at the bedside. The task of Problem List Summarization aims to generate a list of diagnoses and problems in a patient’s daily care plan using input from the provider’s progress notes during hospitalization.
This task aims to promote NLP model development for downstream applications in diagnostic decision support systems that could improve efficiency and reduce diagnostic errors in hospital care. The training set contains 768 hospital daily progress notes with 2,783 diagnoses; physicians will annotate a new set of 300 daily progress notes as the test set. The annotation methods and annotation quality have previously been reported.
The dataset supports a more complex summarization task to generate a list of relevant diagnoses/problems given the information in the Subjective, Objective, and Assessment sections of the note. Only diagnoses/problems that are available in the progress note were labeled for the task.
This shared task aims to attract future research efforts in building NLP models for real-world decision support applications, where a system generating relevant and accurate diagnoses will assist the healthcare providers’ decision-making process and improve the quality of care for patients.
The task and baseline methods are described in our COLING paper on Summarizing Patients’ Problems from Hospital Progress Notes Using Pre-trained Sequence-to-Sequence Models [2]. The annotation methods and annotation quality have previously been reported [3].
To access the Challenge dataset, participants should first register for the shared task through the BioNLP Workshop 2023 website [4].
Important Dates for BioNLP Workshop Shared Task 1A
- Registration opens: January 13th, 2023
- Releasing of training and validation data: January 13th, 2023
- Releasing of test data: April 13th, 2023
- System submission deadline: April 20th, 2023
- System papers due date: May 4th, 2023
- Notification of acceptance: June 1st, 2023
- Camera-ready system papers due: June 13th, 2023
- BioNLP Workshop Date: July 13th or 14th, 2023
All progress notes were sourced from MIMIC-III, a publicly available dataset of de-identified EHR data from approximately 60,000 hospital ICU admissions at Beth Israel Deaconess Medical Center in Boston, Massachusetts [5, 6]. The goal of the annotation was to label lists of relevant problems/diagnoses from the Plan subsections. For each Plan subsection, the annotators marked the text span for the Problem, separating the diagnosis/problems from the treatment or action plans.
MIMIC-III contains a total of 84 progress note types (identified by the DESCRIPTION header). The included types were the following: Physician Resident Note; Intensivist Note (SICU, CVICU, TICU); PGY1 Progress Note; PGY1/Attending Daily Progress Note MICU; and MICU Resident/Attending Daily Progress Note. Other note types, such as Nursing Progress Note and Social Worker Progress Note, were excluded because they are not commonly structured in the SOAP format.
Under this project, you will see a CSV file containing 765 progress notes (after removing 3 duplicates). The CSV file has five columns, including the input note sections (Subjective, Objective, and Assessment) and the Summary (Ground Truth).
We also provide an example dataloader script that we used for baseline experiments. For more detail, see our associated paper on Summarizing Patients’ Problems from Hospital Progress Notes Using Pre-trained Sequence-to-Sequence Models [2]. In the script, we set input options to be Assessment (A Only), S+A (Subjective and Assessment), and All (Subjective, Objective, and Assessment).
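The input-option handling described above can be sketched as follows. This is a minimal illustration, not the official dataloader: the column names (`Subjective`, `Objective`, `Assessment`, `Summary`) and the helper functions are assumptions, so check the released CSV header before adapting it.

```python
import csv
from io import StringIO

# Hypothetical column names; verify against the released CSV header.
SECTIONS = {
    "A Only": ["Assessment"],
    "S+A": ["Subjective", "Assessment"],
    "All": ["Subjective", "Objective", "Assessment"],
}

def build_input(row, option="All"):
    """Concatenate the selected note sections into one model input string."""
    parts = [row[col].strip() for col in SECTIONS[option] if row.get(col)]
    return " ".join(parts)

def load_examples(csv_text, option="All"):
    """Yield (input_text, summary) pairs from the task CSV contents."""
    for row in csv.DictReader(StringIO(csv_text)):
        yield build_input(row, option), row["Summary"]

if __name__ == "__main__":
    demo = ("Subjective,Objective,Assessment,Summary\n"
            "pt stable overnight,afebrile,likely CHF exacerbation,CHF\n")
    for text, summary in load_examples(demo, option="S+A"):
        print(text, "->", summary)
```

Swapping the `option` argument reproduces the three input configurations (A Only, S+A, All) used in the baseline experiments.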
We use ROUGE-L as the main evaluation metric [7, 8]. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics used for evaluating automatic summarization in NLP tasks; the metrics compare a computationally generated summary against a reference summary.
ROUGE-L is a metric that captures the Longest Common Subsequence (LCS) between a generated summary and a reference summary, i.e., the longest sequence of words, in order though not necessarily contiguous, shared between the two summaries.
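As a concrete sketch, an LCS-based ROUGE-L F1 over whitespace tokens can be computed as below. This is illustrative only; official shared-task scoring presumably uses a standard ROUGE package rather than this code.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between a candidate and reference summary (whitespace tokens)."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

print(rouge_l_f1("kidney injury", "acute kidney injury"))  # 0.8
```

Here the LCS is "kidney injury" (2 tokens), giving precision 2/2, recall 2/3, and F1 0.8.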
Version 1.0.0: This is the initial release for the BioNLP Workshop 2023 Shared Task 1A: Problem List Summarization. In addition to the dataset, we provide an example script for loading the dataset. In our previous experiment with T5, we used the special tokens "&lt;Subjective&gt;" and "&lt;Objective&gt;" to indicate the input sections.
The use of the data in this research came from a fully de-identified dataset (containing no protected health information) that we received permission to use under a PhysioNet Credentialed Health Data Use Agreement (v1.5.0). The study was determined to be exempt from human subjects research. All experiments followed the PhysioNet Credentialed Health Data License Agreement. Medical charting by providers in the electronic health record is at risk for multiple types of bias.
Our research focused on building a system to overcome the cognitive biases in medical decision-making by providers. However, statistical and social biases need to be addressed before integrating our work into any clinical decision support system for clinical trials or healthcare delivery. In particular, implicit bias towards vulnerable populations and stigmatizing language in certain medical conditions like substance use disorders are genuine concerns that can transfer into language model training. Therefore, it should be assumed that our corpus of notes for this task will carry social bias features that can affect fairness and equity during model training.
Before the deployment of any pre-trained language model, it is the responsibility of the scientists and health system to audit the model for fairness and equity in its performance across disparate health groups. Fairness and equity audits alongside model explanations are needed to ensure an ethical model trustworthy to all stakeholders, especially patients and providers.
We thank PhysioNet for their help in hosting the dataset. We thank the BioNLP workshop organizing committee for their guidance and help in setting up this shared task. We would like to acknowledge the hard work by our medical student annotators, Ryan Laffin and Samuel Tesch, who were supported by the University of Wisconsin Summer Shapiro Grant Program.
Conflicts of Interest
No competing interests are declared.
- Gao, Y., Caskey, J., Miller, T., Sharma, B., Churpek, M., Dligach, D., & Afshar, M. (2022). Tasks 1 and 3 from Progress Note Understanding Suite of Tasks: SOAP Note Tagging and Problem List Summarization (version 1.0.0). PhysioNet. https://doi.org/10.13026/wks0-w041.
- Yanjun Gao, Dmitriy Dligach, Timothy Miller, Dongfang Xu, Matthew M. M. Churpek, and Majid Afshar. 2022. Summarizing Patients’ Problems from Hospital Progress Notes Using Pre-trained Sequence-to-Sequence Models. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2979–2991, Gyeongju, Republic of Korea. International Committee on Computational Linguistics. https://aclanthology.org/2022.coling-1.264/
- Yanjun Gao, Dmitriy Dligach, Timothy Miller, Samuel Tesch, Ryan Laffin, Matthew M. Churpek, and Majid Afshar. 2022. Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5484–5493, Marseille, France. European Language Resources Association.
- Website for BIONLP 2023 and Shared Tasks @ ACL 2023. https://aclweb.org/aclwiki/BioNLP_Workshop [Accessed: 18 Jan 2023]
- Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). PhysioNet. https://doi.org/10.13026/C2XW26
- Johnson, A. E. W., Pollard, T. J., Shen, L., Lehman, L. H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L. A., & Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data, 3, 160035.
- Lin, Chin-Yew. 2004. ROUGE: a Package for Automatic Evaluation of Summaries. In Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004), Barcelona, Spain, July 25 - 26, 2004.
- Lin, Chin-Yew and Franz Josef Och. 2004. Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004), Barcelona, Spain, July 21 - 26, 2004.
Only credentialed users who sign the DUA can access the files. In addition, users must have individual studies reviewed by the contributor.
License (for files):
PhysioNet Contributor Review Health Data License 1.5.0
Data Use Agreement:
PhysioNet Contributor Review Health Data Use Agreement 1.5.0
Required training:
CITI Data or Specimens Only Research