Name: Learning to Ask Like a Physician: a Discharge Summary Clinical Questions (DiSCQ) Dataset
Published: July 28, 2022
License: https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

Database Credentialed Access

Eric Lehman

Published: July 28, 2022. Version: 1.0

When using this resource, please cite:
Lehman, E. (2022). Learning to Ask Like a Physician: a Discharge Summary Clinical Questions (DiSCQ) Dataset (version 1.0). PhysioNet. https://doi.org/10.13026/7v8e-h745.

MLA	Lehman, Eric. "Learning to Ask Like a Physician: a Discharge Summary Clinical Questions (DiSCQ) Dataset" (version 1.0). PhysioNet (2022), https://doi.org/10.13026/7v8e-h745.
Chicago	Lehman, Eric. "Learning to Ask Like a Physician: a Discharge Summary Clinical Questions (DiSCQ) Dataset" (version 1.0). PhysioNet (2022). https://doi.org/10.13026/7v8e-h745.
Harvard	Lehman, E. (2022) 'Learning to Ask Like a Physician: a Discharge Summary Clinical Questions (DiSCQ) Dataset' (version 1.0), PhysioNet. Available at: https://doi.org/10.13026/7v8e-h745.
Vancouver	Lehman E. Learning to Ask Like a Physician: a Discharge Summary Clinical Questions (DiSCQ) Dataset (version 1.0). PhysioNet. 2022. Available from: https://doi.org/10.13026/7v8e-h745.

Additionally, please cite the original publication:

Eric Lehman, Vladislav Lialin, Katelyn Edelwina Legaspi, Anne Janelle Sy, Patricia Therese Pile, Nicole Rose Alberto, Richard Raymund Ragasa, Corinna Victoria Puyat, Marianne Katharina Taliño, Isabelle Rose Alberto, Pia Gabrielle Alfonso, Dana Moukheiber, Byron Wallace, Anna Rumshisky, Jennifer Liang, Preethi Raghavan, Leo Anthony Celi, and Peter Szolovits. 2022. Learning to Ask Like a Physician. In Proceedings of the 4th Clinical Natural Language Processing Workshop, pages 74–86, Seattle, WA. Association for Computational Linguistics.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

MLA	Goldberger, A., et al. "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220." (2000).
CHICAGO	Goldberger, A., L. Amaral, L. Glass, J. Hausdorff, P. C. Ivanov, R. Mark, J. E. Mietus, G. B. Moody, C. K. Peng, and H. E. Stanley. "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220." (2000).
HARVARD	Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P.C., Mark, R., Mietus, J.E., Moody, G.B., Peng, C.K. and Stanley, H.E., 2000. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
VANCOUVER	Goldberger A, Amaral L, Glass L, Hausdorff J, Ivanov PC, Mark R, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

Existing question answering (QA) datasets derived from electronic health records (EHR) are artificially generated and consequently fail to capture realistic physician information needs. We present Discharge Summary Clinical Questions (DiSCQ), a newly curated question dataset composed of 2,000+ questions paired with the snippets of text (triggers) that prompted each question. The questions are generated by medical experts from 100+ MIMIC-III, version 1.4, discharge summaries. These discharge summaries overlap with those used in the n2c2 challenge, so their protected health information (PHI) has been replaced with surrogates. We analyze this dataset to characterize the types of information sought by medical experts. We also train baseline models for trigger detection and question generation (QG), paired with unsupervised answer retrieval over EHRs. Our baseline model is able to generate high-quality questions in over 62% of cases when prompted with human-selected triggers. We release this dataset (and a link to all code to reproduce baseline model results) to facilitate further research into realistic clinical QA and QG.

Background

Physicians often query electronic health records (EHR) to make fully informed decisions about patient care [1]. Natural language technologies such as automatic question answering (QA) may partially address this need. There have been several dataset collection efforts that aim to facilitate the training and evaluation of clinical QA models [2, 3, 4, 5]. However, template-based [2, 4] and other automated generation methods [3] are brittle by nature, and there is limited evidence that they produce the kinds of questions medical professionals actually ask. To address this paucity of natural, clinically relevant questions, we collect queries that might plausibly be asked by healthcare providers during patient handoff (i.e., transitions of care). We use patient discharge summaries from the Medical Information Mart for Intensive Care III (MIMIC-III) English dataset, version 1.4 [6], to mimic the handoff process. We expect this process to produce more natural questions than prior work.

Methods

The goal of our question collection is to gather questions that may be asked by healthcare providers during patient handoff (i.e., transitions of care). We use the patient discharge summary to simulate the handoff process, where the discharge summary is the communication from the previous physician regarding the patient’s care, treatment and current status. We specifically use discharge summaries that appear in both n2c2 [9] and MIMIC-III version 1.4 [6]. For readability, we use the n2c2 versions, in which PHI slots have been filled with surrogates.

Annotators are asked to review the discharge summary as the receiving physician and ask any questions they may have as the physician taking over the care of this patient. Annotators are instructed to read the discharge summary line-by-line and record (1) any questions that may be important with respect to the patient’s future care and (2) the text within the note that triggered the question. Because of this line-by-line reading, a question asked early on may be answered later in the discharge summary. Annotators are permitted to go back and ask questions if they feel the need to do so.

To capture the annotators’ natural thought processes, we purposely provide only minimal guidance to annotators on how to select a trigger or what type of questions to ask. We only ask that annotators use the minimum span of text when specifying a trigger. We also encourage annotators to phrase questions in whatever format they feel is appropriate. This leads to many informal queries, in which questions are incomplete or grammatically incorrect. Further, we encourage all types of questions to be asked, regardless of whether they could be answered based on the EHR. We also allow annotators to ask an arbitrary number of questions, which means they may skip a discharge summary entirely if they have no questions about it.

Data Description

We release 3 files: discq_questions_final.csv, human_eval_q.csv, i2b2_to_mimic_map.csv.

discq_questions_final.csv:

This file contains all of the questions asked by our medical experts, as well as the trigger that prompted their question. We describe the file contents below:

id: The n2c2 text file [9] on which the question and trigger were recorded. These files must be downloaded from n2c2 [9].
name: The annotator who wrote the question.
question: The question asked.
reasoning: The extractive piece of text that triggered the question asked.
start_index: The character start index of the "reasoning" or "trigger".
end_index: The character end index of the "reasoning" or "trigger".
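
As a minimal sketch of how these offsets might be used, the snippet below reads rows in the column layout described above and slices each trigger out of the corresponding note text. It assumes that start_index and end_index are zero-based character offsets into the n2c2 note with an exclusive end, which the field descriptions do not state, so the slice is checked against the reasoning column before being trusted. The function names and the in-memory sample data are hypothetical and only stand in for the real files.

```python
import csv
import io


def load_questions(csv_file):
    """Read question rows from a file object laid out like discq_questions_final.csv."""
    rows = list(csv.DictReader(csv_file))
    for row in rows:
        # Offsets are parsed via float() first in case they were saved as "24.0".
        row["start_index"] = int(float(row["start_index"]))
        row["end_index"] = int(float(row["end_index"]))
    return rows


def trigger_span(note_text, row):
    """Slice the trigger out of the note text.

    Assumption: offsets are zero-based and the end index is exclusive.
    """
    return note_text[row["start_index"]:row["end_index"]]


def offsets_match(note_text, row):
    """Compare the sliced span with the 'reasoning' column, ignoring outer whitespace."""
    return trigger_span(note_text, row).strip() == row["reasoning"].strip()


# Made-up sample data standing in for the real CSV and an n2c2 note.
sample_csv = io.StringIO(
    "id,name,question,reasoning,start_index,end_index\n"
    "note_001.txt,annotator_a,why was it stopped?,metoprolol,24,34\n"
)
sample_note = "Patient was admitted on metoprolol and aspirin."

questions = load_questions(sample_csv)
```

If offsets_match returns False for many rows of the real data, the offset convention assumed here (or the version of the note text being read) is probably wrong and should be revisited.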

human_eval_q.csv:

This file contains all of the human evaluation judgments made about the quality of questions asked.

id: The n2c2 text file [9] that this question is from. These files must be downloaded from n2c2 [9].
model: The model that asked the question. Gold refers to real physician questions.
trigger: The trigger for which this question was asked.
ch_trigger_start: The character index start for the trigger.
ch_trigger_end: The character index end for the trigger.
question: The question for annotation.
answer: The suggested answer for annotation. The answers are either from ANY MIMIC-III note for the given patient OR from the n2c2 [9] discharge summary.
done: Has annotation been completed for this question + trigger?
understandable: Is the question (in this row) understandable, as defined by [8].
nontrivial: Is the question (in this row) nontrivial with respect to the sentence containing the trigger, as defined by [8].
relevant_to_trigger: Is the question (in this row) relevant with respect to the trigger, as defined by [8].
medically_significant: Is the question (in this row) medically significant, as defined by [8].
sufficient_answer: Whether the answer suggested in the answer field is sufficient: "Fully", "Partial", or "No". "Fully" means that the suggested answer (in the same row) answers the question. "Partial" means that the suggested answer is related to or partially answers the question. "No" means that the suggested answer is completely unrelated to the question.
user: Which user annotated this data.
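
One way to summarize these judgments is to tally the sufficient_answer labels per model. The sketch below does this with the standard library only. It assumes the labels appear in the file exactly as "Fully", "Partial", and "No", and that rows without a judgment have an empty value; neither is confirmed by the description above. The sample rows use only a subset of the columns, and the model name model_a is made up.

```python
import csv
import io
from collections import Counter, defaultdict


def sufficiency_by_model(csv_file):
    """Count sufficient_answer labels per model in a human_eval_q.csv-style file."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(csv_file):
        label = (row["sufficient_answer"] or "").strip()
        if label:  # skip rows where no judgment was recorded (assumed to be empty)
            counts[row["model"]][label] += 1
    return counts


# Made-up sample rows; "Gold" marks physician-written questions.
sample_csv = io.StringIO(
    "id,model,question,answer,sufficient_answer\n"
    "note_001.txt,Gold,any prior MI?,no cardiac history,Fully\n"
    "note_001.txt,Gold,why on heparin?,started heparin drip,Partial\n"
    "note_002.txt,model_a,what dose?,patient was discharged home,No\n"
    "note_002.txt,model_a,which antibiotic?,,\n"
)

counts = sufficiency_by_model(sample_csv)
```

The same loop could be pointed at the understandable, nontrivial, relevant_to_trigger, or medically_significant columns once their value encoding has been inspected in the real file.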

i2b2_to_mimic_map.csv:

This file contains a mapping of i2b2 file names (given in reformatted_txt_files.zip) to MIMIC ROW_IDs and SUBJECT_IDs:

file: The n2c2 file that corresponds to a discharge summary in MIMIC. Split the value on "/" and take the last element to get the file name. Some of these files are available in reformatted_txt_files.zip.
row_num: The row number of the discharge summary.
subject_id: The subject id of the patient whose discharge summary is discussed.
hadm_id: The hospital admission id of the patient whose discharge summary is discussed.
chartdate: The chartdate of the discharge summary discussed.
source: The n2c2 source of the discharge summary discussed (i.e., which category).
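
The file-name handling described for the file field can be sketched as follows: the path is split on "/" and the last element is used as a key for looking up the MIMIC identifiers. The function name and the sample row are hypothetical, and identifiers are kept as strings because the column types are not specified here.

```python
import csv
import io


def build_file_to_admission(csv_file):
    """Map each n2c2 file name to its MIMIC (subject_id, hadm_id) pair."""
    mapping = {}
    for row in csv.DictReader(csv_file):
        # Keep only the last path element, as the 'file' description suggests.
        file_name = row["file"].split("/")[-1]
        mapping[file_name] = (row["subject_id"], row["hadm_id"])
    return mapping


# Made-up sample row standing in for i2b2_to_mimic_map.csv.
sample_csv = io.StringIO(
    "file,row_num,subject_id,hadm_id,chartdate,source\n"
    "some/dir/note_001.txt,12,1001,200001,2150-01-01,category_a\n"
)

file_to_admission = build_file_to_admission(sample_csv)
```

Whether these file names match the id column of the other two files character for character is not stated in this description and should be checked on the real data before joining.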

Usage Notes

This data can be used for realistic question generation. We highly recommend using Python to work with this data. To simply view the .csv files, Excel or Google Sheets is sufficient. Please also download the n2c2 data [9], which is necessary for using the offsets provided. Examples of how to work with the data will also be available on our GitHub [7].

Ethics

We are releasing our data under MIMIC-III credentialed access. Carlini et al. [10] warn against training large-scale transformer models (particularly ones for generation) on sensitive data. Although MIMIC-III notes consist of deidentified data, we will not release our model weights to the general public. With respect to the trigger detection system, there is less risk in releasing the model weights, as BERT has not been pretrained with generation tasks [11]. We caution all follow-up work to take these privacy concerns into account.

Conflicts of Interest

This research was supported by Oracle and the MIT-IBM Watson AI Lab.

References

1. Dina Demner-Fushman, Wendy Chapman, and Clement Mcdonald. 2009. What can natural language processing do for clinical decision support? Journal of biomedical informatics, 42:760–72.
2. Anusri Pampari, Preethi Raghavan, Jennifer Liang, and Jian Peng. 2018. emrQA: A Large Corpus for Question Answering on Electronic Medical Records. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2357–2368, Brussels, Belgium. Association for Computational Linguistics.
3. X. Yue, X. F. Zhang, Z. Yao, S. Lin and H. Sun, "CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering," 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2021, pp. 580-587, doi: 10.1109/BIBM52615.2021.9669300.
4. Preethi Raghavan, Jennifer J Liang, Diwakar Mahajan, Rachita Chandra, and Peter Szolovits. 2021. emrKBQA: A clinical knowledge-base question answering dataset. In Proceedings of the 20th Workshop on Biomedical Language Processing, pages 64–73, Online. Association for Computational Linguistics.
5. Gregory Kell, Iain Marshall, Byron Wallace, and Andre Jaun. 2021. What would it take to get biomedical QA systems into practice? In Proceedings of the 3rd Workshop on Machine Reading for Question Answering, pages 28–41, Punta Cana, Dominican Republic. Association for Computational Linguistics.
6. Alistair EW Johnson, Tom J Pollard, Lu Shen, Li-wei H Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. 2016. MIMIC-III, a freely accessible critical care database. Scientific data, 3:160035.
7. Github for Learning to Ask Like a Physician. https://github.com/elehman16/discq
8. Eric Lehman, Vladislav Lialin, Katelyn Edelwina Legaspi, Anne Janelle Sy, Patricia Therese Pile, Nicole Rose Alberto, Richard Raymund Ragasa, Corinna Victoria Puyat, Marianne Katharina Taliño, Isabelle Rose Alberto, Pia Gabrielle Alfonso, Dana Moukheiber, Byron Wallace, Anna Rumshisky, Jennifer Liang, Preethi Raghavan, Leo Anthony Celi, and Peter Szolovits. 2022. Learning to Ask Like a Physician. In Proceedings of the 4th Clinical Natural Language Processing Workshop, pages 74–86, Seattle, WA. Association for Computational Linguistics.
9. n2c2 Link. https://n2c2.dbmi.hms.harvard.edu/
10. Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom B. Brown, Dawn Xiaodong Song, Úlfar Erlingsson, Alina Oprea, and Colin Raffel. 2021. Extracting training data from large language models. In USENIX Security Symposium.
11. Eric Lehman, Sarthak Jain, Karl Pichotta, Yoav Goldberg, and Byron Wallace. 2021. Does BERT Pretrained on Clinical Notes Reveal Sensitive Data?. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 946–959, Online. Association for Computational Linguistics.