Resources


Database Credentialed Access

EHR-DS-QA: A Synthetic QA Dataset Derived from Medical Discharge Summaries for Enhanced Medical Information Retrieval Systems

Konstantin Kotschenreuther

Dataset of question-and-answer pairs synthetically generated from medical discharge summaries, designed to support the training and development of large language models tailored for healthcare applications.

mimic-iv large language models clinical question-answering medical discharge summaries

Published: Jan. 11, 2024. Version: 1.0.0


Model Credentialed Access

Characterization of Stigmatizing Language in Medical Records

Keith Harrigian, Ayah Zirikly, Brant Chee, Alya Ahmad, Anne Links, Somnath Saha, Mary Catherine Beach, Mark Dredze

A suite of classifiers for detecting three types of stigmatizing language in electronic medical records. Trained on MIMIC-IV discharge notes.

mimic clinical natural language processing large language models domain transfer bias stigmatizing language

Published: Nov. 6, 2023. Version: 1.0.0


Database Open Access

BIG IDEAs Lab Glycemic Variability and Wearable Device Data

Peter Cho, Juseong Kim, Brinnae Bent, Jessilyn Dunn

Glucose measurements and wrist-worn wearable sensor data from high-normoglycemic participants.

biomedical engineering pre-diabetes biomarkers

Published: Sept. 18, 2023. Version: 1.1.2


Database Open Access

MIMIC-IV-ECG: Diagnostic Electrocardiogram Matched Subset

Brian Gow, Tom Pollard, Larry A Nathanson, Alistair Johnson, Benjamin Moody, Chrystinne Fernandes, Nathaniel Greenbaum, Jonathan W Waks, Parastou Eslami, Tanner Carbonati, Ashish Chaudhari, Elizabeth Herbst, Dana Moukheiber, Seth Berkowitz, Roger Mark, Steven Horng

The MIMIC-IV ECG module contains approximately 800,000 diagnostic electrocardiograms across nearly 160,000 unique patients. These patients overlap with the patients from the MIMIC-IV Clinical Database.

Published: Sept. 15, 2023. Version: 1.0


Database Credentialed Access

MIMIC-IV-ECHO: Echocardiogram Matched Subset

Brian Gow, Tom Pollard, Nathaniel Greenbaum, Benjamin Moody, Alistair Johnson, Elizabeth Herbst, Jonathan W Waks, Parastou Eslami, Ashish Chaudhari, Tanner Carbonati, Seth Berkowitz, Roger Mark, Steven Horng

The MIMIC-IV-ECHO module contains more than 500,000 echocardiograms across more than 4,500 unique patients. These patients overlap with the patients from the MIMIC-IV Clinical Database.

Published: July 21, 2023. Version: 0.1


Database Credentialed Access

Annotated MIMIC-IV discharge summaries for a study on deidentification of names

Shulammite Lim, Yuxin Xiao, Alistair Johnson, Dana Moukheiber, Lama Moukheiber, Mira Moukheiber, Marzyeh Ghassemi, Tom Pollard

Annotated MIMIC-IV discharge summaries used to explore deidentification of names.

deidentification fairness

Published: July 5, 2023. Version: 1.0


Database Credentialed Access

Radiology Report Expert Evaluation (ReXVal) Dataset

Feiyang Yu, Mark Endo, Rayan Krishnan, Ian Pan, Andy Tsai, Eduardo Pontes Reis, Eduardo Kaiser Ururahy Nunes Fonseca, Henrique Lee, Zahra Shakeri, Andrew Ng, Curtis Langlotz, Vasantha Kumar Venugopal, Pranav Rajpurkar

The Radiology Report Expert Evaluation (ReXVal) Dataset is a publicly available dataset of radiologist evaluations of errors in automatically generated radiology reports.

Published: June 20, 2023. Version: 1.0.0


Database Credentialed Access

Establishment of a Chinese critical care database from electronic healthcare records in a tertiary care medical center

Senjun Jin, Lin Chen, Kun Chen, Zhongheng Zhang

A Chinese critical care database built from electronic healthcare records at a tertiary care medical center.

critical care database china

Published: Jan. 19, 2023. Version: 1.0


Database Credentialed Access

RadQA: A Question Answering Dataset to Improve Comprehension of Radiology Reports

Sarvesh Soni, Kirk Roberts

RadQA is an electronic health record question answering dataset containing clinical questions that can be answered using the Findings and Impressions sections of radiology reports.

electronic health records clinical notes question answering radiology reports machine reading comprehension

Published: Dec. 9, 2022. Version: 1.0.0


Database Credentialed Access

Nosocomial Risk Datasets from MIMIC-III

Travis Goodwin

Text-based Longitudinal Data for Predicting Nosocomial Disease Risk as used by CANTRIP.

deep learning natural language processing pressure injury risk prediction acute kidney injury anemia forecasting

Published: Sept. 15, 2022. Version: 1.0