Resources


Database Credentialed Access

Immunosuppressive Condition and Medication Annotations for Admission Notes in the MIMIC-III Database

Vijeeth Guggilla, Melissa Bak, Mengjia Kang, Theresa Walunas, Catherine A Gao

This database contains 200 MIMIC-III admission notes with adjudicated labels for histories of various immunosuppressive conditions and usage of various immunosuppressive medications.

Published: Aug. 4, 2025. Version: 1.0.0


Database Restricted Access

DREAMT: Dataset for Real-time sleep stage EstimAtion using Multisensor wearable Technology

Ke Wang, Jiamu Yang, Ayush Shetty, Jessilyn Dunn

We present high resolution wearable device multichannel data along with clinical labeled and recorded sleep stage and polysomnography (PSG) data from 100 sleep abnormal patients with sleep apnea.

wearable sleep disorders biomedical time series classification

Published: April 30, 2025. Version: 2.1.0


Database Credentialed Access

MIMIC-III-Ext-VeriFact-BHC: Labeled Propositions From Brief Hospital Course Summaries for Long-form Clinical Text Evaluation

Philip Chung, Akshay Swaminathan, Alex Goodell, Yeasul Kim, Momsen Reincke, Lichy Han, Ben Deverett, Mohammad Amin Sadeghi, Abdel badih El Ariss, Marc Ghanem, David Seong, Andrew Lee, Caitlin Coombes, Brad Bradshaw, Mahir Sufian, Hyo Jung Hong, Teresa Nguyen, Mohammad Rasouli, Komal Kamra, Mark Burbridge, James McAvoy, Roya Saffary, Stephen Parnell Ma, Dev Dash, James Xie, Ellen Wang, Cliff Schmiesing, Nigam Shah, Nima Aghaeepour

A clinician-labeled dataset for fact-checking long-form clinical text against patient EHRs. The dataset contains LLM-written and human-written Brief Hospital Course summaries decomposed to atomic claim and sentence propositions with annotations.

artificial intelligence natural language processing clinical notes electronic health records large language models brief hospital course long-form text chart review text reranking atomic claim hybrid retrieval clinical informatics clinical medicine fact verification retrieval-augmented generation logical atomism text embedding formal logic llm-as-a-judge llm evaluation

Published: April 9, 2025. Version: 1.0.0


Database Credentialed Access

SCRIPT X2B8 Dataset: per-day clinical features to model successful next-day extubation

Sam Fenske, Alec Peltekian, Mengjia Kang, Nikolay Markov, Anna Pawlowski, Luke Rasmussen, Thomas Stoeger, Benjamin Singer, GR Scott Budinger, Richard Wunderink, Alexander Misharin, Ankit Agrawal, Catherine A Gao

This dataset contains electronic health record (EHR) data from ICU patients receiving mechanical ventilation, aggregated on a daily basis, along with annotations of intubation, extubation, tracheostomy days, and cases of failed extubation. Data can b

Published: Jan. 28, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-CXR Database

Alistair Johnson, Tom Pollard, Roger Mark, Seth Berkowitz, Steven Horng

Chest radiographs in DICOM format with associated free-text reports.

computer vision chest x-rays natural language processing radiology mimic machine learning

Published: July 23, 2024. Version: 2.1.0


Database Credentialed Access

MIMIC-CXR-JPG - chest radiographs with structured labels

Alistair Johnson, Matthew Lungren, Yifan Peng, Zhiyong Lu, Roger Mark, Seth Berkowitz, Steven Horng

Chest x-rays in JPG format with structured labels derived from the associated radiology report.

computer vision chest x-ray radiology mimic deep learning

Published: March 12, 2024. Version: 2.1.0


Database Credentialed Access

ODD: A Benchmark Dataset for the NLP-based Opioid Related Aberrant Behavior Detection

Sunjae Kwon, Xun Wang, Weisong Liu, Emily Druhl, Minhee Sung, Joel Reisman, Wenjun Li, Robert Kerns, William Becker, Hong Yu

Opioid-related aberrant behaviors (ORABs) detection Dataset (ODD) which is a large-size, expert-annotated, and multi-label classification benchmark dataset corresponding to the task

substance use natural language processing opioid related aberrant behavior

Published: Jan. 11, 2024. Version: 1.0.0


Database Credentialed Access

BOLD, a blood-gas and oximetry linked dataset

João Matos, Tristan Struja, Jack Gallifant, Luis Filipe Nakayama, Marie Charpignon, Xiaoli Liu, Jaime dos Santos Cardoso, Leo Anthony Celi, An Kwok Wong

An open-source pulse oximetry and arterial blood gas dataset, derived from MIMIC-III, MIMIC-IV, and eICU-CRD

pulse oximetry intensive care unit health equity electronic health records

Published: Nov. 8, 2023. Version: 1.0


Database Credentialed Access

GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization

Xuhai Xu, Han Zhang, Yasaman Sefidgar, Yiyi Ren, Xin Liu, Woosuk Seo, Jennifer Brown, Kevin Kuehn, Mike Merrill, Paula Nurius, Shwetak Patel, Tim Althoff, Margaret Morris, Eve Riskin, Jennifer Mankoff, Anind Dey

GLOBEM datasets contain the first released multi-year mobile and wearable sensing datasets from 2018 to 2021, containing 705 person-years and 497 unique participants.

health ubiquitous computing well-being passive mobile sensing human behavior modeling

Published: March 14, 2023. Version: 1.1


Database Credentialed Access

Annotation dataset of problematic opioid use and related contexts from MIMIC-III Critical Care Database discharge summaries

Melissa Poulsen, Vanessa Troiani, Philip Freda, Danielle Mowery, Anahita Davoudi

The database contains a corpus of annotated data from the MIMIC-III Critical Care Database from a study that aimed to develop and apply an annotation schema to characterize opioid use disorder and related contextual factors.

opioid use disorder substance use natural language processing clinical notes

Published: Feb. 8, 2023. Version: 1.0.0