Resources


Model Credentialed Access

Transformer models trained on MIMIC-III to generate synthetic patient notes

Ali Amin-Nejad, Julia Ive, Sumithra Velupillai

Machine learning models that have been trained using MIMIC-III to enable the creation of synthetic discharge summaries.

Published: May 27, 2020. Version: 1.0.0


Database Contributor Review

BRATECA (Brazilian Tertiary Care Dataset): a Clinical Information Dataset for the Portuguese Language

Henrique Dias, Ana Helena Dias Pereira dos Ulbrich

Brazilian clinical dataset containing over 70,000 admissions from 10 hospitals in two Brazilian states.

prescriptions exams tertiary care clinical notes natural language processing

Published: July 14, 2022. Version: 1.1


Database Credentialed Access

Annotated MIMIC-IV discharge summaries for a study on deidentification of names

Shulammite Lim, Yuxin Xiao, Alistair Johnson, Dana Moukheiber, Lama Moukheiber, Mira Moukheiber, Marzyeh Ghassemi, Tom Pollard

Annotated MIMIC-IV discharge summaries used to explore deidentification of names

deidentification fairness

Published: July 5, 2023. Version: 1.0


Database Credentialed Access

EchoNotes Structured Database derived from MIMIC-III (ECHO-NOTE2NUM)

Gloria Hyunjung Kwak, Dana Moukheiber, Mira Moukheiber, Lama Moukheiber, Sulaiman Moukheiber, Neel Butala, Leo Anthony Celi, Christina Chen

A structured echocardiogram database derived from 43,472 observational notes obtained during echocardiogram studies conducted in the intensive care unit at the Beth Israel Deaconess Medical Center between 2001 and 2012.

Published: Feb. 23, 2024. Version: 1.0.0


Database Open Access

MIMIC-IV Clinical Database Demo

Alistair Johnson, Lucas Bulgarelli, Tom Pollard, Steven Horng, Leo Anthony Celi, Roger Mark

An openly available subset of patients in the MIMIC-IV database.

critical care electronic health record mimic

Published: Jan. 31, 2023. Version: 2.2


Challenge Credentialed Access

ShAReCLEF eHealth 2013: Natural Language Processing and Information Retrieval for Clinical Care

Danielle Mowery

2013 ShARe/CLEF eHealth Evaluation Lab: Natural Language Processing and Information Retrieval for Clinical Care (Tasks 1 and 2).

natural language processing

Published: Feb. 15, 2013. Version: 1.0


Database Restricted Access

KURIAS-ECG: a 12-lead electrocardiogram database with standardized diagnosis ontology

Hakje Yoo, Yunjin Yum, Soowan Park, Jeong Moon Lee, Moonjoung Jang, Yoojoong Kim, Jong-Ho Kim, Hyun-Joon Park, Kap Su Han, Jae Hyoung Park, Hyung Joon Joo

The KURIAS-ECG database is a high-quality 12-lead ECG database annotated with standard vocabularies (SNOMED CT, OMOP-CDM); its ECG diagnoses are grouped into 10 diagnostic categories by applying the Minnesota Code.

snomed 12-lead minnesota ecg

Published: Nov. 8, 2021. Version: 1.0


Database Credentialed Access

Deidentified Medical Text

Margaret Douglass, Bill Long, George Moody, Peter Szolovits, Li-wei Lehman, Roger Mark, Gari D. Clifford

Gold standard corpus of 2,434 deidentified nursing notes

medical text nursing notes hipaa de-identification

Published: Dec. 18, 2007. Version: 1.0


Challenge Credentialed Access

BioNLP Workshop 2023 Shared Task 1A: Problem List Summarization

Yanjun Gao, Dmitriy Dligach, Timothy Miller, Majid Afshar

Data repository for the BioNLP Workshop 2023 Shared Task 1A: Problem List Summarization.

bionlp clinical natural language processing electronic health record summarization

Published: Nov. 12, 2023. Version: 2.0.0


Model Credentialed Access

Clinical BERT Models Trained on Pseudo Re-identified MIMIC-III Notes

Eric Lehman, Sarthak Jain, Karl Pichotta, Yoav Goldberg, Byron Wallace

We explore recovering sensitive information from BERT models trained on non-deidentified EHR notes. We make our models and data available to facilitate further research.

Published: April 28, 2021. Version: 1.0.0