Resources


Database Credentialed Access

CHIFIR: Cytology and Histopathology Invasive Fungal Infection Reports

Vlada Rozova, Anna Khanina, Jasmine Teng, Joanne Teh, Leon Worth, Monica Slavin, karin thursky, Karin Verspoor

A corpus of cytology and histopathology reports annotated for terminology relevant to fungal infections. Ideal for validation of named entity recognition and relation extraction methods.

nlp information extraction clinical documentation invasive fungal infections

Published: Feb. 20, 2024. Version: 1.0.2


Database Contributor Review

BRATECA (Brazilian Tertiary Care Dataset): a Clinical Information Dataset for the Portuguese Language

Henrique Dias, Ana Helena Dias Pereira dos Ulbrich

Brazilian clinical dataset containing over 70,000 admissions from 10 hospitals in two Brazilian states.

prescriptions exams tertiary care natural language processing clinical notes

Published: July 14, 2022. Version: 1.1


Database Credentialed Access

MedNLI - A Natural Language Inference Dataset For The Clinical Domain

Chaitanya Shivade

This is a resource for training machine learning models for language inference in the medical domain.

natural language inference recognizing textual entailment

Published: Oct. 1, 2019. Version: 1.0.0


Database Credentialed Access

Nosocomial Risk Datasets from MIMIC-III

Travis Goodwin

Text-based Longitudinal Data for Predicting Nosocomial Disease Risk as used by CANTRIP.

deep learning natural language processing pressure injury risk prediction acute kidney injury anemia forecasting

Published: Sept. 15, 2022. Version: 1.0


Database Credentialed Access

DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries

Jayetri Bardhan, Anthony Colas, Kirk Roberts, Daisy Zhe Wang

DrugEHRQA is a QA dataset containing question-answers from MIMIC-III tables and discharge summaries.

question-answer qa

Published: April 12, 2022. Version: 1.0.0


Database Credentialed Access

CLIP: A Dataset for Extracting Action Items for Physicians from Hospital Discharge Notes

James Mullenbach, Yada Pruksachatkun, Sean Adler, Jennifer Seale, Jordan Swartz, T Greg McKelvey, Yi Yang, David Sontag

Clinical action items annotated over MIMIC-III. 718 discharge summaries are labeled at a sentence- and character-level with multiple action labels including Appointment, Lab, Procedure, Medication, Imaging, Patient Instructions, and Other.

Published: June 21, 2021. Version: 1.0.0


Model Credentialed Access

Clinical-T5: Large Language Models Built Using MIMIC Clinical Text

Eric Lehman, Alistair Johnson

We train a T5-Base and T5-Large from scratch on MIMIC-III and MIMIC-IV. Additionally, we further pretrain T5-Base and SciFive on notes from MIMIC. We release these model weights on PhysioNet.

Published: Jan. 25, 2023. Version: 1.0.0


Model Credentialed Access

Transformer models trained on MIMIC-III to generate synthetic patient notes

Ali Amin-Nejad, Julia Ive, Sumithra Velupillai

Machine learning models that have been trained using MIMIC-III to enable the creation of synthetic discharge summaries.

Published: May 27, 2020. Version: 1.0.0


Database Contributor Review

BRATECA (Brazilian Tertiary Care Dataset): a Clinical Information Dataset for the Portuguese Language

Henrique Dias, Ana Helena Dias Pereira dos Ulbrich

Brazilian clinical dataset containing over 70,000 admissions from 10 hospitals in two Brazilian states.

prescriptions exams tertiary care natural language processing clinical notes

Published: July 14, 2022. Version: 1.1


Database Credentialed Access

EHRNoteQA: A Patient-Specific Question Answering Benchmark for Evaluating Large Language Models in Clinical Settings

Sunjun Kweon, Jiyoun Kim, Heeyoung Kwak, Dongchul Cha, Hangyul Yoon, Kwang Hyun Kim, Seunghyun Won, Edward Choi

A patient-specific question answering benchmark tailored for evaluating Large Language Models (LLMs) in clinical environments

Published: April 3, 2024. Version: 1.0.0