Resources


Challenge Credentialed Access

ShAReCLEF eHealth Evaluation Lab 2014 (Task 2): Disorder Attributes in Clinical Reports

Danielle Mowery

The ShARe/CLEF eHealth 2014 Challenge (Task 2) on Disorder Attributes in Clinical Reports

Published: Nov. 1, 2013. Version: 1.0


Database Credentialed Access

MIMIC-IV-Ext-BHC: Labeled Clinical Notes Dataset for Hospital Course Summarization

Asad Aali, Dave Van Veen, Yamin Arefeen, et al.

This dataset presents a collection of preprocessed and labeled clinical notes derived from "MIMIC-IV-Note", and aims to facilitate the development of ML models focused on summarizing brief hospital courses (BHC) from clinical notes.

natural language processing clinical notes machine learning brief hospital course text summarization

Published: Feb. 3, 2025. Version: 1.2.0


Database Credentialed Access

Annotated Social Determinants of Health Dataset for Adverse Pregnancy Outcomes

Nidhi Soley, MaKhaila Bentil, Jash Shah, et al.

This project provides a manually annotated dataset of social determinants of health—social support, occupation, and substance use—linked to pregnancy outcomes, extracted from MIMIC-III and MIMIC-IV discharge summary notes.

Published: Aug. 4, 2025. Version: 1.0.0


Database Restricted Access

Application of Med-PaLM 2 in the refinement of MIMIC-CXR labels

Kendall Park, Rory Sayres, Andrew Sellergren, et al.

This work further refines the labels associated with CheXpert in MIMIC-CXR-JPG 2.0.0 by filtering with Med-PaLM 2 followed by verification by manual review by three US board-certified radiologists.

mimic-cxr labels

Published: Feb. 4, 2025. Version: 1.0.0


Model Credentialed Access

Characterization of Stigmatizing Language in Medical Records

Keith Harrigian, Ayah Zirikly, Brant Chee, et al.

A suite of classifiers for detecting three types of stigmatizing language in electronic medical records. Trained on MIMIC-IV discharge notes.

clinical natural language processing domain transfer bias stigmatizing language large language models mimic

Published: Nov. 6, 2023. Version: 1.0.0


Database Credentialed Access

RuMedNLI: A Russian Natural Language Inference Dataset For The Clinical Domain

Pavel Blinov, Aleksandr Nesterov, Galina Zubkova, et al.

RuMedNLI is the full counterpart dataset of MedNLI in Russian language.

natural language inference recognizing textual entailment russian language

Published: April 1, 2022. Version: 1.0.0


Model Credentialed Access

Fine-tuning foundational models to code diagnoses from veterinary health records

Adam Kiehl, Nadia Saklou, G Joseph Strecker, et al.

Fine-tuned GatorTron LLM for veterinary diagnosis coding to 7,739 SNOMED-CT codes based on clinical summary text from the Colorado State University Veterinary Teaching Hospital.

transformers natural language processing large language models foundational models one health diagnoses snomed-ct veterinary medicine omop cdm veterinary medical records clinical coding

Published: Jan. 25, 2026. Version: 1.0.0


Database Credentialed Access

Medication Extraction Labels for MIMIC-IV-Note Clinical Database

Akshay Goel, Almog Gueta, Omry Gilon, et al.

Medication extraction NLP labels for 600 discharge summaries in MIMIC-IV-Note dataset.

Published: Dec. 12, 2023. Version: 1.0.0


Database Credentialed Access

RadGraph: Extracting Clinical Entities and Relations from Radiology Reports

Saahil Jain, Ashwin Agrawal, Adriel Saporta, et al.

RadGraph is a dataset of entities and relations in full-text chest X-ray radiology reports, which are obtained using a novel information extraction (IE) schema to capture clinically relevant information in a radiology report.

entity and relation extraction graph multi-modal natural language processing radiology

Published: June 3, 2021. Version: 1.0.0


Database Credentialed Access

National Institutes of Health Stroke Scale (NIHSS) Annotations for the MIMIC-III Database

Jiayang Wang, Xiaoshuo Huang, Lin Yang, et al.

A dataset of annotated NIHSS scale items and corresponding scores from stroke patients discharge summaries in MIMIC-III.

Published: Jan. 25, 2021. Version: 1.0.0