Resources


Model Credentialed Access

Transformer models trained on MIMIC-III to generate synthetic patient notes

Ali Amin-Nejad, Julia Ive, Sumithra Velupillai

Machine learning models that have been trained using MIMIC-III to enable the creation of synthetic discharge summaries.

Published: May 27, 2020. Version: 1.0.0


Model Credentialed Access

Fine-tuning foundational models to code diagnoses from veterinary health records

Adam Kiehl, Nadia Saklou, G Joseph Strecker, et al.

Fine-tuned GatorTron LLM for veterinary diagnosis coding to 7,739 SNOMED-CT codes based on clinical summary text from the Colorado State University Veterinary Teaching Hospital.

transformers natural language processing large language models foundational models one health diagnoses snomed-ct veterinary medicine omop cdm veterinary medical records clinical coding

Published: Jan. 25, 2026. Version: 1.0.0


Database Open Access

MIMIC-IV demo data in the OMOP Common Data Model

Michael Kallfelz, Anna Tsvetkova, Tom Pollard, et al.

Preliminary work to transform a MIMIC-IV demo dataset to the OMOP Common Data Model.

omop common data model

Published: June 21, 2021. Version: 0.9


Database Credentialed Access

RadQA: A Question Answering Dataset to Improve Comprehension of Radiology Reports

Sarvesh Soni, Kirk Roberts

RadQA is an electronic health record question answering dataset containing clinical questions that can be answered using the Findings and Impressions sections of radiology reports.

machine reading comprehension radiology reports question answering clinical notes electronic health records

Published: Dec. 9, 2022. Version: 1.0.0


Model Credentialed Access

EntityBERT: BERT-based Models Pretrained on MIMIC-III with or without Entity-centric Masking Strategy for the Clinical Domain

Chen Lin, Steven Bethard, Guergana Savova, et al.

Pretraining of models with a broad representation of biomedical terminology (PubMedBERT) on the MIMIC-III corpus, with or without a novel entity-centric masking strategy.

Published: March 17, 2022. Version: 1.0.1


Database Credentialed Access

MIMIC-IV-ECHO-Ext-LVVOLUMES-A4C-ROI: Annotated Subset of Apical Four-Chamber Echocardiography for PoCUS-Style LV Volume and Function Analysis

Kamlin Ekambaram, Anurag Arnab, Philip Herbst, et al.

A curated subset of MIMIC-IV-ECHO providing apical four-chamber cine loops with manual ROI masks, volumetric labels, and ready-to-use MP4/NPZ derivatives for robust LV volume and ejection fraction research.

ultrasound deep learning echocardiography medical imaging dicom lvesv roi segmentation cardiac video analysis left ventricular volume mimic-iv-echo apical four-chamber quantitative cardiology biplane simpson transformer models lvef ejection fraction a4c pocus lvedv domain adaptation

Published: Feb. 26, 2026. Version: 1.0.0


Software Open Access

Transformer-DeID: Deidentification of free-text clinical notes with transformers

Callandra Moore, Lucas Bulgarelli, Tom Pollard, et al.

Fine-tune transformer-based neural networks to deidentify clinical text data.

deidentification neural networks transformers

Published: Nov. 2, 2023. Version: 1.0.0


Database Credentialed Access

MedVAL-Bench: Expert-Annotated Medical Text Validation Benchmark

Asad Aali, Vasiliki Bikia, Maya Varma, et al.

MedVAL-Bench is the first large-scale physician-validated benchmark for medical text validation, spanning 6 diverse medical tasks and containing 840 language model-generated outputs annotated by 12 physicians with error assessments and risk grades.

Published: Nov. 14, 2025. Version: 1.0.1