Resources
Challenge Credentialed Access
ShAReCLEF eHealth Evaluation Lab 2014 (Task 2): Disorder Attributes in Clinical Reports
Published: Nov. 1, 2013. Version: 1.0
Database Credentialed Access
CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays
evaluation chest x-ray benchmark structured chest x-ray qa intermediate reasoning steps structured reasoning grounded reasoning diagnostic reasoning structured diagnostic pipeline
Published: Oct. 23, 2025. Version: 1.0.1
Database Credentialed Access
MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context
Published: Dec. 10, 2025. Version: 1.0.1
Database Open Access
Radiology Report Generation Models Evaluation Dataset For Chest X-rays (RadEvalX)
Published: June 18, 2024. Version: 1.0.0
Database Credentialed Access
Radiology Report Expert Evaluation (ReXVal) Dataset
Published: June 20, 2023. Version: 1.0.0
Database Open Access
Integration of Electroencephalogram and Eye-Gaze Datasets for Performance Evaluation in Fundamentals of Laparoscopic Surgery (FLS) Tasks
Published: Aug. 23, 2023. Version: 1.0.0
Database Open Access
Electroencephalogram and eye-gaze datasets for robot-assisted surgery performance evaluation
Published: July 14, 2023. Version: 1.0.0
Database Credentialed Access
MIMIC-III-Ext-VeriFact-BHC: Labeled Propositions From Brief Hospital Course Summaries for Long-form Clinical Text Evaluation
artificial intelligence, natural language processing, clinical notes, electronic health records, large language models, brief hospital course, long-form text, chart review, text reranking, atomic claim, hybrid retrieval, clinical informatics, clinical medicine, fact verification, retrieval-augmented generation, logical atomism, text embedding, formal logic, llm-as-a-judge, llm evaluation
Published: April 9, 2025. Version: 1.0.0