Resources


Model Credentialed Access

Shareable Artificial Intelligence to Extract Cancer Outcomes from Electronic Health Records for Precision Oncology Research

Kenneth Kehl, Pavel Trukhanov, Christopher Fong, Justin Jee, Karl Pichotta, Morgan Paul, Chelsea Nichols, Michele Waters, Nikolaus Schultz, Deborah Schrag

The DFCI-imaging-student and DFCI-medonc-student AI models for extracting cancer outcomes from imaging reports and medical oncologist notes in electronic health records.

Published: Oct. 24, 2024. Version: 1.0.0


Database Credentialed Access

Bridge2AI-Voice: An ethically-sourced, diverse voice dataset linked to health information

Yael Bensoussan, Alexandros Sigaras, Anais Rameau, Olivier Elemento, Maria Powell, David Dorr, Philip Payne, Vardit Ravitsky, Jean-Christophe Bélisle-Pipon, Alistair Johnson, Ruth Bahr, Stephanie Watts, Donald Bolser, Jennifer Siu, Jordan Lerner-Ellis, Frank Rudzicz, Micah Boyer, Samantha Salvi Cruz, Yassmeen Abdel-Aty, Toufeeq Ahmed Syed, James Anibal, Stephen Aradi, Ana Sophia Martinez, Shaheen Awan, Steven Bedrick, Isaac Bevers, Rahul Brito, Selina Casalino, John Costello, Iris De Santiago, Enrique Diaz-Ocampo, Mohamed Ebraheem, Ellie Eiseman, Mahmoud Elmahdy, Emily Evangelista, Kenneth Fletcher, Alexander Gelbard, Anna Goldenberg, Karim Hanna, William Hersh, Lochana Jayachandran, Kaley Jenney, Kathy Jenkins, Stacy Jo, Ayush Kalia, Andrea Krussel, Elisa Lapadula, Chloe Loewith, Radhika Mahajan, Vrishni Maharaj, Siyu Miao, Matthew Mifsud, Marian Mikhael, Elijah Moothedan, Yosef Nafii, Tempestt Neal, Karlee Newberry, Evan Ng, Christopher Nickel, Trevor Pharr, Claire Premi-Bortolotto, JM Rahman, Sarah Rohde, Laurie Russell, Suketu Shah, Ahmed Shawkat, Elizabeth Silberholz, Duncan Sutherland, Venkata Swarna Mukhi, Jeffrey Tang, Jamie Toghranegar, Kimberly Vinson, Claire Wilson, Madeleine Zanin, Xijie Zeng, Theresa Zesiewicz, Robin Zhao, Pantelis Zisimopoulos, Satrajit Ghosh

A dataset of features from voice recordings and metadata to enable the development, benchmarking, and validation of clinically applicable machine-learning models for diagnosing a wide range of health conditions.

voice bridge2ai

Published: April 16, 2025. Version: 2.0.0


Database Restricted Access

KURIAS-ECG: a 12-lead electrocardiogram database with standardized diagnosis ontology

Hakje Yoo, Yunjin Yum, Soowan Park, Jeong Moon Lee, Moonjoung Jang, Yoojoong Kim, Jong-Ho Kim, Hyun-Joon Park, Kap Su Han, Jae Hyoung Park, Hyung Joon Joo

The KURIAS-ECG database is a high-quality 12-lead ECG database that uses standard vocabularies (SNOMED CT, OMOP-CDM). Its ECG diagnoses are grouped into 10 diagnostic categories using the Minnesota Code.

snomed minnesota 12-lead ecg

Published: Nov. 8, 2021. Version: 1.0


Database Credentialed Access

MIMIC-III-Ext-VeriFact-BHC: Labeled Propositions From Brief Hospital Course Summaries for Long-form Clinical Text Evaluation

Philip Chung, Akshay Swaminathan, Alex Goodell, Yeasul Kim, Momsen Reincke, Lichy Han, Ben Deverett, Mohammad Amin Sadeghi, Abdel badih El Ariss, Marc Ghanem, David Seong, Andrew Lee, Caitlin Coombes, Brad Bradshaw, Mahir Sufian, Hyo Jung Hong, Teresa Nguyen, Mohammad Rasouli, Komal Kamra, Mark Burbridge, James McAvoy, Roya Saffary, Stephen Parnell Ma, Dev Dash, James Xie, Ellen Wang, Cliff Schmiesing, Nigam Shah, Nima Aghaeepour

A clinician-labeled dataset for fact-checking long-form clinical text against patient EHRs. The dataset contains LLM-written and human-written Brief Hospital Course summaries decomposed into atomic-claim and sentence propositions, with annotations.

artificial intelligence clinical notes natural language processing large language models brief hospital course electronic health records long-form text chart review text reranking atomic claim hybrid retrieval clinical informatics clinical medicine fact verification retrieval-augmented generation logical atomism text embedding formal logic llm-as-a-judge llm evaluation

Published: April 9, 2025. Version: 1.0.0


Database Open Access

Radiology Report Generation Models Evaluation Dataset For Chest X-rays (RadEvalX)

Amos Rubin Calamida, Farhad Nooralahzadeh, Morteza Rohanian, Mizuho Nishio, Koji Fujimoto, Michael Krauthammer

RadEvalX is a publicly available dataset developed in a similar manner to the ReXVal dataset. RadEvalX focuses on radiologist evaluations of errors found in automatically generated radiology reports.

Published: June 18, 2024. Version: 1.0.0


Database Credentialed Access

Chest X-ray Dataset with Lung Segmentation

Wimukthi Indeewara, Mahela Hennayake, Kasun Rathnayake, Thanuja Ambegoda, Dulani Meedeniya

The CXLSeg (Chest X-ray with Lung Segmentation) dataset is a comparatively large dataset of segmented chest X-ray radiographs based on the MIMIC-CXR dataset. It contains segmentation results for 243,324 frontal-view images and their corresponding masks.

segmentation medical reports u-net chest radiographs mimic-cxr chest x-ray

Published: Feb. 8, 2023. Version: 1.0.0


Database Credentialed Access

Chest X-ray segmentation images based on MIMIC-CXR

Li-Ching Chen, Po-Chih Kuo, Ryan Wang, Judy Gichoya, Leo Anthony Celi

A chest X-ray segmentation dataset derived from MIMIC-CXR using a deep learning algorithm and human examination.

segmentation chest x-rays cxr

Published: Aug. 18, 2022. Version: 1.0.0


Database Restricted Access

Smartphone-Captured Chest X-Ray Photographs

Po-Chih Kuo, ChengChe Tsai, Diego M Lopez, Alexandros Karargyris, Tom Pollard, Alistair Johnson, Leo Anthony Celi

Smartphone-captured CXR images, including photographs of images from MIMIC-CXR and CheXpert, photographs taken by resident doctors, and photographs taken with different devices.

smartphone photograph cxr

Published: Sept. 27, 2020. Version: 1.0.0


Challenge Credentialed Access

CXR-LT: Multi-Label Long-Tailed Classification on Chest X-Rays

Gregory Holste, Mingquan Lin, Song Wang, Yiliang Zhou, Yishu Wei, Hao Chen, Atlas Wang, Yifan Peng

CXR-LT 2024 was a challenge for long-tailed, multi-label, and zero-shot thorax disease classification on chest X-rays, held at MICCAI 2024. This page contains long-tailed labels for 45 diseases from the CXR-LT 2024 and 2023 challenges.

disease classification artificial intelligence chest x-ray deep learning computer-aided diagnosis long-tailed learning cardiopulmonary disease zero-shot learning

Published: March 19, 2025. Version: 2.0.0