Database Credentialed Access
RadGraph-XL: A Large-Scale Expert-Annotated Dataset for Entity and Relation Extraction from Radiology Reports
Published: Sept. 12, 2025. Version: 1.0.0
When using this resource, please cite:
Delbrouck, J. (2025). RadGraph-XL: A Large-Scale Expert-Annotated Dataset for Entity and Relation Extraction from Radiology Reports (version 1.0.0). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/j8e7-pr22
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.
Abstract
Radiology reports are essential for clinical care but pose challenges for automated processing due to their unstructured nature. Existing datasets like RadGraph-1.0 focus narrowly on chest X-rays (CXR), limiting their applicability. We introduce RadGraph-XL, a large-scale, expert-annotated dataset of 2,300 radiology reports with more than 400,000 labeled entities and relations, spanning four anatomy–modality pairs: chest computed tomography (CT), abdomen/pelvis CT, brain magnetic resonance imaging (MR), and CXR.
Each report is annotated by board-certified radiologists using a detailed schema that captures observations, anatomical references, and their relationships. A novel post-processing step identifies measurement-related entities, a clinically valuable category. Models trained on RadGraph-XL outperform prior methods and GPT-4, and generalize well to out-of-domain data such as deep vein thrombosis (DVT) ultrasound reports.
RadGraph-XL is released publicly with models and annotations to support applications in clinical natural language processing (NLP), medical imaging artificial intelligence, and foundation model evaluation, setting a new benchmark for structured information extraction in radiology.
Background
Traditionally, extracting structured data from radiology reports has been difficult because these reports are written in free-text form and contain specialized medical terminology. Prior efforts, such as RadGraph-1.0 [1], focused on chest X-ray (CXR) data and provided a valuable framework for labeling clinical entities (e.g., anatomies and observations) and their relationships (e.g., "located at," "modify," "suggestive of"). However, RadGraph-1.0 was limited to just one imaging modality (CXR) and thus could not meet the increasing need for fine-grained, structured information across a broader range of anatomical regions and imaging techniques.
With radiology research expanding to modalities such as computed tomography (CT) and magnetic resonance imaging (MRI) and to anatomical regions such as the chest, abdomen/pelvis, and brain, and with tasks like clinical monitoring, disease tracking, and artificial intelligence (AI)-driven image analysis relying on richer annotations, there was a clear gap. RadGraph-XL was therefore created to provide a large-scale, expert-labeled dataset encompassing multiple anatomy–modality pairs (chest CT, abdomen/pelvis CT, brain MRI, and CXR). By significantly increasing the data volume and the complexity of annotations, RadGraph-XL seeks to advance automated radiology report analysis, improve model performance, and enable new research on measurement extraction and structured information retrieval.
Methods
Report Selection
A total of 2,300 radiology reports were curated from two large institutional sources: Medical Information Mart for Intensive Care Chest X-ray (MIMIC-CXR) and Stanford Health Care. Rather than using all available reports, we employed a targeted sampling strategy to ensure clinical diversity and semantic coverage across different imaging contexts. Specifically, we focused on four modality–anatomy pairs:
- Chest computed tomography (CT)
- Abdomen/Pelvis CT
- Brain magnetic resonance imaging (MRI)
- Chest X-rays (CXR)
The report selection process involved three steps:
- Condition coverage: Reports were filtered to ensure a balanced representation of disease types and imaging findings.
- Semantic clustering: We used Universal Sentence Encoder (USE) embeddings and t-distributed stochastic neighbor embedding (t-SNE) projection to cluster the reports by content similarity, and sampled uniformly across clusters to maintain topical diversity (a sketch follows this list).
- Length stratification: Reports were grouped by sentence length and sampled proportionally to represent both concise and complex narratives.
This approach ensured that the dataset includes a wide range of diagnostic content, report styles, and anatomical references.
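As an illustration of the clustering-based sampling in step 2, a minimal sketch is shown below. It assumes the public TF-Hub Universal Sentence Encoder module and scikit-learn; the cluster count and per-cluster sample size are illustrative values, not the settings used to build RadGraph-XL.

```python
# Sketch of diversity-oriented report sampling: embed with the Universal
# Sentence Encoder, project with t-SNE, cluster, sample uniformly per cluster.
# Hyperparameters here are illustrative, not the dataset's actual settings.
import numpy as np
import tensorflow_hub as hub
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

def sample_diverse_reports(reports, n_clusters=20, per_cluster=5, seed=0):
    embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
    vectors = embed(reports).numpy()                      # (n_reports, 512)
    coords = TSNE(n_components=2, random_state=seed).fit_transform(vectors)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(coords)
    rng = np.random.default_rng(seed)
    picked = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        picked.extend(rng.choice(members, size=min(per_cluster, len(members)),
                                 replace=False))
    return [reports[i] for i in picked]
```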
Annotation Schema
We adopted and extended the RadGraph-1.0 schema, which defines entities and relations within the text of radiology reports. Each entity is a span of text labeled according to both clinical type and certainty:
Entity Labels:
- Anatomy: Definitely Present – A body structure that is clearly present or referenced in the report (e.g., "right lung").
- Anatomy: Definitely Absent – A body structure that is noted as missing or removed (e.g., "absent gallbladder").
- Anatomy: Uncertain – Unclear presence or visibility of a body part (e.g., "possible adrenal gland").
- Observation: Definitely Present – A radiologic finding, diagnosis, or visual feature confidently stated (e.g., "pleural effusion").
- Observation: Definitely Absent – A finding that is explicitly negated (e.g., "no pneumothorax").
- Observation: Uncertain – Findings described with uncertainty or ambiguity (e.g., "could represent a mass").
In addition, we introduced a post-processing step to detect measurement-related entities, such as "4.6 cm" or "less than 6 mm", which are important for quantitative assessment in radiology.
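Such spans are well suited to rule-based detection. The pattern below is an illustration of how measurement mentions can be matched; it is not the actual post-processing rule set used for RadGraph-XL, which may differ.

```python
# Illustrative regex for measurement spans such as "4.6 cm", "less than 6 mm",
# or "2.5 x 1.5 cm". Not the dataset's actual post-processing rules.
import re

MEASUREMENT = re.compile(
    r"(?:(?:less|greater)\s+than\s+)?"            # optional qualifier
    r"\d+(?:\.\d+)?"                              # leading number
    r"(?:\s*[x×]\s*\d+(?:\.\d+)?)*"               # optional extra dimensions
    r"\s*(?:mm|cm|millimeters?|centimeters?)\b",  # unit
    re.IGNORECASE,
)

def find_measurements(text):
    """Return (start, end, span) tuples for measurement mentions."""
    return [(m.start(), m.end(), m.group(0)) for m in MEASUREMENT.finditer(text)]

print(find_measurements("A nodule measuring 4.6 cm, previously less than 6 mm."))
# [(19, 25, '4.6 cm'), (38, 52, 'less than 6 mm')]
```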
Relation Labels:
- Modify – One entity changes or qualifies another (e.g., "small mass" where "small" modifies "mass").
- Located At – An observation is associated with an anatomical site (e.g., "effusion" located at "left pleural space").
- Suggestive Of – One observation implies another (e.g., "consolidation" suggestive of "pneumonia").
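Put together, the schema turns a sentence into a small graph. The hand-built toy example below shows one plausible annotation of a short finding; exact span boundaries and relation choices follow annotator conventions and may differ in the released data.

```python
# Hand-built toy annotation for "Small left pleural effusion." Labels follow
# the schema above; span and relation choices are illustrative only.
annotation = {
    "text": "Small left pleural effusion.",
    "entities": [
        {"id": 1, "span": "Small",    "label": "Observation::definitely present"},
        {"id": 2, "span": "left",     "label": "Anatomy::definitely present"},
        {"id": 3, "span": "pleural",  "label": "Anatomy::definitely present"},
        {"id": 4, "span": "effusion", "label": "Observation::definitely present"},
    ],
    "relations": [
        {"source": 1, "target": 4, "type": "modify"},      # "Small" qualifies "effusion"
        {"source": 2, "target": 3, "type": "modify"},      # "left" qualifies "pleural"
        {"source": 4, "target": 3, "type": "located_at"},  # effusion sits at the pleura
    ],
}
```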
Expert Review Process
Each report was double-annotated by board-certified radiologists, with a required minimum inter-annotator agreement rate of 50%. Disagreements were reviewed and resolved by an adjudicating radiologist. This rigorous process resulted in 406,141 validated annotations covering a broad and balanced distribution of entities and relations.
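Agreement between the two annotators can be scored in several ways; a simple option is exact-match F1 over labeled entity spans, sketched below. This is one plausible formulation, not necessarily the metric behind the 50% threshold.

```python
# Exact-match span F1 between two annotators. Each annotation set contains
# (start, end, label) tuples. One plausible agreement metric; the project's
# actual definition may differ.
def span_f1(ann_a, ann_b):
    a, b = set(ann_a), set(ann_b)
    if not a and not b:
        return 1.0                     # trivially agree on an empty report
    tp = len(a & b)                    # spans both annotators produced
    precision = tp / len(a) if a else 0.0
    recall = tp / len(b) if b else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```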
Data Description
Scope and Coverage
The dataset includes 2,300 expert-annotated radiology reports across four modality–anatomy pairs:
- Chest computed tomography (CT)
- Abdomen/Pelvis CT
- Brain magnetic resonance imaging (MRI)
- Chest X-ray (CXR)
Reports originate from two sources, MIMIC-CXR and Stanford Health Care, to ensure stylistic and clinical diversity. This repository release contains only the MIMIC subset.
Dataset Structure and Variables
Each report is annotated with:
- Entities
  - Anatomy: present, absent, or uncertain
  - Observation: present, absent, or uncertain
  - Measurements: spans expressing sizes or dimensions (e.g., "5 mm", "2.5 × 1.5 cm")
- Relations
  - modify
  - located at
  - suggestive of
Annotations are stored in structured formats (e.g., JSON), containing:
- Full report text
- List of entities (text span, label, type)
- List of relations (source entity, target entity, type)
Descriptive Statistics
| Statistic | Value |
| --- | --- |
| Total Reports | 2,300 |
| Sources | MIMIC-CXR, Stanford Health Care |
| Modality–Anatomy Pairs | 4 (Chest CT, Abdomen/Pelvis CT, Brain MR, Chest X-ray) |
| Average Report Length | ~410 words |
| Length Range (min–max) | ~100 to 600+ words |
| Total Entities | 226,563 |
| — Anatomy Entities | 113,121 |
| — Observation Entities | 113,442 |
| Entity Certainty Breakdown | |
| — Definitely Present | 82,522 (observations), 113,114 (anatomy) |
| — Definitely Absent | 22,882 (observations), 4 (anatomy) |
| — Uncertain | 8,038 (observations), 3 (anatomy) |
| Total Relations | 179,578 |
| — Modify | 113,679 (63.3%) |
| — Located At | 59,154 (32.9%) |
| — Suggestive Of | 6,745 (3.8%) |
| Measurements | 3,297 entities annotated post hoc |
| — Most common in Abdomen/Pelvis CT | 1,421 mentions |
| Unique Entity Types | 19,772 (text, label) combinations |
| Unique Relation Triplets | 67,323 (source entity, target entity, label) |
| Average Agreement (double annotation) | ≥ 50% across all modalities |
Data Splits
For reproducible experiments and fair comparisons, the dataset is divided into:
- Training set: 2,320 reports
- Validation set: 290 reports
- Test set: 290 reports (used for official benchmarking)
Use Cases
This dataset is particularly suited for:
- Clinical Named Entity Recognition (NER)
- Relation extraction between anatomical and pathological concepts
- Developing and evaluating medical information extraction pipelines
- Benchmarking generalist or task-specific language models in healthcare NLP
- Exploring cross-modality generalization
Sample
```
{
  "dataset": "mimic-chest-ct",
  "doc_key": 0,
  "sentences": [
    ["STUDY", ":", "CT", "torso", ".", "HISTORY", ":", "Metastatic", "breast", "cancer", "..."]
  ],
  "ner": [
    [
      [77, 77, "Anatomy::definitely present"],
      [78, 78, "Observation::definitely present"],
      ...
    ]
  ],
  "relations": [
    [
      [78, 78, 77, 77, "located_at"],
      [84, 84, 83, 83, "modify"],
      ...
    ]
  ]
}
```
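The token indices in "ner" and "relations" refer to positions in the flattened token sequence of the document. A minimal reader is sketched below; the file name is hypothetical, but the field names match the sample above.

```python
# Minimal reader for the document format shown above: flatten the sentences
# and resolve (start, end) token indices back to text spans.
import json

def resolve_entities(doc):
    tokens = [tok for sent in doc["sentences"] for tok in sent]
    entities = []
    for sentence_ner in doc["ner"]:
        for start, end, label in sentence_ner:
            entities.append((" ".join(tokens[start:end + 1]), label))
    return entities

with open("radgraph_xl.jsonl") as f:   # hypothetical file name
    for line in f:
        doc = json.loads(line)
        print(doc["doc_key"], resolve_entities(doc)[:3])
```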
Usage Notes
Potential Applications
RadGraph-XL is a versatile resource designed to advance clinical natural language processing (NLP) in radiology. Key applications include:
- Entity and Relation Extraction: Training models to identify and link anatomical structures, clinical observations, and measurements in radiology reports.
- Measurement Understanding: Specialized support for analyzing size and length descriptors (e.g., "2.5 centimeters (cm)", "less than 6 millimeters (mm)"), which are critical for monitoring disease progression.
- Structured Reporting: Enabling automated conversion of free-text radiology findings into structured data formats to support electronic medical record (EMR) integration and clinical workflows.
- Summarization & Report Generation: Facilitating high-quality summarization and generation of radiology findings from raw text or imaging data.
- Model Evaluation: Providing a standardized benchmark for comparing clinical NLP models, including domain-specific transformer architectures and large language models (LLMs).
- Clinical Decision Support: Laying the foundation for downstream tasks like diagnostic assistance and patient trajectory analysis.
Resources and Tooling
The official RadGraph-XL GitHub repository provides:
- Downloadable data files, annotation schema, and data splits.
- Pretrained DyGIE++ [2] models (using the Biomedical Vision-Language Pretraining Chest X-ray BERT encoder, BiomedVLP-CXR-BERT) and Span-based Entity and Relation Transformer (SpERT) [3] models for entity–relation extraction.
- A detailed model card describing training parameters, performance metrics, and usage instructions.
- Python utilities for:
  - Parsing and preprocessing the .jsonl data
  - Evaluating predictions using official metrics
  - Visualizing entity–relation graphs
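As one way to inspect the annotations without the repository's own tooling, the sketch below builds a directed graph with networkx; doc is a parsed document in the format shown in the Sample section.

```python
# Build a directed entity-relation graph from one annotated document using
# networkx; nodes are (start, end) spans, edges carry the relation type.
import networkx as nx

def build_graph(doc):
    tokens = [tok for sent in doc["sentences"] for tok in sent]
    graph = nx.DiGraph()
    for sentence_ner in doc["ner"]:
        for start, end, label in sentence_ner:
            graph.add_node((start, end),
                           text=" ".join(tokens[start:end + 1]), label=label)
    for sentence_rel in doc["relations"]:
        for s1, e1, s2, e2, relation in sentence_rel:
            graph.add_edge((s1, e1), (s2, e2), relation=relation)
    return graph
```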
Software Requirements:
- Python ≥ 3.7
- PyTorch ≥ 1.7
- Hugging Face Transformers library
These tools enable rapid experimentation and seamless integration into research pipelines.
Known Limitations
While RadGraph-XL introduces several innovations, users should be aware of the following limitations:
- Limited modality coverage: Only four modality–anatomy pairs are covered; modalities like ultrasound, mammography, or positron emission tomography (PET) are not included.
- Institutional bias: Although sourced from two institutions (Medical Information Mart for Intensive Care Chest X-ray, MIMIC-CXR, and Stanford Health Care), institutional language and documentation style may not generalize globally.
- Measurement handling: Measurement entities were added via post-processing and not manually validated in the same way as core annotations, which may introduce minor inconsistencies.
- Annotation agreement: Inter-rater agreement was ≥50%, indicating variability in complex cases despite expert review.
- Data availability: Only the MIMIC subset is publicly released due to data-sharing restrictions.
Ethics
RadGraph-XL enables structured data extraction from radiology reports, supporting medical AI and clinical research. The dataset is de-identified and IRB-approved, ensuring patient privacy. While our models improve information retrieval, potential biases and misinterpretations must be carefully monitored. We encourage fairness audits across demographics. RadGraph-XL is released for research purposes and should not be used in clinical care without further validation.
Conflicts of Interest
We do not report any financial or personal relationships that could be construed as conflicts of interest. The dataset is made publicly available for non-commercial research purposes, and all contributing institutions have approved its release under these terms.
References
1. Jain S, Agrawal A, Saporta A, Truong S, Duong DN, Bui T, Chambon P, Zhang Y, Lungren MP, Ng AY, Langlotz C, Rajpurkar P. RadGraph: Extracting Clinical Entities and Relations from Radiology Reports. In: Proceedings of the Thirty-Fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1); 2021.
2. Wadden D, Wennberg U, Luan Y, Hajishirzi H. Entity, Relation, and Event Extraction with Contextualized Span Representations. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP); 2019. p. 5784–9.
3. Eberts M, Ulges A. Span-based Joint Entity and Relation Extraction with Transformer Pre-training. In: Proceedings of the European Conference on Artificial Intelligence (ECAI 2020). IOS Press; 2020. p. 2006–13.
Access
Access Policy:
Only credentialed users who sign the DUA can access the files.
License (for files):
PhysioNet Credentialed Health Data License 1.5.0
Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0
Required training:
CITI Data or Specimens Only Research
Discovery
DOI (version 1.0.0):
https://doi.org/10.13026/j8e7-pr22
DOI (latest version):
https://doi.org/10.13026/6tw7-rq96
Files
To access the files, you must:
- be a credentialed user
- complete the required training: CITI Data or Specimens Only Research
- sign the data use agreement for the project