Database Contributor Review
InReDD-Dataset-PAN924
Caio Uehara Martins, Camila Tirapelli, Hugo Gaêta-Araujo, Jose Augusto Baranauskas, Breno Zancan, Jose Carneiro, Alessandra Macedo
Published: Nov. 22, 2025. Version: 1.0.0
When using this resource, please cite:
Uehara Martins, C., Tirapelli, C., Gaêta-Araujo, H., Baranauskas, J. A., Zancan, B., Carneiro, J., & Macedo, A. (2025). InReDD-Dataset-PAN924 (version 1.0.0). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/r5nt-we67
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.
Abstract
InReDD-Dataset-PAN924 is a collection of 924 radiographic images annotated with mouth and teeth labels by specialists from the InReDD research group.
InReDD (Interdisciplinary Research Group in Digital Dentistry) is a collaborative research initiative at the University of São Paulo’s Ribeirão Preto Campus (USP-RP), uniting the Department of Computation and Mathematics (DCM-USP-RP) and the School of Dentistry of Ribeirão Preto (FORP-USP-RP). The group is dedicated to developing applied technologies for the field of Odontology.
In this context, InReDD-Dataset-PAN924 is an image collection from the field of Odontology. It was developed to support descriptive analyses and to facilitate the creation and validation of artificial intelligence models. The data were collected primarily through clinical work at FORP-USP-RP.
This manuscript draws upon a previously published work, “Development of a dental digital dataset for research in artificial intelligence: the importance of labeling performed by radiologists.” However, certain details have been adjusted or updated to account for temporal adaptations and contextual revisions. As a result, portions of the content may not correspond verbatim to the original publication, although the scientific essence and core contributions remain preserved.
Background
Radiographic imaging is a cornerstone of dental diagnostics and treatment planning. In recent years, Artificial Intelligence (AI) has emerged as a powerful auxiliary tool for interpreting these images, showing promise in detecting caries, periodontal disease, and other oral pathologies [1]. The predominant approach, supervised learning, requires training models on large sets of accurately labeled data [2], which are often a significant bottleneck in medical AI development [3]. The reliability of any AI model is fundamentally dependent on the quality of this "ground truth," the reference standard used for training and validation.
This work is part of a broader effort to create an automated solution for the SB Brasil survey, enhancing the "Brasil Sorridente" program, which aims to classify the oral health status of the Brazilian population [4]. To address the need for high-quality training data, we introduce a new dataset with three unique characteristics.
First, it is composed of panoramic radiographs, a common imaging modality in clinical practice. Second, the dataset represents a Brazilian population sample from the School of Dentistry (FORP) in Ribeirão Preto, São Paulo. Third, and most critically, the ground truth was established by a consensus of experienced radiologists, ensuring a high-quality, reliable reference standard for model training.
Making accurately labeled, heterogeneous datasets publicly available is crucial for advancing the field. This dataset provides a valuable resource for the research community to develop, test, and optimize new AI models.
Methods
All annotations were managed and performed using LyriaPACS [5], a web-based platform connected to the I-Medsys image server [6]. I-Medsys customized the PACS to support the research, providing features such as individual work areas with personal keywords, trackable access to the system, and a checklist for image annotation. These features ensured a blinded workflow, data security, and an organized annotation process.
Annotations were conducted by radiologists in a dimly lit room using a monitor with 1024 × 768 resolution and 24-bit color depth. The built-in enhancement tools of the Lyria software (i.e., zoom, brightness, and contrast) were adjusted by each radiologist as needed to assist in the diagnostic task.
For labeling, three dentomaxillofacial radiologists, each with a decade of experience, were involved in the evaluation and labeling of the radiographic images. The process was divided into two distinct tasks:
- Labeling: This involved numbering each tooth, which was considered an immutable attribute. For example, tooth 12 will always be identified as tooth 12.
- Annotation: This task focused on indicating the condition of a tooth, which is a changeable attribute. For instance, tooth 12 might have decay in one radiograph, be healthy in another, or be an implant in a third.
To ensure accuracy and avoid bias in the AI training data, a forced consensus methodology was used:
- One radiologist individually labeled and annotated all panoramic radiographs.
- A second radiologist then independently reviewed this work.
- Any disagreements between the first two radiologists were resolved by consulting a third radiologist, whose decision established the final consensus.
This consensus became the ground truth for each identified tooth. Further details are provided in the referenced article.
The labeling process followed the Fédération Dentaire Internationale (FDI) tooth numbering system and proceeded in a clockwise sequence around the dental arch, starting with the right maxillary molars and ending with the right mandibular molars. A separate JSON file was generated for each annotation.
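For readers unfamiliar with FDI notation, the two-digit code combines a quadrant digit with a position digit counted from the midline. The sketch below decodes such a code; the helper name and return format are illustrative and not part of the dataset's tooling.

```python
# Minimal sketch of decoding an FDI two-digit tooth number; the helper
# name and return format are illustrative, not part of the dataset tooling.
FDI_QUADRANTS = {
    1: "upper right",  # right maxilla
    2: "upper left",   # left maxilla
    3: "lower left",   # left mandible
    4: "lower right",  # right mandible
}

def decode_fdi(code: int) -> tuple[str, int]:
    """Split an FDI code such as 12 into (quadrant, position from midline)."""
    quadrant, position = divmod(code, 10)
    return FDI_QUADRANTS[quadrant], position

print(decode_fdi(12))  # ('upper right', 2): right maxillary lateral incisor
print(decode_fdi(48))  # ('lower right', 8): right mandibular third molar
```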
Data Description
The collection contains 924 anonymized panoramic dental radiographs, designed to support research in digital dentistry.
| Item | Count |
|---|---|
| Images | |
| Total images | 924 |
| – Subset of images labeled with teeth and mouth (polyline) (1) | 924 |
| – Subset of images labeled with teeth segmentation (polyline) (2) | 200 |
| Annotations | |
| Total rectangle box annotations (1) [924 mouth, 20 033 teeth] | 20 957 |
| Total teeth segmentation masks (2) | 4 621 |
| Categories of tooth conditions | 14 |
| Categories of mouth conditions | 4 |
| Categories of tooth positions (FDI) | 32 |
Key Features
- Image resolution: 2903 × 1536 px (95 dpi), stored as JPG files.
- Annotations: Provided in a COCO-compatible JSON format with dental-specific fields, in both split and combined versions.
- Tooth-level labels: Includes bounding boxes and hierarchical condition annotations.
- Metadata: Contains patient details (age and sex).
Dataset Statistics
- Age distribution: Patients range from 14 to 81 years old, with a median age of 35 years.
- Sex distribution:
- Female: ~60%
- Male: ~40%
- Tooth conditions:
- Healthy teeth: ~45%
- Restored teeth: ~25%
- Caries: ~15%
- Other conditions (e.g., implants, residual roots): ~15%
Annotations
Annotations are distributed as JSON files in a COCO-compatible format, while preserving dental-specific fields.
Labels are provided in two versions:
- Split format: The `teeth_fdi_labels` and `mouth_and_teeth_labels` directories contain one JSON file per image, with annotations specific to that image.
- Combined format: The `teeth_fdi_labels.json` and `mouth_and_teeth_labels.json` files contain all annotations combined into a single JSON file for the entire dataset.
`teeth_fdi_labels` contains teeth segmentation masks and bounding box positions with FDI labels.
`mouth_and_teeth_labels` contains mouth and teeth positions based on rectangular segmentation, with the corresponding condition labels.
Each annotation can include:
1. Position:
- bbox: Defined for teeth and mouth regions.
- segmentation: Defined for teeth and mouth regions.
2. Instance:
- Tooth-level labels: Following the FDI numbering system (00–88).
- Condition annotations: Binary flags for 12 common findings (e.g., caries, crown, implant, root canal treatment).
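As an illustration, the combined file can be read with the standard library and indexed by image. This is a minimal sketch, assuming the combined `mouth_and_teeth_labels.json` file sits in the working directory; the field names follow the schema documented below.

```python
import json
from collections import defaultdict

# Read the combined COCO-style annotation file (assumed to be local).
with open("mouth_and_teeth_labels.json") as f:
    coco = json.load(f)

# Look-up tables for image metadata and category names.
images = {img["id"]: img for img in coco["images"]}
categories = {cat["id"]: cat["name"] for cat in coco["categories"]}

# Index annotations by the image they belong to.
anns_by_image = defaultdict(list)
for ann in coco["annotations"]:
    anns_by_image[ann["image_id"]].append(ann)

# Example: list the condition labels found on one image.
first_id = next(iter(images))
for ann in anns_by_image[first_id]:
    print(images[first_id]["file_name"], categories[ann["category_id"]])
```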
Images ("images")
Each image entry contains metadata about the image:
| Field | Type | Description |
|---|---|---|
| `id` | int | Unique identifier for the image. |
| `license` | int | License ID for the image. |
| `file_name` | string | Name of the image file (e.g., `2-F-70.jpg`). |
| `height` | int | Height of the image in pixels. |
| `width` | int | Width of the image in pixels. |
| `sex` | string | Patient's sex (`M` for male, `F` for female). |
| `age` | string | Patient's age (e.g., `70`). |
`file_name` encodes the ID, sex, and age; dedicated fields were nevertheless created for better usability. IDs are random values assigned during the anonymization process and do not follow a sequential order, because some raw data did not meet quality standards and were removed from the dataset.
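A minimal sketch of recovering these values from the file-name convention (e.g., `2-F-70.jpg`); in practice, the dedicated `sex` and `age` fields should be preferred.

```python
# Illustrative parsing of the "<id>-<sex>-<age>.jpg" naming convention.
def parse_file_name(file_name: str) -> dict:
    stem = file_name.rsplit(".", 1)[0]       # drop the extension
    patient_id, sex, age = stem.split("-")   # split the three components
    return {"id": int(patient_id), "sex": sex, "age": int(age)}

print(parse_file_name("2-F-70.jpg"))  # {'id': 2, 'sex': 'F', 'age': 70}
```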
Labels ("annotations")
Each annotation entry contains information about a specific object position (e.g., a tooth or the mouth region) in the image and includes a `category_id` representing the label (e.g., Mouth Edentulous or Teeth Implant).
| Field | Type | Description |
|---|---|---|
| `id` | int | Unique identifier for the annotation. |
| `image_id` | int | ID of the image this annotation belongs to. |
| `category_id` | int | ID of the category. |
| `bbox` (optional) | list | Bounding box coordinates for the detection. |
| `segmentation` (optional) | list[list] | Polygon coordinates for the segmentation mask. |
`bbox` and `segmentation` follow COCO standards. Positions are stored in bounding box or segmentation formats:

- `bbox`: Defined as `[x, y, width, height]`, where `x` and `y` specify the top-left corner of the bounding box in pixel coordinates, and `width` and `height` represent its size in pixels.
- `segmentation`: Defined as a list of polygon points `[[x1, y1, x2, y2, …, xn, yn]]` outlining the object mask. Each pair of values represents the `x` and `y` coordinates of a vertex in the 2D image space.
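As an illustration, the sketch below converts both storage formats into more convenient shapes; the coordinate values are invented for the example.

```python
# Illustrative conversion of the two storage formats; the coordinate
# values below are made up for the example.
bbox = [120.0, 340.0, 85.0, 110.0]  # [x, y, width, height]
segmentation = [[130.0, 350.0, 190.0, 352.0, 185.0, 440.0, 128.0, 438.0]]

# COCO bbox -> (x_min, y_min, x_max, y_max) corner form
x, y, w, h = bbox
corners = (x, y, x + w, y + h)

# COCO polygon -> list of (x, y) vertices
vertices = list(zip(segmentation[0][0::2], segmentation[0][1::2]))

print(corners)   # (120.0, 340.0, 205.0, 450.0)
print(vertices)  # [(130.0, 350.0), (190.0, 352.0), ...]
```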
Categories ("categories")
The `categories` field defines the possible classes for annotations:
| Field | Type | Description |
|---|---|---|
| `id` | int | Unique identifier for the category. |
| `name` | string | Name of the category (e.g., `Ed`). |
| `supercategory` | string / None | Higher-level grouping for the category (can be `None`). |
`supercategory` is used only in the `mouth_and_teeth_labels` split, grouping categories into Mouth and Teeth (Artificial, Natural, or Mixed) superclasses.
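A minimal sketch of grouping categories by supercategory, again assuming the combined `mouth_and_teeth_labels.json` file is available locally.

```python
import json
from collections import defaultdict

# Group category names under their supercategory; in the other split,
# supercategory may be absent or None.
with open("mouth_and_teeth_labels.json") as f:
    coco = json.load(f)

by_super = defaultdict(list)
for cat in coco["categories"]:
    by_super[cat.get("supercategory")].append(cat["name"])

for super_name, names in by_super.items():
    print(super_name, "->", names)
```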
Tooth/Mouth Condition Categories
- Major Bounding Box (Mouth Labels):
  - Ed: Edentulous
  - De: Dentate
  - Me: Maxilla edentulous
  - Mne: Mandible edentulous
- Minor Bounding Box (Teeth Labels):
  - Artificial Teeth (DA):
    - Im: Implant
    - Cp: Single prosthetic crown
    - P: Pontic
  - Natural Teeth (DN):
    - H: Healthy
    - Rr: Residual root
    - M3i: Impacted third molar
    - M3f: Developing third molar
    - Te: Endodontic treatment
    - Ri: Intraradicular post
    - Dc: Crown destruction
    - Di: Incisal wear
    - C: Caries
    - R: Restored
    - I: Impacted
  - Mixed Teeth (DM):
    - TeM: Endodontic treatment
    - RiM: Intraradicular post
    - CpuM: Single prosthetic crown
Observations
- Tooth-level bounding boxes: Following the FDI two-digit numbering system (00–88).
- Condition annotations: Binary flags for 12 common findings (e.g., caries, crown, implant, root canal treatment, periapical lesion).
Usage Notes
Annotations are distributed as JSON files in a COCO-compatible format [7], while preserving dental-specific fields. We particularly recommend using FiftyOne as a tool for organizing and exploring the dataset. Two Python scripts are provided: one demonstrates how to load the data in FiftyOne, and the other generates dataset statistics.
You can use these files to load the images and map each annotation's `image_id` to the corresponding image entry. The annotation schema gives access to all label information, where the `bbox` and `segmentation` fields represent points in the 2D image space.
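A minimal FiftyOne loading sketch follows; the `images/` directory name is an assumption about the local layout, and only the detection-style labels are imported here.

```python
import fiftyone as fo

# Load the dataset as a COCO-style detection dataset; paths are assumptions
# about the local layout, not fixed by the distribution.
dataset = fo.Dataset.from_dir(
    dataset_type=fo.types.COCODetectionDataset,
    data_path="images/",                        # folder of JPG files
    labels_path="mouth_and_teeth_labels.json",  # combined COCO-style file
    name="InReDD-Dataset-PAN924",
)

# Browse images and labels interactively in the FiftyOne App.
session = fo.launch_app(dataset)
session.wait()
```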
The images were converted to a 16-bit format using lossless compression, which reduces file size without compromising image quality; this format is standard in the clinic's image bank and widely adopted in medical imaging. All images were anonymized, coded, and saved as lossless JPEGs with an original resolution of 2903 × 1536 pixels at 300 dpi, later reduced to 90 dpi for subsequent processing.
Note that the dataset represents a demographic sample of patients who visited the School of Dentistry and therefore predominantly includes individuals from Ribeirão Preto, São Paulo, Brazil.
Release Notes
Version 1.0.0
This version represents the initial set of images, with the label subsets described above. Future work will add further labels to these images and include more images in the collection.
In the broader view of our dataset development plan, we intend to add other file types representing additional modalities of information (e.g., 3D intraoral scans, CBCT image files, patient anamnesis text), combining this image collection with other modality collections into a multimodal, general dataset.
Ethics
The use of the images was submitted to and approved by the Ethics Committee (Plataforma Brasil, CAAE: 51238021.2.0000.5419).
Acknowledgements
We acknowledge the University of São Paulo’s Ribeirão Preto Campus (USP-RP), the Department of Computation and Mathematics (DCM-USP/RP), the Faculty of Philosophy, Sciences and Letters at Ribeirão Preto, and the School of Dentistry of Ribeirão Preto (FORP-USP/RP) for providing the infrastructure and environment that supported this research.
We also thank the São Paulo Research Foundation (FAPESP), the Innovation Agency USP (AUSPIN), and the USP Unified Scholarship Program (PUB) for funding this research.
Conflicts of Interest
The authors have no conflicts of interest to declare.
References
- Pauwels R. A brief introduction to concepts and applications of artificial intelligence in dental imaging. Oral Radiol. 2021;37:153-60.
- Panetta K, Rajendran R, Ramesh A, et al. Tufts Dental Database: a multimodal panoramic X-Ray dataset for benchmarking diagnostic systems. IEEE J Biomed Health Inform. 2022;26:1650-9.
- Jader G, Fontineli J, Ruiz M, et al. Deep instance segmentation of teeth in panoramic x-ray images. 2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI). New York: IEEE; 2018. p. 400-7.
- Ministério da Saúde (Brasil). Brasil Sorridente [Internet]. Brasília (DF): Ministério da Saúde; 2025 [cited 2025 Nov 6]. Available from: https://www.gov.br/saude/pt-br/composicao/saps/brasil-sorridente
- Carvalho DF, Camacho-Guerrero JA, Marques PM, Macedo AA. Lyria PACS: a case study saves ten million dollars in a Brazilian hospital. In: 28th IEEE International Symposium on Computer-Based Medical Systems; 2015; São Carlos, Brazil. p. 326–9. doi: 10.1109/CBMS.2015.87.
- I-medsys. Lyria PACS RT [Internet]. Ribeirão Preto (BR): I-medsys; [cited 2025 Nov 6]. Available from: https://i-medsys.com/lyriaRTusa.html
- Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, et al. Microsoft COCO: common objects in context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T, editors. Computer Vision: ECCV 2014. Cham: Springer; 2014. p. 740–55. (Lecture Notes in Computer Science; vol. 8693). doi: 10.1007/978-3-319-10602-1_48.
Access
Access Policy:
Only credentialed users who sign the DUA can access the files. In addition, users must have individual studies reviewed by the contributor.
License (for files):
PhysioNet Contributor Review Health Data License 1.5.0
Data Use Agreement:
PhysioNet Contributor Review Health Data Use Agreement 1.5.0
Required training:
No training required
Discovery
DOI (version 1.0.0):
https://doi.org/10.13026/r5nt-we67
DOI (latest version):
https://doi.org/10.13026/85hv-ct26
Project Website:
https://inredd.com.br/en/solutions/open-data
Files
To access the files, you must:
- be a credentialed user
- submit a request to the authors to use the data for your project