Database Open Access

Brain Hemorrhage Extended (BHX): Bounding box extrapolation from thick to thin slice CT images

Eduardo Pontes Reis, Felipe Nascimento, Mateus Aranha, Fernando Mainetti Secol, Birajara Machado, Marcelo Felix, Anouk Stein, Edson Amaro

Published: July 29, 2020. Version: 1.1


When using this resource, please cite:
Reis, E. P., Nascimento, F., Aranha, M., Mainetti Secol, F., Machado, B., Felix, M., Stein, A., & Amaro, E. (2020). Brain Hemorrhage Extended (BHX): Bounding box extrapolation from thick to thin slice CT images (version 1.1). PhysioNet. https://doi.org/10.13026/9cft-hg92.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

BHX is a publicly available dataset with bounding box annotations for 5 types of acute hemorrhage, built as an extension of the qure.ai CQ500 dataset. It is intended to provide data resources that help advance hemorrhage detection towards machine learning localization tasks.

Key Points: 

  • BHX contains up to 39,668 bounding boxes in 23,409 images annotated for hemorrhage (depending on the dataset version), out of a total of ~170k images from the qure.ai CQ500 dataset.

  • The dataset was built through an efficient method that obtains automatically annotated images (thin slices) from sparse initial labeling (thick slices).

  • We hope this dataset can contribute to the further development and validation of machine learning algorithms, ultimately helping to diagnose intracranial hemorrhage.


Background

Intracranial hemorrhage is an emergency condition with high personal and societal impact, with 40% mortality within 1 month [1,2,3]. Early diagnosis, often through neuroimaging, is essential to guide emergency treatment and, in most cases, to reveal the underlying cause [2,3,4]. Head computed tomography (CT) is the modality of choice to evaluate intracranial hemorrhage [2], and prompt assessment of the images by an expert is required. In this context, a tool for diagnosing, locating, and segmenting intracranial hemorrhage can be helpful in clinical head CT exams, enabling a fast assessment that can guide emergency treatment and ultimately lead to better outcomes.

Convolutional neural networks (a deep learning technique) are considered the gold standard for image pattern recognition and can perform hemorrhage detection, localization, and segmentation tasks, as shown in previous works [3, 5, 6, 7]. However, the datasets used for these developments are proprietary, so the scientific community cannot reproduce the results.

The Radiological Society of North America (RSNA) recently launched a brain hemorrhage detection competition [8], making publicly available the largest brain hemorrhage dataset to date. However, the precise hemorrhage location is not delimited in each image, and the exams do not include thin-slice series.

Deep learning requires a large amount of data for training, and although intracranial hemorrhage classification datasets have been released, no public datasets are available for the bleeding localization task with bounding box annotations [6,8]. Detailed annotation of images is tiring, repetitive, and requires long hours of expert dedication; labelled thin slices are therefore rare in the open research community.

In this work we present a publicly available head CT database with bounding box annotations of 6 different types of hemorrhage. Moreover, we developed an efficient method to expedite feature annotation in thin slices through automatic extrapolation from sparse hand-drawn bounding boxes.


Methods

The images were obtained from the publicly available dataset CQ500 by qure.ai for critical findings on head CT scans. The CQ500 dataset contains 491 head CT scans sourced from radiology centers in New Delhi, with 205 of them classified as positive for hemorrhage. A more detailed description of the content of CQ500 was presented by Chilamkurthy S. et al. [6]. One of the authors (A.S.) served as a consultant for the annotation tool MD.ai, used in the annotation process described below. 

Labeling:

We created one label for each of the 5 different types of acute hemorrhage (Intraparenchymal, Subarachnoid, Intraventricular, Epidural, and Subdural) and an extra sixth label for Chronic Subdural Hematoma. With chronic cases labeled separately, the Subdural label can be taken to refer to Acute Subdural Hematoma (herein referred to as Acute Subdural) [9, 10].

Three trained neuroradiologists with different levels of clinical experience, with six (F.N.), four (M.A.), and less than one (E.R.) years of practice, respectively, read and annotated the images from the thick-slice series.

All head CT images were evaluated using a soft tissue filter [11]. We also selected the series containing slices 3 mm thick or more (referred to herein as "thick-slices" or "thick-series") and the corresponding series containing slices 1 mm thick or less (referred to herein as "thin-slices" or "thin-series"). Then, we matched the thick-slice image series to the thin-slice ones by using the "Image Position (Patient)" DICOM tag [12].
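The matching step can be illustrated with a minimal sketch. It assumes that matching means pairing each thin slice with the thick slice whose z coordinate (the third component of Image Position (Patient)) is closest; the source does not spell out the exact rule, so the function names, the `max_gap_mm` tolerance, and the sample positions below are all hypothetical.

```python
from bisect import bisect_left


def nearest_thick_index(thick_z, z):
    """Return the index of the value in sorted thick_z closest to z."""
    i = bisect_left(thick_z, z)
    if i == 0:
        return 0
    if i == len(thick_z):
        return len(thick_z) - 1
    # z lies between two thick slices: pick the closer neighbour.
    return i if thick_z[i] - z < z - thick_z[i - 1] else i - 1


def match_thin_to_thick(thick_z, thin_z, max_gap_mm):
    """Map each thin-slice index to the index of its nearest thick slice.

    thick_z must be sorted in ascending order. Thin slices farther than
    max_gap_mm (a hypothetical tolerance) from every thick slice are skipped.
    """
    matches = {}
    for j, z in enumerate(thin_z):
        i = nearest_thick_index(thick_z, z)
        if abs(thick_z[i] - z) <= max_gap_mm:
            matches[j] = i
    return matches


# Made-up z positions in millimetres, for illustration only.
thick_z = [0.0, 5.0, 10.0]
thin_z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
matches = match_thin_to_thick(thick_z, thin_z, max_gap_mm=2.5)
```

In this toy example the thin slice at 20.0 mm is left unmatched because no thick slice lies within the tolerance. Bounding boxes drawn on a thick slice could then be copied to every thin slice mapped to it.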

See the image file "example_annotated_images.png" for examples of the annotations for each of the classes.


Data Description

The manual labeling resulted in 6,283 bounding boxes in 3,558 different images. The extrapolation to all the other slices resulted in 39,668 bounding boxes in 23,409 images.

The dataset is available in three versions: i) the original hand-drawn annotations of the thick slices (file “1_Initial_Manual_Labeling.csv”); ii) the extrapolation of the annotations to all other corresponding images (file “2_Extrapolation_to_All_Series.csv”); and iii) the extrapolation of the annotations only to the selected soft-tissue thin slices (file “3_Extrapolation_to_Selected_Series.csv”). This third, clean version is an attempt to mitigate inconsistencies between different CT acquisitions related to minor movements during the examination. We also removed the bone-filtered images, which are suboptimal for hemorrhage assessment. The clean dataset contains 27,203 bounding boxes in 15,979 images.

Datasets:

  • Version 1 (file "1_Initial_Manual_Labeling.csv") contains only the hand-drawn annotations of the thick-slices.
  • Version 2 (file "2_Extrapolation_to_All_Series.csv") contains the thick-slices hand-drawn annotations as well as the extrapolation for all other corresponding images, regardless of whether they belong to bone-filtered or contrast-enhanced series.
  • Version 3 (file "3_Extrapolation_to_Selected_Series.csv") contains the thick-slices hand-drawn annotations as well as the extrapolation for the selected soft-tissue non-contrast-enhanced series only.

Columns:

  • SOPInstanceUID: The unique identifier for each DICOM image of the dataset. This ID is used to link the annotations with the image in the original CQ500 dataset. Each annotated image may contain any number of annotations, so one or more lines may share the same SOPInstanceUID.
  • SeriesInstanceUID: The unique identifier for each DICOM series. It helps to group the images from the same series.
  • StudyInstanceUID: The unique identifier for each CT scan. It helps to group the images or series from the same study.
  • data: The bounding box coordinates - X, Y, width, height.
  • labelName: The label name for one of the six types of hemorrhage described above.
  • labelType: Indicates whether the image comes from a thick-slices series used for manual annotation ("thick-slices"), from a selected thin-slices series with soft tissue filter, which makes up dataset version 3 ("thin-slices"), or from any other series ("other").
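As an illustration of how these columns might be consumed, here is a minimal sketch that groups boxes by image. It assumes the `data` column stores a dict-like string with the keys `x`, `y`, `width`, and `height`; that encoding, the column order, and the sample rows are assumptions made for the example and should be checked against the actual CSV files.

```python
import ast
import csv
import io
from collections import defaultdict


def parse_box(data_field):
    """Parse the `data` column into an (x, y, width, height) tuple of floats.

    Assumes a dict-like string such as "{'x': 1, 'y': 2, 'width': 3, 'height': 4}".
    """
    box = ast.literal_eval(data_field)
    return (float(box["x"]), float(box["y"]),
            float(box["width"]), float(box["height"]))


def group_boxes_by_image(rows):
    """Collect (labelName, box) pairs under each SOPInstanceUID."""
    boxes = defaultdict(list)
    for row in rows:
        boxes[row["SOPInstanceUID"]].append(
            (row["labelName"], parse_box(row["data"]))
        )
    return dict(boxes)


# Synthetic rows with made-up identifiers, for illustration only.
SAMPLE = """SOPInstanceUID,SeriesInstanceUID,StudyInstanceUID,data,labelName,labelType
1.2.3,1.2,1,"{'x': 10.0, 'y': 20.0, 'width': 30.0, 'height': 40.0}",Subdural,thick-slices
1.2.3,1.2,1,"{'x': 50.5, 'y': 60.0, 'width': 5.0, 'height': 6.0}",Intraventricular,thick-slices
1.2.4,1.2,1,"{'x': 1.0, 'y': 2.0, 'width': 3.0, 'height': 4.0}",Epidural,thick-slices
"""

rows = csv.DictReader(io.StringIO(SAMPLE))
boxes_by_image = group_boxes_by_image(rows)
```

To read one of the real files, the `io.StringIO(SAMPLE)` object would be replaced by an open file handle, for example `open("1_Initial_Manual_Labeling.csv", newline="")`.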

The DICOM tag SOP Instance UID links the annotations with the images of the original CQ500 dataset. It is a unique identifier assigned to each DICOM image when the exam is acquired or when the images are anonymized, and it remains unique to that image. Using this identifier mitigates the risk of confusion between DICOM image files from different series and exams, which often share the same file name. To obtain this identifier, read the DICOM tag (0008,0018), SOP Instance UID, from the DICOM header.
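A minimal sketch of building such a lookup follows. It assumes the third-party pydicom package is available for reading headers; the helper names are hypothetical, and the reader is passed in as a parameter so that the indexing logic can be exercised without any DICOM files.

```python
def read_sop_instance_uid(path):
    """Read tag (0008,0018), SOP Instance UID, from a DICOM file header."""
    # pydicom is a third-party package; it is imported lazily so that the
    # indexing function below can be used with any other reader.
    import pydicom

    dataset = pydicom.dcmread(path, stop_before_pixels=True)
    return str(dataset.SOPInstanceUID)


def index_by_sop_uid(paths, read_uid=read_sop_instance_uid):
    """Build a {SOP Instance UID: file path} lookup for the given files."""
    index = {}
    for path in paths:
        uid = read_uid(path)
        if uid in index:
            raise ValueError(
                f"duplicate SOP Instance UID {uid}: {index[uid]} and {path}"
            )
        index[uid] = path
    return index
```

With this lookup, the image for an annotation row is found by its SOPInstanceUID value rather than by file name, so two files that are both called, say, CT000001.dcm in different series folders stay distinct.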

The original images are hosted by qure.ai at http://headctstudy.qure.ai/dataset, and the original dataset was published in reference [6]. The original dataset carries no version number.


Usage Notes

This dataset contains over thirty-nine thousand bounding boxes of 5 types of acute intracranial hemorrhage. The bounding boxes add a unique value to the public Head CT dataset qure.ai CQ500, by providing additional detailed data resources for the development of algorithms by the research community, also serving as a benchmark for hemorrhage detection and localization tasks.

One limitation of this work is the occasional imperfect match of x and y values when extrapolating the bounding boxes from the thicker series to other series. Automatically generated bounding boxes were randomly spot-checked by one neuroradiologist, but no corrections were made to them.

In order to mitigate this limitation, future work could use the thick-slice annotations (manual) and match the nearest z position for the initial slices and then interpolate the boxes in between with step-wise size changes to allow for smooth transitions.
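The interpolation idea can be sketched as follows. This is only an illustration of the suggested future work, not a method used to build the dataset: it assumes that the two manual boxes describe the same lesion on consecutive thick slices and that position and size change linearly with z. The function name and the sample values are hypothetical.

```python
def interpolate_box(box_a, z_a, box_b, z_b, z):
    """Linearly interpolate an (x, y, width, height) box at position z.

    box_a and box_b are the manual boxes at slice positions z_a and z_b;
    z must lie between z_a and z_b (inclusive).
    """
    if z_a == z_b:
        raise ValueError("the two annotated slices must differ in z")
    t = (z - z_a) / (z_b - z_a)
    if not 0.0 <= t <= 1.0:
        raise ValueError("z must lie between the two annotated slices")
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))


# Made-up boxes on two thick slices 5 mm apart, evaluated halfway between.
box_mid = interpolate_box(
    (100.0, 120.0, 40.0, 30.0), 0.0,
    (110.0, 130.0, 60.0, 50.0), 5.0,
    2.5,
)
```

Each thin slice between the two annotated positions would receive a box whose corner and size change in small steps, which gives the smooth transitions described above.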

You can examine the images and annotations at https://public.md.ai/annotator/project/Y2qr6vqv/workspace by selecting the BrainHemX (BHX) group of labels.


Acknowledgements

We are especially grateful to the qure.ai team for publishing the CQ500 dataset and to the MD.ai team for collaborating and providing technical support with the annotation tool.


Conflicts of Interest

The author AS is employed by MD.ai, which provided the annotation tool for the project.


References

  1. van Asch C, Luitse M, Rinkel G, van der Tweel I, Algra A, Klijn C. Incidence, case fatality, and functional outcome of intracerebral haemorrhage over time, according to age, sex, and ethnic origin: a systematic review and meta-analysis. The Lancet Neurology. 2010;9(2):167-176.
  2. Heit J, Iv M, Wintermark M. Imaging of Intracranial Hemorrhage. Journal of Stroke. 2017;19(1):11-27.
  3. Chang P, Kuoy E, Grinband J, Weinberg B, Thompson M, Homo R et al. Hybrid 3D/2D Convolutional Neural Network for Hemorrhage Evaluation on Head CT. American Journal of Neuroradiology. 2018;39(9):1609-1616.
  4. Goldstein J, Gilson A. Critical Care Management of Acute Intracerebral Hemorrhage. Current Treatment Options in Neurology. 2011;13(2):204-216.
  5. Prevedello L, Erdal B, Ryu J, Little K, Demirer M, Qian S et al. Automated Critical Test Findings Identification and Online Notification System Using Artificial Intelligence in Imaging. Radiology. 2017;285(3):923-931.
  6. Chilamkurthy S, Ghosh R, Tanamala S, Biviji M, Campeau N, Venugopal V et al. Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study. The Lancet. 2018;392(10162):2388-2396.
  7. Kuo W, Häne C, Mukherjee P, Malik J, Yuh E. Expert-level detection of acute intracranial hemorrhage on head computed tomography using deep learning. Proceedings of the National Academy of Sciences. 2019;:201908021.
  8. RSNA Intracranial Hemorrhage Detection. Kaggle Web site. https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection. Published September 18, 2019. Accessed October 25, 2019.
  9. Mirza S, Gokhale S. Neuroimaging in Intracerebral Hemorrhage. Hemorrhagic Stroke - An Update. Intechopen Web site. https://www.intechopen.com/books/hemorrhagic-stroke-an-update/neuroimaging-in-intracerebral-hemorrhage. Published October 4, 2017. Accessed October 25, 2019.
  10. Osborn A, Hedlund G, Salzman K. Osborn's brain. Philadelphia, PA: Elsevier; 2018.
  11. Weiss K, Cornelius R, Greeley A, Sun D, Chang I, Boyce W et al. Hybrid Convolution Kernel: Optimized CT of the Head, Neck, and Spine. American Journal of Roentgenology. 2011;196(2):403-406.
  12. DICOM Standard. Dicomstandard.org Web site. https://www.dicomstandard.org. Accessed October 25, 2019.

Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Creative Commons Attribution-ShareAlike 4.0 International Public License


Files

Total uncompressed size: 22.3 MB.

Name                                      Size      Modified
1_Initial_Manual_Labeling.csv             1.6 MB    2020-07-21
2_Extrapolation_to_All_Series.csv         10.5 MB   2020-07-21
3_Extrapolation_to_Selected_Series.csv    7.3 MB    2020-07-21
LICENSE.txt                               16.0 KB   2020-07-29
SHA256SUMS.txt                            469 B     2020-07-29
example_annotated_images.png              2.9 MB    2020-07-21