Cervical Spine Fracture Detection from CT Scans

Soujatya Sarkar
School of Computing and Electrical Engineering
Indian Institute of Technology, Mandi
s23106@students.iitmandi.ac.in

Kriti Khare
School of Computing and Electrical Engineering
Indian Institute of Technology, Mandi
s23069@students.iitmandi.ac.in
Abstract—For early patient care and medical intervention, the accurate and rapid diagnosis of cervical spine fractures from CT scans is essential. In this study, we propose using cutting-edge AI methods to improve the effectiveness and precision of fracture detection. Our primary goal is to apply state-of-the-art architectures, such as the distilled ViT and the Vision Transformer with deformable attention, and to optimize either for quicker computation without sacrificing accuracy or for higher accuracy with comparable processing resources. Our methodology comprises preprocessing CT scan images, training the AI model on annotated data, and optimizing the chosen architecture to focus on cervical spine fracture detection. The dataset used in this experimental project was collected from the RSNA (Radiological Society of North America) 2022 Cervical Spine Fracture Detection featured code competition. Through the utilization of deep learning and novel attention mechanisms, our goal is to create a dependable system that can accurately and quickly detect fractures. The project's goal is to develop a highly effective and precise AI-driven cervical spine fracture detection system that can expedite diagnosis and enhance patient outcomes. These developments have the potential to transform medical imaging diagnostics, improve fracture detection precision, speed up patient treatment decisions, optimize clinical workflows, and ultimately save lives.

Index Terms—Fracture detection, CT scans, Vision Transformer, Deformable attention, Distilled ViT, Medical imaging, Diagnostic accuracy, Computational efficiency

I. INTRODUCTION

The spinal cord is one of the most important parts of the human nervous system: it is a critical intermediary for transmitting sensory and motor information between the brain and specific regions of the body. This intricate structure, protected by the vertebral column (or spine, as it is most commonly known), consists of a number of bony segments: the vertebrae. The vertebral column is itself divided into distinct regions, each with its own anatomical features and functions.

The cervical spine comprises the highest portion of the vertebral column and includes seven vertebrae, termed C1 to C7. These vertebrae serve many valuable functions, such as providing support for the head and permitting numerous movements like rotation, extension, and flexion. Figure 1 shows the cervical vertebrae that constitute the neck region; these are essential for safeguarding the delicate structures of the spinal cord so that they can function well.

Fig. 1: The Cervical Spine

With major socioeconomic ramifications, the prevalence of spinal cord injuries (SCIs) resulting from vertebral fractures is a matter of great public health concern. Recent data indicate that there are about 17,730 new occurrences of SCI in the United States alone each year, with falls and car accidents causing the majority of these injuries [24]. These injuries not only cause great physical and psychological suffering for those who sustain them, but they also place a significant financial strain on healthcare systems around the globe.

There is an urgent need for reliable and effective detection techniques, since prompt and correct diagnosis is crucial to reducing the negative effects of spinal fractures. Recent developments in artificial intelligence (AI) and medical imaging have made innovative cervical spine fracture identification methods possible. These methods have the potential to fundamentally change clinical practice and improve patient outcomes.

This article's goal is to investigate the use of AI-driven methods for detecting cervical spine fractures from human spine CT scans. We aim to create a scalable and resilient system that can identify cervical spine fractures early and accurately. This will enable medical professionals to treat patients promptly and enhance patient outcomes. We will achieve this by utilizing cutting-edge deep learning algorithms and cloud computing.
II. LITERATURE REVIEW

The diagnosis and management of spinal injuries, particularly cervical spine fractures, represent a complex clinical challenge. Recent scholarly endeavors have capitalized on emerging technologies such as deep learning (DL) and cloud-based computation to augment detection systems, thereby offering promising avenues for enhancing patient care and outcomes [1].

Clinical evidence underscores the pivotal importance of timely surgical intervention in the management of spinal cord injuries [2]. Nevertheless, accurate and efficient detection of spinal fractures remains an ongoing clinical concern. Initiatives such as the RSNA 2022 Cervical Spine Fracture Detection competition have sought to bridge this gap by fostering innovation in fracture detection algorithms [3].

Computational methods have emerged as indispensable tools in advancing spine imaging, furnishing clinicians with robust analytical frameworks for diagnosis and treatment evaluation. Noteworthy publications like "Computational Methods and Clinical Applications for Spine Imaging" provide comprehensive insights into the convergence of computational techniques and clinical practice [4]. Similarly, benchmark datasets like VerSe serve as invaluable resources for the development and evaluation of vertebrae labeling and segmentation algorithms [5].

A nuanced comprehension of the pathophysiological underpinnings and associated risk factors of vertebral compression fractures is paramount for refining preventive and therapeutic strategies [6]. Moreover, infections affecting the spinal cord and contiguous structures necessitate precise imaging modalities for accurate diagnosis and effective management [7]. Scholarly contributions such as "Spinal Imaging and Image Analysis" furnish clinicians and researchers alike with indispensable knowledge in this domain [8].

Contemporary reviews underscore the escalating significance of DL in medical imaging, with a particular emphasis on applications pertinent to spinal health [9]. Authoritative works such as the "Handbook of Medical Image Computing and Computer-Assisted Intervention" furnish exhaustive coverage of DL methodologies and their manifold applications in medical imaging [10]. Furthermore, iterative fully convolutional neural networks and other DL architectures hold promise in automating vertebra segmentation and identification tasks, thereby streamlining clinical workflows [11, 12].

Notwithstanding significant strides, challenges persist in achieving precise detection of vertebral fractures. Innovative approaches such as cortical shell unwrapping and model-based segmentation frameworks endeavor to bolster fracture detection sensitivity and specificity [13, 14]. Furthermore, research endeavors in allied fields such as colonography and head-and-neck carcinoma detection offer transferable insights and methodologies to the realm of spinal imaging [15, 16].

Recent advancements in DL, computational methods, and evidence-based clinical practice have substantially reshaped the landscape of spinal imaging and fracture detection. Collaborative synergies among clinicians, researchers, and technologists continue to propel innovation, promising heightened diagnostic accuracy and optimized patient care in the domain of spinal injury management.

III. OBJECTIVES, MILESTONES, AND DELIVERABLES

A. Objectives
• Develop an AI-driven model specifically designed for the identification of cervical spine fractures using CT scans.
• Utilize a diverse and extensive dataset of annotated CT scans of the human spine to train and evaluate the developed model.
• Implement cutting-edge machine learning algorithms, particularly deep learning approaches, to enhance the accuracy and efficiency of fracture detection.
• Compare the performance of the AI-driven model with current methods to assess improvements in diagnostic efficiency and accuracy.
• Evaluate the potential impact of the developed model on clinical decision-making by measuring the reduction in diagnosis time.
• Investigate the potential enhancement of patient outcomes through early and accurate detection of cervical spine fractures.
• Contribute to the advancement of AI in medical imaging by providing radiologists and doctors with a reliable tool for early and accurate fracture detection.

B. Milestones and Deliverables

The project milestones are represented in the form of a Gantt chart in Fig. 2. This timeline shows a sequential progression of tasks, with overlapping periods where multiple tasks are conducted concurrently. This helps in understanding the allocation of time and resources for each phase of the project.

1. Excavating Literature: This task started in Week 1 of April and continued into Week 2 of April.

2. Dataset Pipelining: We began the work in Week 2 of April and continued through Week 3 of April.

3. Model Architecture: Started in Week 3 of April and extended till Week 1 of May.

4. Development: Commenced in Week 2 of May and continued till Week 3 of May.

5. Training: Started in Week 3 of May and finished in Week 4 of May, just a few days before our project submission.

6. Documentation: This task was done from the start of the last week of May and finished on the 25th of May.
Fig. 2: Project Timeline

IV. DATASET DESCRIPTION

The objective of this competition is to discern fractures in CT scans of the cervical spine (neck) at both the individual vertebra level and the overall patient level. Timely detection and precise localization of any vertebral fractures are crucial to prevent neurological deterioration and paralysis following trauma.

This competition entails a hidden test dataset. Upon submission, the actual test data, including a comprehensive sample submission, will be provided to the submitted notebook.

A. Files
• train.csv: Metadata for the training dataset.
  – StudyInstanceUID: The unique study ID for each patient scan.
  – patient_overall: One of the target columns, indicating the patient-level outcome, i.e., whether any of the vertebrae are fractured.
  – C[1-7]: The other target columns, indicating whether the given vertebra is fractured.
• test.csv: Metadata for the test set prediction structure. Only the initial rows of the test set are accessible for download.
  – row_id: The row ID.
  – StudyInstanceUID: The study ID.
  – prediction_type: Indicates which one of the eight target columns requires a prediction in this row.
• [train/test]_images/[StudyInstanceUID]/[slice_number].dcm: The image data, organized with one folder per scan. Expect approximately 1,500 scans in the hidden test set. Each image is in DICOM file format with ≤ 1 mm slice thickness, axial orientation, and bone kernel. Note that some DICOM files are JPEG compressed.
• sample_submission.csv: A valid sample submission.
  – row_id: The row ID.
  – fractured: The target column.
• train_bounding_boxes.csv: Bounding boxes for a subset of the training set.
• segmentations/: Pixel-level annotations for a subset of the training set, provided in the NIfTI file format. A portion of the imaging datasets have been segmented automatically using a 3D U-Net model, with radiologists modifying and approving the segmentations. The provided segmentation labels range from 1 to 7 for C1 to C7 (seven cervical vertebrae) and 8 to 19 for T1 to T12 (twelve thoracic vertebrae). Please note that the NIfTI files contain segmentations in the sagittal plane, while the DICOM files are in the axial plane. Utilize the NIfTI header information to ensure alignment between DICOM images and segmentations.

B. Our Dataset

Initially, we utilized the 86 segmentation sample studies provided by the organizers. We redefined the mask labels as follows:
• 0: Background
• 1 to 7: C1 to C7

V. METHODOLOGY

A. Dataset Preparation

We utilized a dataset consisting of 86 CT scans of spines from different patients. The segmentations are stored in the NIfTI format (with the extension .nii or .nii.gz), which is widely used in medical imaging for storing volumetric data. The individual slices of the CT scans are in the DICOM format (.dcm), a standard format for medical imaging data that ensures compatibility with various imaging devices and software.

For our experiments, we divided the dataset as follows:
• Training and Validation: We allocated 84 CT scan images for training and validation purposes. This subset was further split into training and validation sets in a 70:14 ratio, resulting in 70 images for training and 14 images for validation. This split helps ensure that the model is exposed to a sufficient variety of examples during training while having a distinct validation set to monitor and tune model performance.
• Testing: The remaining 2 CT scan images were set aside for testing. This test set is used to evaluate the model's performance on unseen data, providing an unbiased assessment of its generalization capability.
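The 70/14/2 split described above can be sketched in a few lines of Python. This is an illustrative sketch only: the study-ID naming scheme and the fixed seed below are our assumptions, not taken from the project code.

```python
import random

def split_studies(study_ids, n_train=70, n_val=14, n_test=2, seed=0):
    """Shuffle the 86 studies and split them 70/14/2, as described above."""
    assert len(study_ids) == n_train + n_val + n_test
    ids = list(study_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])

# 86 studies in total, mirroring the dataset described above
studies = [f"study_{i:03d}" for i in range(86)]
train, val, test = split_studies(studies)
print(len(train), len(val), len(test))  # 70 14 2
```

A fixed seed matters here: with only 2 held-out test scans, an unseeded shuffle would make results impossible to reproduce across runs.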
B. Summary of Data Split

• Total CT Scans: 86
• Training and Validation: 84
  – Training Set: 70 images
  – Validation Set: 14 images
• Test Set: 2 images

This approach ensures that our model is trained, validated, and tested on appropriately split data, enabling reliable performance evaluation and potential for generalization to new data.

Fig. 3: Flow of Data Preparation

C. Data Augmentation

We applied data augmentation techniques to the CT scan slices to enhance the dataset and improve model robustness. The process for creating augmented data is as follows:

1) Segmentation Data:
• For segmentation files in NIfTI format (e.g., 2.nii), we created a single .npy file with 6 channels. This file is named 2.npy.

2) Slice Data:
• For the DICOM slices (e.g., 0.dcm, 1.dcm, 2.dcm, 3.dcm, and so on), we generated .npy images for every even-numbered file (0.dcm, 2.dcm, 4.dcm, 6.dcm, 8.dcm, ..., 28.dcm). This resulted in a total of 15 .npy files.

3) Vertebrae and Slices:
• Each vertebra is represented by 30 slices.
• Considering the cervical vertebrae (C1 through C7), we have:
  – C1 to C7: 30 slices/vertebra × 7 vertebrae = 210 slices
• Only these 210 slices are deemed important for our study; the rest are discarded.
• From these 210 slices, using our data augmentation technique, we generate 105 .npy files. This results in one .npy file for every pair of slices, effectively giving us data for every second slice.

This data augmentation strategy ensures that we maximize the use of available data, thereby enhancing the training process and potentially improving the model's performance on segmentation tasks.
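The slice-selection arithmetic above can be sketched as follows. The function name is ours, for illustration; only the counts (30 slices per vertebra, every second slice kept) come from the procedure described above.

```python
def keep_every_second_slice(slice_numbers):
    """Keep only the even-numbered slices, one per pair, as described above."""
    return [s for s in slice_numbers if s % 2 == 0]

SLICES_PER_VERTEBRA = 30
N_CERVICAL = 7  # C1 through C7

total_slices = SLICES_PER_VERTEBRA * N_CERVICAL   # 210 slices of interest
kept = keep_every_second_slice(range(total_slices))
print(total_slices, len(kept))  # 210 105

# the same rule applied to DICOM files 0.dcm ... 29.dcm keeps the 15
# even-numbered files 0.dcm, 2.dcm, ..., 28.dcm
print(len(keep_every_second_slice(range(30))))  # 15
```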
D. Model Architecture

For our task, we employed the Vision Transformer (ViT), as in Fig. 4, due to its strong performance in classification tasks. The ViT architecture is particularly suitable for handling image data by utilizing a transformer encoder to process image patches.

1) Transformer Encoder: We utilized a ViT with 12 transformer encoder layers. The key components of our model architecture are as follows:
• Volumetric Patch Embeddings: The input volumetric data is divided into smaller patches, which are then flattened and projected linearly to create patch embeddings.
• Positional Encoding: Positional encodings are added to the patch embeddings to retain the spatial information within the sequence of patches.
• Transformer Encoder Layers: The sequence of patch embeddings, augmented with positional encodings, is fed into the transformer encoder. Each encoder layer consists of a multi-head self-attention mechanism and a feed-forward neural network.
• Classification Token (CLS Token): A special learnable classification token (CLS token) is prepended to the sequence of patch embeddings. The final hidden state corresponding to this CLS token is used as the representation for the entire image, which is then utilized for the classification task.

2) Classification Output: The model outputs an array of length 7, corresponding to the presence or absence of fractures in each of the seven cervical vertebrae (C1–C7). Each element in the array is a binary value (0 or 1), indicating the absence or presence of a fracture in the respective vertebra.
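As a framework-agnostic sketch of this output stage, seven raw scores can be squashed through a sigmoid and thresholded into the binary C1–C7 array. The logit values and the 0.5 threshold below are illustrative assumptions, not actual model output.

```python
import math

def predict_fractures(logits, threshold=0.5):
    """Map 7 raw model scores to binary fracture labels for C1..C7."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]  # sigmoid per vertebra
    return [1 if p >= threshold else 0 for p in probs]

# toy logits for C1..C7 (illustrative values, not real model output)
logits = [-2.1, 0.3, -0.7, 1.8, -3.0, 0.1, -1.2]
preds = predict_fractures(logits)
print(preds)  # [0, 1, 0, 1, 0, 1, 0]
```

Treating each vertebra as an independent binary prediction (rather than a single 7-way class) is what allows a scan with multiple fractured vertebrae to be labeled correctly.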
TABLE II: Comparative study with state-of-the-art models

Model | Train Loss | Val Loss
2.5D CNN + LSTM classification | 0.2117 | 0.2047
2.5D CNN + Transformer encoder | 0.4049 | 0.4140
3D ViT B2 [on 1/25 data] | 0.4169 | 0.4158
3D ViT B1 [on 1/25 data] | 0.4893 | 0.4879
EfficientNet-V2-S classifier | 0.4812 | 0.4982

3) Training and Loss Calculation: During training, we calculate the loss between the predicted array and the ground-truth labels. This loss is then backpropagated through the network to update the weights, improving the model's performance.
• Loss Calculation: The loss function compares the predicted classification array to the ground-truth labels. We have used binary cross-entropy loss.
• Backpropagation: The calculated loss is used to adjust the model's weights through backpropagation, enabling the model to learn and improve over time.
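The binary cross-entropy computation described above can be sketched in plain Python. This is for illustration only; in practice one would use the deep-learning framework's built-in, numerically stable implementation, and the probabilities below are made-up values rather than real model output.

```python
import math

def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over the 7 per-vertebra predictions."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)

# confident, mostly-correct predictions yield a small loss
probs   = [0.9, 0.1, 0.8, 0.2, 0.9, 0.1, 0.95]
targets = [1,   0,   1,   0,   1,   0,   1]
loss = bce_loss(probs, targets)
print(round(loss, 4))  # ≈ 0.1313
```

Averaging over the seven vertebrae keeps the loss scale independent of how many target columns there are, which makes the train/validation loss values in Tables I and II directly comparable.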
This comprehensive use of the Vision Transformer architecture allows us to effectively address the classification task, providing a robust framework for detecting fractures in cervical vertebrae based on volumetric CT scan data. Our 3D ViT approach can be summarised in the form of an architectural diagram, as shown in Fig. 4.

Fig. 4: Architecture of 3D Simple ViT

VI. IMPLEMENTATION RESULTS

We have implemented 3 models with different hyperparameters for achieving our objective. The ablation study for them is shown in Table I.

TABLE I: Ablation study of our 3 trained models

Model | Train Loss | Val Loss | Train Accuracy | Val Accuracy
3D ViT - 1/3e-4/0.0 | 0.4893 | 0.4879 | 90.2899 | 90.4176
3D ViT - 1/5e-4/0.1 | 0.7179 | 0.8186 | 78.6929 | 75.069
3D ViT - 1/3e-4/0.1 | 0.4169 | 0.4158 | 93.6419 | 94.4136

As our objective nearly aligns with a Kaggle competition, we have compared our model with the state-of-the-art architectures that have been publicly showcased on the Kaggle leaderboard of this competition. Our implementation results against the state-of-the-art Kaggle models are summarized in Table II.

VII. CONCLUSION

From the above results we can conclude that our model, though trained on 1/25 of the total dataset, has performed very efficiently compared to the SOTA architectures. We expect this model to perform much better and beat the SOTA trends in a much more efficient and organized manner.

VIII. ACKNOWLEDGEMENT

The authors would like to acknowledge the opportunity and guidance provided by Dr. Sneha Singh, Assistant Professor of the School of Computing and Electrical Engineering, IIT Mandi. This project has been carried out as a task for the course: Medical Imaging and Applications (EE-XXX).

REFERENCES
[1] P. Chład and M. R. Ogiela, "Deep Learning and Cloud-Based Computation for Cervical Spine Fracture Detection System", Electronics 2023, 12, 2056. https://doi.org/10.3390/electronics12092056
[2] M. G. Fehlings and R. G. Perrin, "The Timing of Surgical Intervention in the Treatment of Spinal Cord Injury: A Systematic Review of Recent Clinical Evidence", Spine 2006, 31, S28–S35.
[3] A. Flanders, C. Carr, E. Colak, F. Kitamura, H. M. Lin, J. Rudie, J. Mongan, K. Andriole, L. Prevedello and M. Riopel, RSNA 2022 Cervical Spine Fracture Detection. 2022. Available online: https://www.kaggle.com/competitions/rsna-2022-cervicalspine-fracture-detection/overview (accessed on 5 March 2023).
[4] Cai, Y., Wang, L., Audette, M., Zheng, G. and Li, S., "Computational Methods and Clinical Applications for Spine Imaging" (Springer, 2020).
[5] Sekuboyina, A., "VerSe: a vertebrae labelling and segmentation benchmark", arXiv preprint arXiv:2001.09193 (2020).
[6] Faruqi, S., "Vertebral compression fracture after spine stereotactic body radiation therapy: a review of the pathophysiology and risk factors", Neurosurgery 83, 314–322 (2018).
[7] Balériaux, D. L. and Neugroschl, C., "Spinal and spinal cord infection", Eur. Radiol. Suppl. 14, E72–E83 (2004).
[8] Li, S. and Yao, J., "Spinal Imaging and Image Analysis" (Springer, 2015).
[9] Zhou, S. K., "A review of deep learning in medical imaging: Imaging traits, technology trends, case studies with progress highlights, and future promises", Proc. IEEE (2021).
[10] Zhou, S. K., Rueckert, D. and Fichtinger, G., "Handbook of medical image computing and computer-assisted intervention" (Academic Press, 2019).
[11] Lessmann, N., Van Ginneken, B., De Jong, P. A. and Išgum, I., "Iterative fully convolutional neural networks for automatic vertebra segmentation and identification", Med. Image Analysis 53, 142–155 (2019).
[12] Payer, C., Stern, D., Bischof, H. and Urschler, M., "Coarse to fine vertebrae localization and segmentation with SpatialConfiguration-Net and U-Net", In VISIGRAPP (5: VISAPP), 124–133 (2020).
[13] Yao, J., Burns, J. E., Munoz, H. and Summers, R. M., "Detection of vertebral body fractures based on cortical shell unwrapping", In International Conference on Medical Image Computing and Computer-Assisted Intervention, 509–516 (Springer, 2012).
[14] Korez, R., Ibragimov, B., Likar, B., Pernuš, F. and Vrtovec, T. ”A frame-
work for automated spine and vertebrae interpolationbased detection and
model-based segmentation”, IEEE transactions on medical imaging 34,
1649–1662 (2015).
[15] Johnson, C. D., ”Accuracy of ct colonography for detection of large
adenomas and cancers”, New Engl. J. Medicine 359, 1207–1217 (2008).
[16] Bejarano, T., De Ornelas Couto, M. and Mihaylov, ”I. Head-and-neck
squamous cell carcinoma patients with ct taken during pre-treatment,
mid-treatment, and post-treatment dataset”, the cancer imaging archive;
2018.
[17] Simpson, A. L.”A large annotated medical image dataset for the de-
velopment and evaluation of segmentation algorithms”, arXiv preprint
arXiv:1902.09063 (2019).
[18] Harmon S. A.,”Artificial intelligence for the detection of covid-19 pneu-
monia on chest ct using multinational datasets.” Nat. communications
11, 1–7 (2020).
[19] Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J. and Maier-Hein, ”K. H.
nnu-net: a self-configuring method for deep learning-based biomedical
image segmentation.” Nat. Methods 18, 203–211 (2021).
[20] Yushkevich, P. A., ”User-guided 3D active contour segmentation of
anatomical structures: Significantly improved efficiency and reliability”,
Neuroimage 31, 1116–1128 (2006).
[21] Ronneberger, O., Fischer, P. and Brox, ”T. U-net: Convolutional net-
works for biomedical image segmentation”, In International Confer-
ence on Medical image computing and computer-assisted intervention,
234–241 (Springer, 2015).
[22] Peng, C., Lin, W.-A., Liao, H., Chellappa, R. and Zhou, S. K. Saint:
”spatially aware interpolation network for medical slice synthesis”, In
Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, 7750–7759 (2020).
[23] Egger, J, ”Gbm volumetry using the 3d slicer medical image computing
platform”, Sci. reports 3, 1–7 (2013).
[24] National Spinal Cord Injury Statistical Center, ”Spinal Cord Injury
Facts and Figures at a Glance,” Birmingham, AL, USA, 2021. [Online].
Available: https://www.nscisc.uab.edu/Public/Facts