Search Results

You are looking at 1 - 2 of 2 items for

  • Author or Editor: Beverly Walters
  • By Author: Agee, Bonita S.
  • By Author: Miller, Joseph H.

Christoph J. Griessenauer, Joseph H. Miller, Bonita S. Agee, Winfield S. Fisher III, Joel K. Curé, Philip R. Chapman, Paul M. Foreman, Wilson A. M. Fisher, Adam C. Witcher and Beverly C. Walters


The aim of this study was to examine observer reliability of frequently used arteriovenous malformation (AVM) grading scales, including the 5-tier Spetzler-Martin scale, the 3-tier Spetzler-Ponce scale, and the Pollock-Flickinger radiosurgery-based scale, using current imaging modalities in a setting closely resembling routine clinical practice.
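For readers less familiar with the first two scales, the sketch below shows how a 5-tier Spetzler-Martin grade is commonly built from three imaging features (nidus size, eloquence of adjacent brain, deep venous drainage) and how the 3-tier Spetzler-Ponce scale collapses it (A = grades I and II, B = grade III, C = grades IV and V). The point values follow the standard published definitions of these scales rather than anything stated in this abstract, and the `AVMFeatures` class and both function names are hypothetical. The Pollock-Flickinger score is left out because it is a continuous formula whose location coefficients differ between published versions.

```python
from dataclasses import dataclass


@dataclass
class AVMFeatures:
    """Hypothetical container for the three Spetzler-Martin inputs read from imaging."""
    max_diameter_cm: float       # largest nidus diameter
    eloquent_location: bool      # nidus adjacent to eloquent brain
    deep_venous_drainage: bool   # any deep venous drainage present


def spetzler_martin_grade(avm: AVMFeatures) -> int:
    """Return the 5-tier Spetzler-Martin grade (I-V encoded as 1-5)."""
    if avm.max_diameter_cm < 3:
        size_points = 1          # small: < 3 cm
    elif avm.max_diameter_cm <= 6:
        size_points = 2          # medium: 3-6 cm
    else:
        size_points = 3          # large: > 6 cm
    return size_points + int(avm.eloquent_location) + int(avm.deep_venous_drainage)


def spetzler_ponce_class(sm_grade: int) -> str:
    """Collapse a Spetzler-Martin grade into the 3-tier Spetzler-Ponce class."""
    if sm_grade not in range(1, 6):
        raise ValueError("Spetzler-Martin grade must be between 1 and 5")
    if sm_grade <= 2:
        return "A"
    if sm_grade == 3:
        return "B"
    return "C"
```

For example, a 4 cm nidus in eloquent cortex with only superficial drainage scores 2 + 1 + 0 = grade III, Spetzler-Ponce class B. A disagreement between raters on any single feature shifts the grade by a full point, which is why observer reliability matters for these scales.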


Five experienced raters, consisting of 1 vascular neurosurgeon, 2 neuroradiologists, and 2 senior neurosurgical residents, independently reviewed 15 MRI studies, 15 CT angiograms, and 15 digital subtraction angiograms obtained at the time of initial diagnosis. Assessments of 5 scans of each imaging modality were repeated to measure intrarater reliability. Three months after the initial assessment, raters reassessed the scans on which there had been disagreement. In this second assessment, raters were asked to justify their ratings with comments and illustrations. Generalized kappa (κ) analysis for multiple raters, Kendall's coefficient of concordance (W), and the intraclass correlation coefficient (ICC) were applied to determine interrater reliability. For intrarater reliability analysis, Cohen's kappa (κ), Kendall's rank correlation coefficient (tau-b), and ICC were used to assess repeat-measurement agreement for each rater.


Interrater reliability for the overall 5-tier Spetzler-Martin scale ranged from fair to good (ICC = 0.69) to extremely strong (Kendall's W = 0.73) on initial assessment and improved on reassessment. Assessment of CT angiograms yielded the highest agreement, followed by MRI and then digital subtraction angiography. Agreement for the overall 3-tier Spetzler-Ponce grade ranged from fair to good (ICC = 0.68) to strong (Kendall's W = 0.70) on initial assessment, improved on reassessment, and was comparable to agreement for the 5-tier Spetzler-Martin scale. Agreement for the overall Pollock-Flickinger radiosurgery-based grade ranged from excellent (ICC = 0.89) to extremely strong (Kendall's W = 0.81). Intrarater reliability for the overall 5-tier Spetzler-Martin grade was excellent (ICC > 0.75) for 3 of the 5 raters and fair to good (ICC > 0.40) for the other 2.
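The ICC values above are described with benchmark wording (above 0.75 excellent, 0.40 to 0.75 fair to good, below 0.40 poor). The sketch below computes one common form, the two-way random-effects, absolute-agreement, single-rater ICC(2,1) in Shrout and Fleiss's notation, and maps a value to that wording. The abstract does not say which ICC form was used, so treat ICC(2,1) as an assumption; `icc_2_1` and `icc_label` are hypothetical names.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is rater j's score for subject i (at least 2 of each).
    """
    n = len(ratings)        # subjects
    k = len(ratings[0])     # raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, residual
    ss_rows = k * sum((mu - grand) ** 2 for mu in row_means)
    ss_cols = n * sum((mu - grand) ** 2 for mu in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def icc_label(icc: float) -> str:
    """Map an ICC to the benchmark wording used in the abstract."""
    if icc > 0.75:
        return "excellent"
    if icc >= 0.40:
        return "fair to good"
    return "poor"
```

Because this form measures absolute agreement, a rater who scores every case exactly one grade higher than a colleague lowers the ICC (`icc_2_1([[1, 2], [2, 3], [3, 4]])` is 2/3), whereas Kendall's W would report perfect concordance for the same ratings since it looks only at rank order. This is one reason the two statistics can attach different labels to the same data.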


The 5-tier Spetzler-Martin scale, the 3-tier Spetzler-Ponce scale, and the Pollock-Flickinger radiosurgery-based scale all showed a high level of agreement. The improved reliability on reassessment was attributed to a training effect from the initial assessment and to the requirement to defend each rating. This highlights a potential drawback of using grades assigned during routine clinical practice for scientific purposes.


Ross L. Dawkins, Joseph H. Miller, Omar I. Ramadan, Michael C. Lysek, Elizabeth N. Kuhn, Brandon G. Rocque, Michael J. Conklin, R. Shane Tubbs, Beverly C. Walters, Bonita S. Agee and Curtis J. Rozzelle


There are many classification systems for injuries of the thoracolumbar spine. The recent Thoracolumbar Injury Classification and Severity Score (TLICS) has been shown to be a reliable tool for adult patients. The aim of this study was to assess the reliability of the TLICS system in pediatric patients. The validity of the TLICS system is assessed in a companion paper.


The medical records of pediatric patients with acute, traumatic thoracolumbar fractures at a single Level 1 trauma center were retrospectively reviewed. A TLICS was calculated for each patient from CT and/or MRI findings and the neurological examination recorded in the patient's medical record. TLICSs were compared with the treatment actually received. Five raters independently scored all patients to assess interrater reliability.
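The abstract does not list the TLICS components. As commonly published for adults, the score sums three categories (injury morphology, neurological status, and posterior ligamentous complex (PLC) integrity), with a total of 3 or less suggesting nonoperative treatment, 5 or more suggesting surgery, and 4 left to the surgeon's judgment. The sketch below encodes those point values as an assumption; the dictionary keys and function names are hypothetical, and the values should be checked against the original TLICS publication before any real use.

```python
# Assumed TLICS point values as commonly published; keys are hypothetical labels.
MORPHOLOGY_POINTS = {
    "compression": 1,
    "burst": 2,                  # compression (1) plus burst modifier (1)
    "translation_rotation": 3,
    "distraction": 4,
}
NEURO_POINTS = {
    "intact": 0,
    "nerve_root": 2,
    "cord_complete": 2,
    "cord_incomplete": 3,
    "cauda_equina": 3,
}
PLC_POINTS = {
    "intact": 0,
    "indeterminate": 2,          # suspected or indeterminate injury
    "injured": 3,
}


def tlics_score(morphology: str, neuro: str, plc: str) -> int:
    """Sum the points for the worst finding in each of the three categories."""
    return MORPHOLOGY_POINTS[morphology] + NEURO_POINTS[neuro] + PLC_POINTS[plc]


def tlics_suggestion(score: int) -> str:
    """Map a total TLICS to the treatment it suggests."""
    if score <= 3:
        return "nonoperative"
    if score == 4:
        return "nonoperative or operative"
    return "operative"
```

For instance, a burst fracture in a neurologically intact patient with an intact PLC scores 2 and maps to nonoperative care, whereas the same fracture with an injured PLC and an incomplete cord injury scores 8. Because PLC status is an imaging judgment, a rater who reads the PLC as indeterminate rather than intact adds 2 points, enough to move a score of 2 to the surgeon's-choice threshold of 4.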


TLICS calculations were completed for 81 patients. The mean patient age was 10.9 years. Girls made up 51.8% of the study population, and 80% of patients were white. The most common mechanisms of injury were motor vehicle accidents (60.5%), falls (17.3%), and all-terrain vehicle accidents (8.6%). The mean TLICS was 3.7 ± 2.8. Surgery was the treatment of choice for 33.3% of patients. The agreement between the TLICS-suggested treatment and the actual treatment received was statistically significant (p < 0.0001). The interrater reliability of the TLICS system ranged from moderate to very good, with a Fleiss' generalized kappa (κ) value of 0.69 for the TLICS treatment suggestion across all patients; however, interrater reliability decreased when MRI contributed to the TLICS: κ was 0.73 for patients assessed with CT only, compared with 0.57 for patients assessed with CT/MRI or MRI only (p < 0.0001). Furthermore, agreement between the suggested and actual treatment was worse when MRI was used as part of the injury assessment.


The TLICS system demonstrates good interrater reliability among physicians determining treatment for thoracolumbar fractures in pediatric patients. Physicians should be cautious when using MRI to aid in the surgical decision-making process.