Thoracolumbar Injury Classification and Severity Score in children: a reliability study

  • 1 Section of Pediatric Neurosurgery, Children’s of Alabama, Department of Neurosurgery, and
  • 3 Division of Orthopedic Surgery, Department of Surgery, University of Alabama at Birmingham, Alabama;
  • 2 UT Erlanger Neurosurgery, Chattanooga, Tennessee; and
  • 4 Seattle Science Foundation, Seattle, Washington

OBJECTIVE

There are many classification systems for injuries of the thoracolumbar spine. The recent Thoracolumbar Injury Classification and Severity Score (TLICS) has been shown to be a reliable tool for adult patients. The aim of this study was to assess the reliability of the TLICS system in pediatric patients. The validity of the TLICS system is assessed in a companion paper.

METHODS

The medical records of pediatric patients with acute, traumatic thoracolumbar fractures at a single Level 1 trauma center were retrospectively reviewed. A TLICS was calculated for each patient using CT and/or MRI, along with the neurological examination recorded in the patient’s medical record. TLICSs were compared with the type of treatment received. Five raters scored all patients independently to assess interrater reliability.

RESULTS

TLICS calculations were completed for 81 patients. The mean patient age was 10.9 years. Girls represented 51.8% of the study population, and 80% of the study patients were white. The most common mechanisms of injury were motor vehicle accidents (60.5%), falls (17.3%), and all-terrain vehicle accidents (8.6%). The mean TLICS was 3.7 ± 2.8. Surgery was the treatment of choice for 33.3% of patients. The agreement between the TLICS-suggested treatment and the actual treatment received was statistically significant (p < 0.0001). The interrater reliability of the TLICS system ranged from moderate to very good, with a Fleiss’ generalized kappa (κ) value of 0.69 for the TLICS treatment suggestion among all patients; however, interrater reliability decreased when MRI was used to contribute to the TLICS. The κ value decreased from 0.73 to 0.57 for patients with CT only vs patients with CT/MRI or MRI only, respectively (p < 0.0001). Furthermore, the agreement between suggested treatment and actual treatment was worse when MRI was used as part of injury assessment.

CONCLUSIONS

The TLICS system demonstrates good interrater reliability among physicians assessing thoracolumbar fracture treatment in pediatric patients. Physicians should be cautious when using MRI to aid in the surgical decision-making process.

ABBREVIATIONS

CI = confidence interval; ICC = intraclass correlation coefficient; MVA = motor vehicle accident; OR = odds ratio; PLC = posterior ligamentous complex; TLICS = Thoracolumbar Injury Classification and Severity Score.


Pediatric spine injuries account for 1%–10% of all spine traumas and represent 5% of all pediatric bone fractures.2,3,5,32 The reported incidence of thoracolumbar fractures in children with spine injuries ranges from 5% to 34%.5,20

Current management strategies for pediatric thoracolumbar fractures rely mostly on the discretion of the treating physician. There is no universally accepted classification system to aid in the decision for fracture management. In 2005, the Thoracolumbar Injury Classification and Severity Score (TLICS) was developed with the intention of providing a clinically relevant classification system that would be not only descriptive, but also predictive of outcome and helpful for guidance of treatment decisions.15,33 To date, only 2 studies assessing the use of the TLICS system in children have been published.27,29 Only 1 study considered the reliability of the scoring system in addition to its validity.27 The purpose of our study was to evaluate the interrater reliability of the TLICS system in a large cohort of pediatric patients. We assess the validity of the TLICS system in a companion paper.

Methods

Study Population

We retrospectively reviewed the medical records, from 2002 to 2013, of all pediatric patients (age 18 years and younger) with acute, traumatic thoracolumbar fractures treated at a Level 1 trauma center (Children’s of Alabama) who met the study criteria. Inclusion required CT and/or MRI at the time of fracture assessment, in addition to a documented neurological examination. Exclusion criteria were pathological fractures, age-indeterminate fractures with no recent traumatic event at the time of assessment, incomplete clinical records, and evidence that the TLICS system had been used during patient evaluation. Patient charts were reviewed for evidence of use of the TLICS system during the initial patient evaluation and subsequent care. Traumatic spine injuries at our institution are typically managed by the neurosurgery service, and before this study began, the TLICS system was unfamiliar to the neurosurgery faculty. Treatment decisions were made using best clinical judgment for each case. Clinical judgment may have taken some of the TLICS parameters into account, but not as a formalized scoring system. We could not confirm that former faculty members involved in the care of study patients were also unfamiliar with the TLICS system at the time of the patient evaluations, although we believe this is a safe assumption. A total of 81 patients met the criteria for inclusion. Approval was obtained from the hospital’s IRB prior to review of the medical records and data collection.

Data Collection

A TLICS was calculated for each patient (Table 1). Neurosurgeons and an orthopedic surgeon at our institution completed the scoring. CT and/or MRI were used to assign the morphology and posterior ligamentous complex (PLC) components of the score. The PLC consists of the ligamentum flavum, facet joint capsules, interspinous ligament, and supraspinous ligament. CT findings considered indicative of PLC disruption include diastasis of the facet joints and widening of the interspinous space. The same findings are considered pertinent on MRI; in addition, high signal intensity in the region of the PLC on T2-weighted or short tau inversion recovery sequences, which cannot be seen on CT, is considered suggestive of PLC disruption. The neurological examination at the time of initial assessment, as recorded in the patient’s medical record, was used for the neurological status component of the score. The total score is the sum of the 3 component scores. According to the first published description of the TLICS system, the suggested treatment is conservative therapy for a score of 1–3, either surgery or conservative therapy for a score of 4, and surgery for a score of 5 or greater.33

TABLE 1.

TLICS scoring

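| Parameter | Points |
|---|---|
| Morphology | |
| Compression fracture | 1 |
| Burst fracture | 2 |
| Translational/rotational | 3 |
| Distraction | 4 |
| Neurological involvement | |
| Intact | 0 |
| Nerve root | 2 |
| Cord, conus medullaris: incomplete | 3 |
| Cord, conus medullaris: complete | 2 |
| Cauda equina | 3 |
| PLC | |
| Intact | 0 |
| Injury suspected/indeterminate | 2 |
| Injured | 3 |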

Reproduced from Lee JY et al: J Orthop Sci 10:671–675, 2005. Published with permission. CC BY-NC-ND 4.0 license (https://creativecommons.org/licenses/by-nc-nd/4.0/).
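As an illustration only, the point values in Table 1 and the treatment thresholds described above can be expressed as a small lookup. This is a minimal Python sketch, not software used in this study; the category keys (for example `"burst"` or `"cord_incomplete"`) and the function names `tlics_total` and `suggested_treatment` are hypothetical labels chosen for readability.

```python
# Hypothetical sketch of TLICS scoring; point values follow Table 1.
MORPHOLOGY = {
    "compression": 1,
    "burst": 2,
    "translational_rotational": 3,
    "distraction": 4,
}
NEURO = {
    "intact": 0,
    "nerve_root": 2,
    "cord_incomplete": 3,  # cord or conus medullaris, incomplete
    "cord_complete": 2,    # cord or conus medullaris, complete
    "cauda_equina": 3,
}
PLC = {
    "intact": 0,
    "suspected_indeterminate": 2,
    "injured": 3,
}


def tlics_total(morphology: str, neuro: str, plc: str) -> int:
    """Sum the three component scores (raises KeyError for an unknown label)."""
    return MORPHOLOGY[morphology] + NEURO[neuro] + PLC[plc]


def suggested_treatment(total: int) -> str:
    """Map a total score to the originally published treatment suggestion."""
    if total <= 3:
        return "conservative"
    if total == 4:
        return "either"  # neutral score: surgery or conservative therapy
    return "surgery"


if __name__ == "__main__":
    score = tlics_total("burst", "intact", "injured")
    print(score, suggested_treatment(score))  # 5 surgery
```

Keeping the thresholds in one function makes the neutral score of 4 explicit, which matters later because those patients are excluded from the comparisons of suggested and actual treatment.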

The treatment received by each patient was classified as either surgical or conservative for comparison with the calculated TLICS scores and the treatment recommendation. Any surgery for decompression and/or mechanical stabilization involving the fracture site was classified as surgical treatment. Conservative treatments included use of an external orthotic brace with scheduled follow-up, no brace with scheduled follow-up, and no scheduled follow-up. Patients in whom conservative treatment failed and who required delayed surgery were kept in the conservative treatment group, consistent with an intention-to-treat analysis. Demographic, mechanism of injury, and outcome data were also collected for each patient.

To assess interrater reliability, 5 physicians independently assigned a TLICS score for each patient. The physicians included 1 junior neurosurgery resident, 2 senior neurosurgery residents, 1 neurosurgery faculty member with 2 years of postfellowship experience, and 1 orthopedic faculty member with 23 years of postfellowship experience, all from the University of Alabama at Birmingham/Children’s of Alabama at the time the ratings were completed. Raters were provided with the medical record number of each patient in addition to the date of the CT and/or MRI to be reviewed. Images were reviewed in our institution’s image viewing system, which can be accessed separately from the other portions of the patient’s medical record. Images obtained at the time of the initial patient assessment were selected for scoring of the fracture morphology and PLC disruption components of the TLICS. The neurological examination for each patient at the time of initial assessment was provided to the raters for scoring of the neurological status component of the TLICS. No other medical record data, including treatment or outcome information, were provided to the raters during assignment of the TLICS scores. Furthermore, none of the raters were involved in the treatment of any of the study patients.

Statistical Analysis

Differences between patients receiving conservative and surgical treatments were analyzed for categorical demographic, mechanism of injury, and TLICS subcategory score variables using a chi-square test (or Fisher’s exact test when assumptions were not met). Age and TLICS total score comparisons between patients receiving conservative and surgical treatments were tested as continuous variables using an independent t-test and a Mann-Whitney U-test where appropriate. For comparisons of the actual treatment to the TLICS suggested treatment, patients with a TLICS of 4 were excluded. A score of 4 is classified as neutral, so either surgical or conservative treatment is considered reasonable. Odds ratios (ORs) with 95% confidence intervals (95% CI) were calculated for the odds of actual treatment given the TLICS suggested treatment for different scenarios.

For interrater reliability analysis using 5 raters, Fleiss’ generalized kappa (κ), Kendall’s coefficient of concordance (W), and the intraclass correlation coefficient (ICC) were used to assess agreement using the TLICS scoring system. The nominal variable for TLICS treatment suggestion was assessed using only Fleiss’ generalized κ for multiple raters because this statistic is considered appropriate for nominal data.6,7 The TLICS parameters (morphology, PLC, neurological status) were treated as ordinal data; Kendall’s W is an interrater reliability coefficient particularly suited to the ordinal nature of the TLICS parameters.10 TLICS total scores were analyzed as continuous data.

Generalized κ estimates between 0.81 and 1.00 are interpreted as almost perfect agreement, κ values between 0.61 and 0.80 as substantial, κ values between 0.41 and 0.60 as moderate, κ values between 0.21 and 0.40 as fair, and κ values 0.20 and lower as poor.13
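To make the κ statistic concrete, the following is a minimal Python sketch of the standard Fleiss’ κ for multiple raters, computed from how many raters chose each category for each subject. It is an illustration only, not the software used in this study (which is not specified here); the function names `fleiss_kappa` and `landis_koch_label` are hypothetical, and the labeling helper simply applies the interpretation bands given above.

```python
from collections import Counter
from typing import Hashable, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa for nominal labels.

    ratings[i] lists the labels that every rater assigned to subject i;
    all subjects must have the same number (>= 2) of ratings.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("need the same number (>= 2) of ratings per subject")

    counts = [Counter(row) for row in ratings]
    categories = {label for row in ratings for label in row}

    # Observed agreement: share of agreeing rater pairs, averaged over subjects.
    pairs = n_raters * (n_raters - 1)
    p_bar = sum(
        (sum(c[k] ** 2 for k in categories) - n_raters) / pairs for c in counts
    ) / n_subjects

    # Chance agreement from the pooled proportion of each category.
    total = n_subjects * n_raters
    p_e = sum((sum(c[k] for c in counts) / total) ** 2 for k in categories)
    # Undefined (division by zero) if every rating uses the same label.
    return (p_bar - p_e) / (1 - p_e)


def landis_koch_label(kappa: float) -> str:
    """Interpretation bands as stated in the Methods (2-decimal estimates)."""
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"


if __name__ == "__main__":
    # Toy data: 5 raters' treatment suggestions for 3 hypothetical patients.
    toy = [
        ["surgery"] * 5,
        ["conservative"] * 5,
        ["conservative"] * 4 + ["either"],
    ]
    k = fleiss_kappa(toy)
    print(round(k, 2), landis_koch_label(k))  # 0.75 substantial
```

Because κ subtracts the agreement expected from the pooled category frequencies, a cohort dominated by one suggestion (for example, mostly conservative) can yield a lower κ than the raw percentage of agreement would suggest.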

Extremely strong interrater reliability is inferred for Kendall’s W coefficients between 0.71 and 0.90, strong reliability for coefficients between 0.51 and 0.70, moderate reliability for coefficients from 0.31 to 0.50, and weak interrater reliability for coefficients less than 0.30.10
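Because TLICS component scores take only a few values, many patients share the same score and each rater’s ranks are heavily tied, so the tie-corrected form of Kendall’s W is the one usually applied to such data. The sketch below is a minimal, dependency-free Python illustration of that standard formula: rank each rater’s scores across patients (ties receive average ranks), sum the ranks per patient, and compute W = 12S / (m²(n³ − n) − mT), where S is the sum of squared deviations of the rank sums, m is the number of raters, n the number of patients, and T the sum of (t³ − t) over all tie groups. The function names are hypothetical, and this is not the study’s analysis code.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(scores: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Kendall's W; scores[r][i] is rater r's score for subject i."""
    m = len(scores)     # number of raters
    n = len(scores[0])  # number of subjects
    rank_rows = [average_ranks(row) for row in scores]
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)

    # Tie correction: T = sum over raters and tie groups of (t^3 - t).
    t_total = 0
    for row in scores:
        for value in set(row):
            t = row.count(value)
            t_total += t ** 3 - t
    # Undefined if every rater gives every subject the same score.
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * t_total)


if __name__ == "__main__":
    # Toy data: 3 raters' morphology scores (1-4) for 4 hypothetical patients.
    toy = [
        [1, 2, 4, 4],
        [1, 2, 4, 4],
        [1, 1, 4, 4],
    ]
    print(round(kendalls_w(toy), 3))  # 0.974
```

Without the T term in the denominator, identical ratings containing ties would give W below 1, understating agreement for coarse ordinal scores such as the PLC component.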

Using the Shrout and Fleiss ICC Model 2 (ICC2,1) analyzed for absolute agreement, ICC estimates greater than 0.75 are interpreted as excellent agreement, ICC estimates between 0.40 and 0.75 as fair to good agreement, and ICC estimates less than 0.40 as poor agreement.1,19,31
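ICC(2,1) is the two-way random-effects, absolute-agreement, single-rater coefficient of Shrout and Fleiss. A minimal Python sketch of its usual ANOVA formula follows: ICC(2,1) = (MSR − MSE) / (MSR + (k − 1)MSE + k(MSC − MSE)/n), where MSR, MSC, and MSE are the subject, rater, and residual mean squares for n subjects and k raters. The function name is hypothetical, missing ratings are not handled, and this is an illustration rather than the study’s analysis code.

```python
from typing import Sequence


def icc_2_1(x: Sequence[Sequence[float]]) -> float:
    """ICC(2,1), absolute agreement; x[i][j] is rater j's score for subject i."""
    n = len(x)     # subjects (rows)
    k = len(x[0])  # raters (columns)
    grand = sum(sum(row) for row in x) / (n * k)
    row_means = [sum(row) / k for row in x]
    col_means = [sum(row[j] for row in x) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((mean - grand) ** 2 for mean in row_means)
    ss_cols = n * sum((mean - grand) ** 2 for mean in col_means)
    ss_total = sum((v - grand) ** 2 for row in x for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Six subjects rated by four judges (the worked example in Shrout and Fleiss, 1979).
    data = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(round(icc_2_1(data), 2))  # 0.29
```

The k(MSC − MSE)/n term is what makes this an absolute-agreement coefficient: a rater who scores systematically higher than the others lowers ICC(2,1), whereas a consistency-type ICC would ignore that offset.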

Results

Demographics and Clinical Characteristics

Of the 81 study patients, 54 (66.7%) were treated conservatively and 27 (33.3%) were treated surgically (Table 2). The mean patient age was 10.9 years. Girls comprised 51.8% of the patients, and 80.2% of all patients were white. Motor vehicle accident (MVA) was the most common mechanism of injury. Patients who received surgical treatment were significantly older than those who received conservative treatment (12.1 vs 10.2 years, respectively; p = 0.04). There were no significant differences in sex, race, or mechanism of injury between the conservative and surgical treatment groups.

TABLE 2.

Comparison of patient demographics and mechanisms of injury by treatment received

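| Variable | Total Cohort | Conservative Treatment | Surgical Treatment | p Value* |
|---|---|---|---|---|
| No. of patients | 81 | 54 | 27 | |
| Age in yrs | | | | |
| Mean ± SD | 10.9 ± 4.3 | 10.2 ± 4.3 | 12.1 ± 4.1 | **0.04** |
| Median (range) | 12.0 (1.0–17.0) | 10.5 (1.0–17.0) | 13.0 (1.0–17.0) | 0.14 |
| Sex | | | | 0.64 |
| Female | 42 (51.8) | 29 (53.7) | 13 (48.2) | |
| Male | 39 (48.2) | 25 (46.3) | 14 (51.8) | |
| Race | | | | 0.69 |
| White | 65 (80.2) | 44 (81.5) | 21 (77.8) | |
| Black | 16 (19.8) | 10 (18.5) | 6 (22.2) | |
| Mechanism of injury | | | | 0.26 |
| MVA | 49 (60.5) | 31 (57.4) | 18 (66.7) | |
| Fall | 14 (17.3) | 12 (22.2) | 2 (7.4) | |
| Sport/rec event | 2 (2.5) | 2 (3.7) | 0 (0.0) | |
| ATV accident | 7 (8.6) | 4 (7.4) | 3 (11.1) | |
| MCC | 4 (4.9) | 1 (1.9) | 3 (11.1) | |
| Crush | 3 (3.7) | 2 (3.7) | 1 (3.7) | |
| Ped vs auto | 2 (2.5) | 2 (3.7) | 0 (0.0) | |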

ATV = all-terrain vehicle; MCC = motorcycle collision; ped vs auto = pedestrian struck by automobile; sport/rec = sporting or recreational event.

All data given as number of patients (%) unless otherwise indicated.

Bivariate analysis; p < 0.05 is considered statistically significant. Differences in means were tested using independent t-tests and Mann-Whitney U-tests; differences in proportions were tested using chi-square and Fisher’s exact tests. Boldface type indicates statistical significance.

The mean number of follow-up visits was 2.9 for the conservative group and 3.9 for the surgical group. The mean length of follow-up was 298 days for the conservative group and 455 days for the surgical group. Of those in the conservative group, 66.7% were treated with bracing. Conservative therapy was unsuccessful in 1 patient (1.2% of the total cohort), who required surgical intervention. This patient had a lumbar Chance fracture and developed progressive focal kyphosis and instability 18 months after the injury, requiring dorsal internal fixation and fusion. Of those in the surgical group, 2 patients (7.4%) had anterior approaches and 25 (92.6%) had posterior approaches. All posterior approaches included internal fixation. Four patients (14.8%) required second operations, and all had undergone posterior approaches. Two of these patients, who had sustained significant spinal cord injuries at the time of their accidents, developed surgical site infections requiring reoperation. One patient developed progressive junctional kyphosis and scoliosis, requiring extension of the fusion construct 7 months after the initial surgery. The fourth patient had a postoperative radiculopathy requiring a second operation for screw revision 5 days after the initial surgery.

TLICS

Using the TLICSs assigned by one of the authors (R.L.D.), the mean TLICS among all patients was 3.7 ± 2.8. The most common component scores reflected a fracture with compression morphology (43.2%), no evidence of injury to the PLC (56.8%), and no neurological deficit on examination (88.9%). Patients in the surgical group had a significantly higher mean TLICS than those in the conservative group (6.0 vs 2.5, p < 0.0001), and they were significantly more likely to have received higher scores for each individual component of the TLICS (Table 3).

TABLE 3.

Breakdown of TLICSs by treatment received

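| Score | Total Cohort | Conservative Treatment | Surgical Treatment | p Value* |
|---|---|---|---|---|
| Total TLICS | | | | |
| Mean ± SD | 3.7 ± 2.8 | 2.5 ± 2.2 | 6.0 ± 2.3 | **<0.0001** |
| Median (range) | 2.0 (1.0–9.0) | 1.0 (1.0–7.0) | 7.0 (2.0–9.0) | **<0.0001** |
| Mode | 1.0 | 1.0 | 7.0 | |
| Morphology | | | | **<0.0001** |
| 1 | 35 (43.2) | 34 (63.0) | 1 (3.7) | |
| 2 | 18 (22.2) | 10 (18.5) | 8 (29.6) | |
| 3 | 5 (6.2) | 2 (3.7) | 3 (11.1) | |
| 4 | 23 (28.4) | 8 (14.8) | 15 (55.6) | |
| PLC | | | | **<0.0001** |
| 0 | 46 (56.8) | 39 (72.2) | 7 (25.9) | |
| 2 | 6 (7.4) | 5 (9.3) | 1 (3.7) | |
| 3 | 29 (35.8) | 10 (18.5) | 19 (70.4) | |
| Neurological status | | | | **0.0002** |
| 0 | 72 (88.9) | 53 (98.2) | 19 (70.4) | |
| 2 | 6 (7.4) | 0 (0) | 6 (22.2) | |
| 3 | 3 (3.7) | 1 (1.8) | 2 (7.4) | |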

All data given as number of patients (%) unless otherwise indicated.

Bivariate analysis; p < 0.05 is considered statistically significant. For the categorical variables morphology, PLC, and neurological status, statistical significance was assessed using chi-square and Fisher’s exact tests. The continuous variable total TLICS was tested for statistical significance using the Mann-Whitney U-test. Boldface type indicates statistical significance.

For comparisons between actual treatment and TLICS-suggested treatment, 7 patients with a neutral score of 4 were excluded from analyses. The treatment suggested by the TLICS scoring system matched the actual treatment in 79.7% of cases. Patients who received surgical treatment were more likely to have TLICSs suggestive of surgical treatment (OR 17.3, 95% CI 4.9–61.4). The agreement between actual treatment and suggested treatment was statistically significant (p < 0.0001; Table 4).

TABLE 4.

Comparison of TLICS-suggested treatment to actual treatment received

| Treatment | Actual: Conservative | Actual: Surgical | OR (95% CI) | p Value* |
|---|---|---|---|---|
| TLICS suggested | | | 17.3 (4.9–61.4) | **<0.0001** |
| Conservative | 40 (78.4) | 4 (17.4) | | |
| Surgical | 11 (21.6) | 19 (82.6) | | |

Patients with a neutral TLICS of 4 excluded (n = 74 in this comparison). All data given as number of patients (%) unless otherwise indicated.

p < 0.05 is considered statistically significant; differences in proportions were tested using chi-square and Fisher’s exact tests. Boldface type indicates statistical significance.
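The odds ratio and its confidence interval can be recomputed directly from the 2 × 2 counts in Table 4. The sketch below uses the Woolf (log) method for the 95% CI; the Methods do not name the interval method, but this choice appears to reproduce the published intervals (including, when applied to the corresponding counts, those in Tables 6 and 7), and the same counts give the 79.7% concordance reported above. The function name `odds_ratio_woolf` is hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_woolf(a: int, b: int, c: int, d: int,
                     z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Woolf (log-scale) CI for the 2x2 table [[a, b], [c, d]].

    Rows: TLICS-suggested surgical (a, b) / TLICS-suggested conservative (c, d).
    Columns: actual surgical (a, c) / actual conservative (b, d).
    Assumes no zero cells; otherwise the log odds ratio is undefined.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Table 4 counts (neutral scores of 4 excluded, n = 74).
    a, b, c, d = 19, 11, 4, 40
    or_, lo, hi = odds_ratio_woolf(a, b, c, d)
    print(f"OR {or_:.1f} (95% CI {lo:.1f}-{hi:.1f})")  # OR 17.3 (95% CI 4.9-61.4)
    print(f"concordance {100 * (a + d) / (a + b + c + d):.1f}%")  # concordance 79.7%
```

The width of the interval is driven mainly by the smallest cells (here the 4 surgically treated patients with a conservative suggestion), which is why the CT-only subgroup, with cells of 2, produces a very wide interval despite its large odds ratio.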

Interrater Reliability

For the assessment of interrater reliability, all 3 statistical methods were applied to each TLICS component and to the total TLICS, and Fleiss’ generalized κ alone was applied to the nominal TLICS treatment suggestion. All values ranged from moderate to very good. We also considered the effect of imaging modality (CT and MRI) on agreement: 42 patients had CT only, 35 had both CT and MRI, and 4 had MRI only. A sample of 4 patients is too small to detect differences in variance robustly; indeed, the ICC could not be calculated for some parameters because there was no variance across the raters (see Supplemental Table 1d). For this reason, the patients with MRI only were grouped with those who had both CT and MRI to assess how the use of MRI affected interrater reliability compared with the use of CT only. Overall, interrater reliability was worse when MRI was used in calculating the TLICS. Using Kendall’s coefficient of concordance (W) and the ICC, interrater reliability for the total TLICS was extremely strong (W = 0.87) and excellent (ICC = 0.83) for all patients, extremely strong (W = 0.80) and excellent (ICC = 0.88) for patients with CT only, and extremely strong (W = 0.79) and fair to good (ICC = 0.70) for patients with MRI. Using Fleiss’ generalized κ, interrater reliability for the TLICS treatment suggestion was substantial (κ = 0.69) for all patients, substantial (κ = 0.73) for patients with CT only, and moderate (κ = 0.57) for patients with MRI (Table 5; see Supplemental Tables 1a–c for reliability statistics on each component of the TLICS, separated by imaging type).
When interrater reliability estimates by the faculty and residents were compared, reliability was slightly higher for faculty estimates than for resident estimates (κ = 0.77 vs 0.64, respectively, for TLICS treatment suggestion; Supplemental Table 1e).

TABLE 5.

TLICS interrater reliability estimates by imaging modality

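| Parameter | All Patients: κ | All Patients: W | All Patients: ICC | CT Only (n = 42): κ | CT Only: W | CT Only: ICC | CT/MRI or MRI Only (n = 39): κ | CT/MRI or MRI Only: W | CT/MRI or MRI Only: ICC |
|---|---|---|---|---|---|---|---|---|---|
| Morphology | 0.60 | 0.73 | 0.67 | 0.53 | 0.76 | 0.75 | 0.56 | 0.64 | 0.51 |
| PLC | 0.60 | 0.82 | 0.76 | 0.61 | 0.79 | 0.77 | 0.49 | 0.75 | 0.67 |
| Neurological status | 0.87 | 0.97 | 0.96 | 0.87 | 0.99 | 0.98 | 0.88 | 0.95 | 0.94 |
| Total TLICS | 0.50 | 0.87 | 0.83 | 0.46 | 0.80 | 0.88 | 0.44 | 0.79 | 0.70 |
| TLICS treatment suggestion | 0.69 | | | 0.73 | | | 0.57 | | |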

Three separate statistical estimates of interrater reliability are presented for the application of the TLICS scoring system by 5 raters to patients at Children’s of Alabama. κ = Fleiss’ generalized kappa; W = Kendall’s coefficient of concordance. All p values for κ, W, and ICC are < 0.0001.

Effect of Imaging Modality

The effect of imaging modality on agreement between the actual treatments received and the treatments suggested by the TLICS system was also analyzed. As noted previously, the agreement was statistically significant when all patients with nonneutral scores were considered (Table 4). Statistical significance was maintained for patients who had only CT scans (OR 46.5, 95% CI 5.4–397.6, p = 0.0002), but the agreement between actual and suggested treatment was weaker when an MRI scan was used to assess injury (OR 6.5, 95% CI 1.1–37.5, p = 0.03; Table 6). When only patients who were treated surgically were considered, the presence of an MRI did not significantly affect the suggested treatment (p = 0.59; Table 7). However, when only patients who were treated conservatively were considered, the use of MRI made it more likely for the TLICS to suggest surgical treatment (p = 0.0003; Table 7).

TABLE 6.

Actual treatment versus TLICS recommendation by imaging modality

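| Treatment | CT Only (n = 41): Conservative Treatment | CT Only: Surgical Treatment | CT Only: OR (95% CI) | CT Only: p Value* | CT/MRI or MRI Only (n = 33): Conservative Treatment | CT/MRI or MRI Only: Surgical Treatment | CT/MRI or MRI Only: OR (95% CI) | CT/MRI or MRI Only: p Value* |
|---|---|---|---|---|---|---|---|---|
| TLICS suggested | | | 46.5 (5.4–397.6) | **0.0002** | | | 6.5 (1.1–37.5) | **0.03** |
| Conservative | 31 (93.9) | 2 (25.0) | | | 9 (50.0) | 2 (13.3) | | |
| Surgical | 2 (6.1) | 6 (75.0) | | | 9 (50.0) | 13 (86.7) | | |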

Patients with a neutral TLICS score of 4 excluded. All data given as number of patients (%) unless otherwise indicated.

p < 0.05 is considered statistically significant; differences in proportions were tested using chi-square and Fisher’s exact tests. Boldface type indicates statistical significance.

TABLE 7.

TLICS score recommendation by imaging modality for surgically and conservatively treated patients

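| Treatment | CT Only | CT/MRI or MRI Only | OR (95% CI) | p Value* |
|---|---|---|---|---|
| Surgical (actual treatment) | | | | |
| No. of patients | 8 | 15 | | |
| TLICS suggested | | | 2.2 (0.2–19.3) | 0.59 |
| Conservative | 2 (25.0) | 2 (13.3) | | |
| Surgical | 6 (75.0) | 13 (86.7) | | |
| Conservative (actual treatment) | | | | |
| No. of patients | 33 | 18 | | |
| TLICS suggested | | | 15.5 (2.8–85.1) | **0.0003** |
| Conservative | 31 (93.9) | 9 (50.0) | | |
| Surgical | 2 (6.1) | 9 (50.0) | | |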

Patients with a neutral TLICS score of 4 excluded. All data given as number of patients (%) unless otherwise indicated.

p < 0.05 is considered statistically significant; differences in proportions were tested using Fisher’s exact test. Boldface type indicates statistical significance.

Discussion

Among spine surgeons, no single classification system has been adopted as the optimal scale for evaluating thoracolumbar fractures. The Denis and Magerl-AO classifications are 2 commonly used scales. Each classifies fractures on the basis of morphology and mechanism of injury.4,18 However, neither scale emphasizes the importance of neurological status or provides a reliable definition of mechanical stability. Furthermore, studies have demonstrated poor to moderate inter- and intrarater reliability for both systems.16,30,35

The recently introduced TLICS classification system was developed to provide a clinically relevant scale that could stratify injury severity and guide treatment decisions.33 Because the majority of spine trauma occurs in the adult population,5 it stands to reason that the TLICS system, like its predecessors, was developed for use in adult thoracolumbar fractures. Since its introduction in 2005, numerous studies have demonstrated good reliability of this system in samples of adult patients.12,17,23,24 However, the reliability of the TLICS in adults cannot be extrapolated to children, for multiple reasons. First, the pediatric spinal column is anatomically and biomechanically quite different from the adult spinal column, most notably in prepubescent children; its increased water and cartilage contents provide for greater ligamentous flexibility and elasticity.2,3,25,32 The facet joints are shallower and more horizontal, making the pediatric spinal column tolerant of translational and flexion/extension forces.3,32 Thus, flexion/distraction and rotational injuries are more often seen in adolescents, whereas younger children present with compression injuries.32 Furthermore, the pediatric spinal column is exposed to a different array of traumatic events. While mechanisms of injury vary among pediatric age groups, MVAs are generally the most common. Other common mechanisms include falls in younger children and higher-risk activities such as all-terrain vehicles, motorcycles, and athletic activities in older children.2,3,9,25,32 The distribution of mechanisms of injury in our study was similar to distributions reported in previously published works, with most injuries resulting from MVAs.

Clearly, pediatric thoracolumbar fractures are a distinct pathology, and the TLICS system must be evaluated independently in this population. Two studies to date have applied the TLICS system to a pediatric patient population. Savage et al. included 20 spine surgeons who each scored 20 pediatric thoracolumbar fractures. They found moderate interrater reliability for the total TLICS, with greater agreement for operative versus nonoperative treatment recommendations based on the TLICS.27 However, the unweighted Cohen κ used in the study by Savage et al. is not as robust a statistic as Fleiss’ generalized κ for multiple raters, Kendall’s W for ordinal data, or the ICC for ordered, Likert-type, or continuous data. Furthermore, a potential conflict of interest stems from the inclusion of the original authors of the TLICS system, as is the case in numerous studies assessing TLICS in adults.8,15,21–24,26 Sellin et al. published the second study on TLICS in children,29 but they did not discuss or assess the reliability of the TLICS in pediatrics, and they made no reference to the only reported reliability statistics on TLICS in pediatrics, those of Savage et al. This is of consequence because reliability is a necessary condition for validity.11 Furthermore, the raters in the study by Sellin et al. were not blinded to treatment.

Our objective was to assess the reliability of the TLICS scoring system in a large cohort of pediatric patients evaluated at a Level 1 trauma center for traumatic thoracolumbar fractures. The interrater reliability of the TLICS system was assessed as good to excellent for the total TLICS, and moderate to very good for the TLICS treatment suggestion, depending on the choice of statistical method.

We hypothesized that MRI would help clarify the presence of PLC disruption when this was not clear on CT, thereby increasing interrater reliability. It must be noted that the TLICS system was developed for use with CT; by including MRI, we applied the system in a manner that was not originally intended. However, MRI was used in nearly half of our patients, and excluding those patients would have substantially decreased the size of our study population. Furthermore, including MRI revealed an interesting association between this imaging technique and the reliability of the TLICS system in pediatrics: interrater reliability was actually worse among patients who underwent MRI. Thus, our hypothesis was not supported. Although we cannot fully explain this finding, one plausible explanation is that raters interpreted the MRI of a patient with a negative or equivocal PLC score on CT differently, so that some raters increased the PLC score and others kept it the same. It is also likely that MRI was more often obtained in cases in which the surgical decision was not clear from CT alone, which would produce an association between the use of MRI and decreased reliability of the TLICS system. The negative effect of MRI on the reliability of PLC assessment is consistent with the literature, which shows that MRI has only poor to moderate reliability for assessing the integrity of the PLC in adults.14,28

Given the effect of imaging modality on interrater reliability, we surmised that MRI might also affect the agreement between the actual treatment received and the treatment suggested by the TLICS system. As noted previously, the agreement remained significant when patients underwent CT only, but it was weaker when MRI was used. We gained insight into this relationship by examining the association of imaging with conservatively treated versus surgically treated patients. Patients who were treated conservatively were more likely to have a TLICS suggesting surgical treatment when MRI was used to calculate the score. This suggests that MRI weakens the agreement between actual and suggested treatment by making conservatively treated patients more likely to receive a suggested treatment of surgery.

One study in adults demonstrated that MRI has poor specificity for identifying PLC disruption in thoracolumbar injuries.34 If this holds true for our study population, the most likely explanation for our findings is that MRI made raters more likely to give a false-positive rating for PLC disruption in patients who went on to be treated conservatively, which would decrease the agreement between actual and suggested treatment when MRI was used. Taking this into account, along with the finding that MRI worsened the interrater reliability of the TLICS system, surgeons should exercise caution when MRI is used to assist in the treatment decision, as it could sway the decision toward surgery in a patient who would likely tolerate conservative treatment.

Limitations

Our study is limited by its retrospective nature. The only objective measure of poor outcome was the need for repeat surgery in the surgical group, or for first-time surgery in patients whose conservative therapy failed. No objective quality-of-life outcomes were available for analysis. The retrospective chart review could not prove that each patient was treated without use of the TLICS system. However, the concept of the TLICS system was unfamiliar to the current neurosurgery and orthopedic faculty before this study began. Thus, although this is a potential limitation, we believe it is safe to say that the TLICS was not used for treatment decisions in the study patients. Another limitation is that imaging evaluation was not uniform across the study population: 4 patients had no CT scan, although CT is the gold standard for evaluation of fracture morphology. Finally, scoring systems with neutral scores present an analytical dilemma, because the statistical results can be swayed by how these scores are handled; we therefore excluded TLICSs of 4 from comparisons of actual treatment to suggested treatment.

Conclusions

The TLICS has proven to be a reliable tool for assessing traumatic thoracolumbar fractures in adults, but its reliability had not been adequately assessed in pediatric patients. This study demonstrates that the TLICS system has good interrater reliability when applied to children. However, surgeons should not overestimate the value of MRI when making surgical decisions, because incorporating MRI decreased the reliability of the TLICS system in our cohort.
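The interrater statistic behind this conclusion is Fleiss' kappa for multiple raters.6 The sketch below implements the standard Fleiss (1971) formula for subjects that are each rated by the same number of raters. It assumes nominal categories such as the TLICS treatment suggestion and is not necessarily the exact procedure or software used in this study, which reported a generalized kappa. The toy ratings are invented.

```python
from collections import Counter
from typing import Hashable, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa for nominal ratings.

    ratings[i] holds the category each rater assigned to subject i;
    every subject must be rated by the same number of raters (>= 2).
    """
    if not ratings:
        raise ValueError("need at least one subject")
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("need >= 2 raters and the same count for every subject")

    categories = {label for row in ratings for label in row}
    # counts[i][j]: number of raters who put subject i in category j
    counts = [Counter(row) for row in ratings]

    # Observed agreement: mean proportion of agreeing rater pairs per subject.
    pair_norm = n_raters * (n_raters - 1)
    p_bar = sum(
        (sum(c[j] ** 2 for j in categories) - n_raters) / pair_norm for c in counts
    ) / n_subjects

    # Chance agreement from the pooled proportion of ratings in each category.
    total = n_subjects * n_raters
    p_e = sum((sum(c[j] for c in counts) / total) ** 2 for j in categories)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_bar - p_e) / (1.0 - p_e)


# Invented example: 5 raters give a treatment suggestion for 4 patients.
toy = [
    ["op", "op", "op", "op", "op"],
    ["non", "non", "non", "non", "non"],
    ["op", "op", "op", "non", "non"],
    ["non", "non", "non", "non", "op"],
]
print(round(fleiss_kappa(toy), 2))  # 0.49
```

Here observed agreement is 0.75 and chance agreement is 0.505, so kappa is 0.245 / 0.495, or about 0.49, even though most individual ratings agree. This shows how kappa discounts agreement expected by chance.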

Acknowledgments

Joseph H. Miller, MD, completed work on this study as a Stephens Scholar. Elizabeth N. Kuhn, MD, completed work on this study as a Kaul Foundation Clinical Research Scholar.

Disclosures

The authors report no conflict of interest concerning the materials or methods used in this study or the findings specified in this paper.

Author Contributions

Conception and design: Dawkins, Miller, Tubbs, Walters, Rozzelle. Acquisition of data: Dawkins, Miller, Ramadan, Lysek, Kuhn, Rocque, Conklin. Analysis and interpretation of data: Dawkins, Miller, Ramadan, Kuhn, Rocque, Conklin, Walters, Agee, Rozzelle. Drafting the article: Dawkins, Miller, Agee. Critically revising the article: Dawkins, Miller, Ramadan, Kuhn, Rocque, Conklin, Walters, Agee, Rozzelle. Reviewed submitted version of manuscript: all authors. Approved the final version of the manuscript on behalf of all authors: Dawkins. Statistical analysis: Dawkins, Miller, Agee. Administrative/technical/material support: Dawkins, Miller, Tubbs, Walters, Rozzelle. Study supervision: Dawkins, Miller, Walters, Rozzelle.

Supplemental Information

Online-Only Content

Supplemental material is available with the online version of the article.

References

1. Cicchetti DV: Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess 6:284–290, 1994

2. Cirak B, Ziegfeld S, Knight VM, Chang D, Avellino AM, Paidas CN: Spinal injuries in children. J Pediatr Surg 39:607–612, 2004

3. Daniels AH, Sobel AD, Eberson CP: Pediatric thoracolumbar spine trauma. J Am Acad Orthop Surg 21:707–716, 2013

4. Denis F: The three column spine and its significance in the classification of acute thoracolumbar spinal injuries. Spine (Phila Pa 1976) 8:817–831, 1983

5. Dogan S, Safavi-Abbasi S, Theodore N, Chang SW, Horn EM, Mariwalla NR, et al.: Thoracolumbar and sacral spinal injuries in children and adolescents: a review of 89 cases. J Neurosurg 106 (6 Suppl):426–433, 2007

6. Fleiss JL: Measuring nominal scale agreement among many raters. Psychol Bull 76:378–382, 1971

7. Fleiss JL: Statistical Methods for Rates and Proportions, ed 2. New York: Wiley, 1981

8. Joaquim AF, Lawrence B, Daubs M, Brodke D, Tedeschi H, Vaccaro AR, et al.: Measuring the impact of the Thoracolumbar Injury Classification and Severity Score among 458 consecutively treated patients. J Spinal Cord Med 37:101–106, 2014

9. Junkins EP Jr, Stotts A, Santiago R, Guenther E: The clinical presentation of pediatric thoracolumbar fractures: a prospective study. J Trauma 65:1066–1071, 2008

10. Kendall M, Gibbons J: Rank Correlation Methods, ed 5. London: Edward Arnold, 1990

11. Kimberlin CL, Winterstein AG: Validity and reliability of measurement instruments used in research. Am J Health Syst Pharm 65:2276–2284, 2008

12. Koh YD, Kim DJ, Koh YW: Reliability and validity of Thoracolumbar Injury Classification and Severity Score (TLICS). Asian Spine J 4:109–117, 2010

13. Landis JR, Koch GG: The measurement of observer agreement for categorical data. Biometrics 33:159–174, 1977

14. Lee GY, Lee JW, Choi SW, Lim HJ, Sun HY, Kang Y, et al.: MRI inter-reader and intra-reader reliabilities for assessing injury morphology and posterior ligamentous complex integrity of the spine according to the thoracolumbar injury classification system and severity score. Korean J Radiol 16:889–898, 2015

15. Lee JY, Vaccaro AR, Lim MR, Öner FC, Hurlbert RJ, Hedlund R, et al.: Thoracolumbar injury classification and severity score: a new paradigm for the treatment of thoracolumbar spine trauma. J Orthop Sci 10:671–675, 2005

16. Lenarz CJ, Place HM, Lenke LG, Alander DH, Oliver D: Comparative reliability of 3 thoracolumbar fracture classification systems. J Spinal Disord Tech 22:422–427, 2009

17. Lewkonia P, Paolucci EO, Thomas K: Reliability of the thoracolumbar injury classification and severity score and comparison with the Denis classification for injury to the thoracic and lumbar spine. Spine (Phila Pa 1976) 37:2161–2167, 2012

18. Magerl F, Aebi M, Gertzbein SD, Harms J, Nazarian S: A comprehensive classification of thoracic and lumbar injuries. Eur Spine J 3:184–201, 1994

19. McGraw KO, Wong SP: Forming inferences about some intraclass correlation coefficients. Psychol Methods 1:30–46, 1996

20. Parent S, Dimar J, Dekutoski M, Roy-Beaudry M: Unique features of pediatric spinal cord injury. Spine (Phila Pa 1976) 35 (21 Suppl):S202–S208, 2010

21. Patel AA, Dailey A, Brodke DS, Daubs M, Harrop J, Whang PG, et al.: Thoracolumbar spine trauma classification: the Thoracolumbar Injury Classification and Severity Score system and case examples. J Neurosurg Spine 10:201–206, 2009

22. Patel AA, Vaccaro AR: Thoracolumbar spine trauma classification. J Am Acad Orthop Surg 18:63–71, 2010

23. Patel AA, Whang PG, Brodke DS, Agarwal A, Hong J, Fernandez C, et al.: Evaluation of two novel thoracolumbar trauma classification systems. Indian J Orthop 41:322–326, 2007

24. Raja Rampersaud Y, Fisher C, Wilsey J, Arnold P, Anand N, Bono CM, et al.: Agreement between orthopedic surgeons and neurosurgeons regarding a new algorithm for the treatment of thoracolumbar injuries: a multicenter reliability study. J Spinal Disord Tech 19:477–482, 2006

25. Reilly CW: Pediatric spine trauma. J Bone Joint Surg Am 89 (Suppl 1):98–107, 2007

26. Rihn JA, Anderson DT, Harris E, Lawrence J, Jonsson H, Wilsey J, et al.: A review of the TLICS system: a novel, user-friendly thoracolumbar trauma classification system. Acta Orthop 79:461–466, 2008

27. Savage JW, Moore TA, Arnold PM, Thakur N, Hsu WK, Patel AA, et al.: The reliability and validity of the thoracolumbar injury classification system in pediatric spine trauma. Spine (Phila Pa 1976) 40:E1014–E1018, 2015

28. Schweitzer KM, Vaccaro AR, Harrop JS, Hurlbert J, Carrino JA, Rechtine GR, et al.: Interrater reliability of identifying indicators of posterior ligamentous complex disruption when plain films are indeterminate in thoracolumbar injuries. J Orthop Sci 12:437–442, 2007

29. Sellin JN, Steele WJ III, Simpson L, Huff WX, Lane BC, Chern JJ, et al.: Multicenter retrospective evaluation of the validity of the Thoracolumbar Injury Classification and Severity Score system in children. J Neurosurg Pediatr 18:164–170, 2016

30. Sethi MK, Schoenfeld AJ, Bono CM, Harris MB: The evolution of thoracolumbar injury classification systems. Spine J 9:780–788, 2009

31. Shrout PE, Fleiss JL: Intraclass correlations: uses in assessing rater reliability. Psychol Bull 86:420–428, 1979

32. Slotkin JR, Lu Y, Wood KB: Thoracolumbar spinal trauma in children. Neurosurg Clin N Am 18:621–630, 2007

33. Vaccaro AR, Lehman RA Jr, Hurlbert RJ, Anderson PA, Harris M, Hedlund R, et al.: A new classification of thoracolumbar injuries: the importance of injury morphology, the integrity of the posterior ligamentous complex, and neurologic status. Spine (Phila Pa 1976) 30:2325–2333, 2005

34. Vaccaro AR, Rihn JA, Saravanja D, Anderson DG, Hilibrand AS, Albert TJ, et al.: Injury of the posterior ligamentous complex of the thoracolumbar spine: a prospective evaluation of the diagnostic accuracy of magnetic resonance imaging. Spine (Phila Pa 1976) 34:E841–E847, 2009

35. Wood KB, Khanna G, Vaccaro AR, Arnold PM, Harris MB, Mehbod AA: Assessment of two thoracolumbar fracture classification systems as used by multiple surgeons. J Bone Joint Surg Am 87:1423–1429, 2005

