Novel artificial intelligence algorithm: an accurate and independent measure of spinopelvic parameters

Lindsay D. Orosz, MS, PA-C, Department of Research, National Spine Health Foundation, Reston;
Fenil R. Bhatt, BS, Department of Spine Surgery, Virginia Spine Institute, Reston, Virginia;
Ehsan Jazini, MD, Department of Spine Surgery, Virginia Spine Institute, Reston, Virginia;
Marcel Dreischarf, PhD, Department of Research and Development, RAYLYTIC GmbH, Leipzig, Germany;
Priyanka Grover, MS, Department of Research and Development, RAYLYTIC GmbH, Leipzig, Germany;
Julia Grigorian, BA, Department of Research, National Spine Health Foundation, Reston;
Rita Roy, MD, Department of Research, National Spine Health Foundation, Reston;
Thomas C. Schuler, MD, Department of Spine Surgery, Virginia Spine Institute, Reston, Virginia;
Christopher R. Good, MD, Department of Spine Surgery, Virginia Spine Institute, Reston, Virginia; and
Colin M. Haines, MD, Department of Spine Surgery, Virginia Spine Institute, Reston, Virginia
Free access

OBJECTIVE

The analysis of sagittal alignment by measuring spinopelvic parameters has been widely adopted among spine surgeons globally, and sagittal imbalance is a well-documented cause of poor quality of life. These measurements are time-consuming but necessary to make, which creates a growing need for an automated analysis tool that measures spinopelvic parameters with speed, precision, and reproducibility without relying on user input. This study introduces and evaluates an algorithm based on artificial intelligence (AI) that fully automatically measures spinopelvic parameters.

METHODS

Two hundred lateral lumbar radiographs (pre- and postoperative images from 100 patients undergoing lumbar fusion) were retrospectively analyzed by board-certified spine surgeons who digitally measured lumbar lordosis, pelvic incidence, pelvic tilt, and sacral slope. The novel AI algorithm was also used to measure the same parameters. To evaluate the agreement between human and AI-automated measurements, the mean error (95% CI, SD) was calculated and interrater reliability was assessed using the 2-way random single-measure intraclass correlation coefficient (ICC). ICC values larger than 0.75 were considered excellent.

RESULTS

The AI algorithm determined all parameters in 98% of preoperative and in 95% of postoperative images with excellent ICC values (preoperative range 0.85–0.92, postoperative range 0.81–0.87). The mean errors were smallest for pelvic incidence both pre- and postoperatively (preoperatively −0.5° [95% CI −1.5° to 0.6°] and postoperatively 0.0° [95% CI −1.1° to 1.2°]) and largest preoperatively for sacral slope (−2.2° [95% CI −3.0° to −1.5°]) and postoperatively for lumbar lordosis (3.8° [95% CI 2.5° to 5.0°]).

CONCLUSIONS

Advancements in AI translate to the arena of medical imaging analysis. This method of measuring spinopelvic parameters on spine radiographs has excellent reliability comparable to expert human raters. This application allows users to accurately obtain critical spinopelvic measurements automatically, which can be applied to clinical practice. This solution can assist physicians by saving time in routine work and by avoiding error-prone manual measurements.

ABBREVIATIONS

AI = artificial intelligence; ICC = intraclass correlation coefficient; PI-LL = pelvic incidence–lumbar lordosis; PT = pelvic tilt; RMSE = root mean square error; SS = sacral slope.

In Brief

This study aimed to determine whether artificial intelligence (AI) could measure 4 key spinopelvic parameters on radiographs independently and with accuracy comparable to manual measurements by experts. AI was found to have a high degree of accuracy when compared to experts. Spinopelvic measurements are part of the surgical routine but are time-consuming and potentially error prone. Advancements in AI can increase efficiency and reduce errors in the day-to-day workflow.

Proper sagittal balance is the most efficient physiological alignment of the spine and is one of the most crucial factors to consider in every spine patient both before and after surgery.1–3 Sagittal imbalance is associated with low-back pain, difficulty with activities of daily living, and poor quality of life.4–6 If sagittal balance is not restored during spine surgery, poor patient-reported outcomes and adjacent-segment breakdown can result.7–10 Preoperatively, the sagittal plane assessment gives insight into global balance and the use of compensatory mechanisms; it also allows the pelvic incidence–lumbar lordosis (PI-LL) mismatch to be calculated.8,10,11 This information is integral to the surgical planning process because it provides a target that helps to avoid overcorrection or undercorrection. Postoperatively, the sagittal plane assessment can serve as a quality metric for determining whether balance was achieved during surgery and maintained over time.4,7,9

The list of sagittal parameters continues to grow, while measuring these parameters remains a time-consuming process that requires experience and precision.12–15 Although validated software exists to assist in the measurement of spinopelvic parameters, these tools still require user input to identify several landmarks, and therefore user experience and time are factors preventing widespread adoption.12,16–19 There is a need for an automated tool to measure sagittal parameters accurately and independently. Artificial intelligence (AI) has shown promise in carrying out repetitive tasks performed by humans and often achieves a more accurate and reliable result.16,17,20–22 This study aimed to validate a novel, fully automatic measurement method in which an AI algorithm was used to measure spinopelvic parameters by comparing the reliability between human- and AI-generated measurements for lumbar lordosis (LL), pelvic incidence (PI), pelvic tilt (PT), and sacral slope (SS). We hypothesized that there would be excellent agreement between the measurements made by expert human raters and the AI algorithm for each tested parameter.

Methods

Study Design and Patient Selection for Validation

A cohort of 100 patients treated between 2017 and 2019 was retrospectively selected from a single, multisurgeon center. Patients were included if they had undergone a lumbar spine fusion and had pre- and postoperative standing lateral lumbar radiographs that included the femoral heads. A variety of surgical approaches, numbers of levels fused, and instrumentation types were included to validate the AI algorithm’s ability to detect landmarks in a heterogeneous group. Preoperative diagnoses, number of levels fused, surgical approach, instrumentation type, and the presence of anatomical variances were collected from the electronic medical record system and operative reports. PI and LL were evaluated to determine the percentage of patients with PI-LL match (PI minus LL within ±10°) or mismatch (PI-LL > 10°).
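The PI-LL grouping described above can be illustrated with a minimal Python sketch. The function name and the default threshold argument are hypothetical and were not part of the study's software. Treating a difference below −10° as a mismatch as well is an assumption; the text only states the match window of ±10° and a mismatch of more than 10°.

```python
def classify_pi_ll(pi_deg: float, ll_deg: float, threshold_deg: float = 10.0) -> str:
    """Label a patient as PI-LL 'match' or 'mismatch'.

    A difference (PI minus LL) within +/- threshold_deg counts as a match.
    Assumption: differences below -threshold_deg are also labeled mismatch.
    """
    difference = pi_deg - ll_deg
    return "match" if abs(difference) <= threshold_deg else "mismatch"


if __name__ == "__main__":
    # Illustrative values only, not patient data from the study.
    print(classify_pi_ll(55.0, 50.0))
    print(classify_pi_ll(60.0, 45.0))
```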

Standing pre- and postoperative lateral radiographs were collected on all 100 patients (200 radiographs), anonymized, and uploaded into the AI database as DICOM images. There they were preprocessed automatically using standard DICOM tags of window width and center to adjust the brightness and contrast. The institution’s radiographs were taken by 1 experienced technician, who used their standardized imaging technique with patients standing in a neutral position, using a digital radiography system (Samsung GU60). These digital images were accessed in the institution’s PACS by the human raters, and measurements were made digitally and recorded. AI and human raters were blinded to each other’s measurements.
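The brightness and contrast adjustment from the DICOM window center and width can be sketched as follows. This is a simplified linear windowing under stated assumptions, not the study's preprocessing code: the function name is hypothetical, the pixel array and tag values are assumed to have been read from the DICOM file already, and the exact half-pixel offsets of the DICOM linear windowing function are omitted.

```python
import numpy as np


def apply_window(pixels: np.ndarray, center: float, width: float) -> np.ndarray:
    """Map raw pixel values to [0, 1] using a window center and width.

    Values below (center - width / 2) become 0 and values above
    (center + width / 2) become 1; values in between scale linearly.
    """
    if width <= 0:
        raise ValueError("window width must be positive")
    lower = center - width / 2.0
    upper = center + width / 2.0
    clipped = np.clip(pixels.astype(np.float64), lower, upper)
    return (clipped - lower) / (upper - lower)


if __name__ == "__main__":
    raw = np.array([0, 1000, 1500, 2000, 3000])  # illustrative pixel values
    print(apply_window(raw, center=1500, width=1000))
```

A narrower window increases contrast within the selected intensity range at the cost of saturating everything outside it.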

All patients or their legal guardians signed written informed consent within the institute’s Notice of Privacy Practices prior to surgery. This study was conducted in accordance with the 1964 Helsinki Declaration, its amendments, and other equivalent ethical standards.

Measurement Validation

The institution’s expert rater (rater 1) measured LL, PI, PT, and SS on every pre- and postoperative radiograph (200 radiographs). These measurements were compared to the parameters generated by the AI algorithm. To assess interrater reliability, a second set of measurements was collected on a randomly selected 50% of the pre- and postoperative radiographs (100 radiographs). Each of these radiographs was measured by 1 of the institution’s 4 fellowship-trained spine surgeons, all of whom are highly experienced raters and who were collectively designated rater 2 (Fig. 1). These digital measurements, extracted from the PACS, were the actual parameters used in the surgical planning and execution process; years prior to this study, this institution implemented a mandatory sagittal plane assessment pre- and postoperatively in which these same 4 spinopelvic parameters were used. Interrater reliability was assessed, and AI measurements were compared to those made by both raters.

FIG. 1.

Flowchart demonstrating the AI training and validation phases.

AI Training

Prior to the validation phase, the AI training phase was accomplished. Three different deep learning models were trained to perform segmentation of the spine and place landmarks on vertebral bodies and the sacrum, as shown in the AI setup in Fig. 2A and B. All anatomical structures—lumbar vertebral bodies, sacrum, and femoral heads—were manually segmented to generate the ground truth for training the segmentation model. For the landmark placement models, 6 landmarks on each vertebral body (3 each on superior and inferior endplate) and 5 landmarks on the sacrum were labeled manually on the training data set by trained personnel.

FIG. 2.

Workflow of the presented method on a preoperative image (A), depicting input lateral radiographic lumbar spine image (left panel); automatic segmentation of relevant anatomical entities and automatic placement of reference landmarks (center panels); and computation of spinopelvic parameters (right panel), and a postoperative image (B), depicting automatic segmentation of relevant anatomical entities and spinal implants (left panel); automatic placement of reference landmarks on L1 and the sacrum (center panel); and computation of spinopelvic parameters (right panel).

A total of 600 anonymized lateral spine radiographs were used to train the AI algorithm by using the TensorFlow framework on an NVIDIA GeForce GTX 1080 GPU. The training data set comprised pre- and postoperative images obtained from 18 different clinical sites, with patients suffering from various spinal disorders (e.g., scoliosis, disc degeneration, stenosis, fractures). The training images were completely independent from the 200 validation images measured by the human raters. The images used for training were preprocessed automatically by using standard DICOM tags of window width and center to adjust the brightness and contrast. Additionally, an adaptive histogram equalization method was used to further enhance the contrast of the images.

For training the first model, the preprocessed images were fed to a convolutional neural network based on Cascade R-CNN to segment the anatomical structures, i.e., lumbar vertebral bodies, sacrum, and femoral heads.23 In addition to the anatomical categories, the instrumentation present in the postoperative images was also labeled so that the model learned the difference between anatomical structures and spinal implants, as shown in Fig. 2B. The model was initialized with publicly available pretrained weights from the model zoo and trained for 100 further epochs, with a learning rate of 0.002, on preprocessed lateral spine data. During training, horizontal flipping augmentation was applied randomly to half of the input images to allow the model to handle images regardless of whether they were anterior-left posterior-right or vice versa. During inference, the trained algorithm not only localized and classified the anatomical structures but also provided a confidence score for each detected category. As part of postprocessing of the inferred results of the model, the "sacrum" category was allowed to appear only once by eliminating the multiple occurrences with lower confidence. All of the lumbar vertebral bodies detected by the instance segmentation model were arranged vertically by using the y coordinates of the center of each bounding box encapsulating them. The vertical arrangement allowed the assignment of correct labels from caudad to cephalad (L5 to L1). The crops of all anatomical structures were used as an input to train the next AI models.
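The postprocessing steps described above (keeping a single sacrum and labeling the lumbar vertebrae from caudad to cephalad) can be sketched in Python. The `Detection` class, the category strings, and the function name are hypothetical illustrations rather than the study's code. The sketch assumes image coordinates in which y increases downward, so the most caudad vertebra has the largest y center, and it assumes at most 5 lumbar detections.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    category: str                            # "lumbar", "sacrum", or "femoral_head" (assumed names)
    score: float                             # model confidence for this detection
    box: Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) in pixels
    label: str = ""                          # filled in for lumbar vertebrae (L5 ... L1)


def postprocess(detections: List[Detection]) -> List[Detection]:
    """Keep only the most confident sacrum and label lumbar bodies L5 to L1."""
    sacra = [d for d in detections if d.category == "sacrum"]
    kept = [d for d in detections if d.category != "sacrum"]
    if sacra:
        kept.append(max(sacra, key=lambda d: d.score))

    lumbar = [d for d in kept if d.category == "lumbar"]
    # Largest y center first: with y growing downward this is the most caudad body.
    lumbar.sort(key=lambda d: (d.box[1] + d.box[3]) / 2.0, reverse=True)
    for detection, name in zip(lumbar, ["L5", "L4", "L3", "L2", "L1"]):
        detection.label = name
    return kept
```

Images with transitional anatomy (4 or 6 lumbar vertebrae) would need additional handling, which this sketch does not attempt.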

Two different models were trained for automatic placement of landmarks. One of the models was specialized for learning 6 reference points on the vertebral bodies (3 landmarks each on the upper and lower endplates), and the second model was optimized to place 5 landmarks on the sacral endplate. Both models were based on the U-Net architecture, which was modified to produce landmarks in the final layer.24 Each model was trained for 100 epochs on square crops of the region of interest of size 256 × 256 with a learning rate of 0.001. The input crops were randomly rotated and scaled as part of augmentation.

To automatically compute the sagittal balance parameters on any test image, the segmentation model predicts the locations of L1, the sacrum, and the femoral heads and creates their crops. The landmark detection models then predict the anatomical reference points. These reference points are used for line regression and computation of slopes, from which the measurements of LL, PI, PT, and SS are output automatically (Fig. 2).
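How landmarks can be turned into angles is sketched below, using the standard geometric definitions of the 4 parameters. This is an illustration under stated assumptions, not the study's implementation: all function names are hypothetical, the endplate center is taken as the mean of the sacral landmarks, the hip axis is taken as the mean of the femoral head centers, image y is assumed to increase downward, and all angles are returned unsigned.

```python
import numpy as np


def endplate_slope(landmarks) -> float:
    """Least-squares slope (dy/dx) of a line fitted through endplate landmarks."""
    pts = np.asarray(landmarks, dtype=float)
    slope, _intercept = np.polyfit(pts[:, 0], pts[:, 1], 1)
    return float(slope)


def angle_between(u, v) -> float:
    """Unsigned angle in degrees between two 2D vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))


def spinopelvic_parameters(l1_superior, sacral_endplate, femoral_head_centers) -> dict:
    """Compute SS, PT, PI, and LL (degrees) from landmark coordinates in pixels."""
    m_s1 = endplate_slope(sacral_endplate)
    m_l1 = endplate_slope(l1_superior)
    sacral_center = np.asarray(sacral_endplate, dtype=float).mean(axis=0)
    hip_axis = np.asarray(femoral_head_centers, dtype=float).mean(axis=0)
    to_hip = hip_axis - sacral_center

    # SS: sacral endplate versus the horizontal.
    ss = float(np.degrees(np.arctan(abs(m_s1))))
    # PT: line from sacral endplate center to hip axis versus the vertical.
    pt = angle_between(to_hip, (0.0, 1.0))
    # PI: same line versus the downward-pointing normal of the sacral endplate.
    pi = angle_between(to_hip, (-m_s1, 1.0))
    # LL: L1 superior endplate versus the sacral endplate.
    ll = abs(float(np.degrees(np.arctan(m_l1) - np.arctan(m_s1))))
    return {"SS": ss, "PT": pt, "PI": pi, "LL": ll}
```

In typical standing anatomy the relationship PI = PT + SS holds, which offers a simple sanity check on the computed values.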

Statistical Analysis

For intermediate evaluations of the segmentation and landmark placement models, the Dice coefficient and the Euclidean distance error, respectively, were analyzed on 60 test images.
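For reference, the two intermediate metrics can be computed as in the following Python sketch. The function names and the `mm_per_pixel` scaling argument are hypothetical; how the study converted pixel distances to millimeters is not stated, so the scaling shown here is an assumption.

```python
import numpy as np


def dice_coefficient(pred_mask, true_mask) -> float:
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    total = pred.sum() + true.sum()
    if total == 0:
        return 1.0  # both masks empty: treated as perfect overlap (a convention)
    return float(2.0 * np.logical_and(pred, true).sum() / total)


def mean_landmark_error(pred_points, true_points, mm_per_pixel: float = 1.0) -> float:
    """Mean Euclidean distance between predicted and reference landmarks."""
    pred = np.asarray(pred_points, dtype=float)
    true = np.asarray(true_points, dtype=float)
    distances = np.linalg.norm(pred - true, axis=1)
    return float(distances.mean() * mm_per_pixel)
```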

To evaluate the agreement between the human raters’ measurements and between AI and human measurements, the mean difference along with its 95% CI, the root mean square error (RMSE), and the SD were determined. In addition, the interrater reliability between human raters and the AI algorithm (rater 1 vs rater 2, rater 1 vs AI, rater 2 vs AI) was assessed by using 2-way random single-measure intraclass correlation coefficients (ICCs) to quantify the degree of absolute agreement.25–27 ICC values between 0.75 and 1.00 were interpreted as indicating excellent reliability (good, 0.60–0.74; fair, 0.40–0.59; and poor, < 0.40).28,29 Furthermore, to allow comparison with the scientific literature, Pearson’s r was calculated. All statistical evaluations were completed in the Python 3 programming language (Van Rossum and Drake, 2009).
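The agreement statistics named above can be sketched in Python with NumPy. This is an illustration, not the authors' analysis script: the function names are hypothetical, the 95% CI uses a normal approximation (mean ± 1.96 × SD/√n), which is an assumption, and the sign convention of the difference (first argument minus second) is also an assumption. The ICC follows the standard 2-way random, single-measure, absolute-agreement form, often written ICC(2,1).

```python
import numpy as np


def agreement_stats(a, b) -> dict:
    """Mean error, approximate 95% CI, SD, and RMSE of paired differences (a - b)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    mean = diff.mean()
    sd = diff.std(ddof=1)
    half_width = 1.96 * sd / np.sqrt(diff.size)  # normal approximation (assumption)
    return {
        "mean_error": float(mean),
        "ci95": (float(mean - half_width), float(mean + half_width)),
        "sd": float(sd),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
    }


def icc_2_1(ratings) -> float:
    """ICC(2,1): 2-way random effects, single measure, absolute agreement.

    ratings has shape (n_subjects, k_raters).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between raters
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))
```

Because this form measures absolute agreement, a constant offset between raters lowers the ICC even when their measurements are perfectly correlated, which is why it is reported alongside Pearson's r.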

Results

The intermediate results for accuracy of anatomical labeling, segmentation, and landmark placement on 60 test radiographs demonstrated an average detection rate of 96% for all labeled anatomical structures and a Dice coefficient of 0.85 for instance segmentation. The average Euclidean distance error for landmark placement models resulted in a 2.1-mm error for 6 landmarks on the L1 endplate and a 1.9-mm error for 5 landmarks on the sacral endplate.

Of the 100 patients included in the validation cohort, the most common primary preoperative diagnosis was lumbar segmental instability (71%). The majority of patients underwent a 1- or 2-level fusion procedure (82%), with a combined circumferential approach (88%). Anatomical variations were found in 43% of patients: 26% with a coronal deformity (> 10°), 14% with transitional lumbosacral anatomy (defined as 4 or 6 lumbar vertebrae and uni- or bilateral Bertolotti’s pseudoarticulation), and 3% with hip arthroplasty implants. PI-LL calculations were made and a mismatch was found in 33% of patients, despite only 5% carrying a primary diagnosis of flat back deformity (Table 1).

TABLE 1.

Surgical data for 100 patients who underwent lumbar fusion

Variable                              Value (n = 100)
Preop diagnosis
 Degenerative lumbar scoliosis        5%
 Lumbar flat back deformity           5%
 Lumbar segmental instability         71%
 Lumbar spinal stenosis               19%
No. of levels treated
 1                                    40%
 2                                    42%
 3                                    12%
 4–6                                  6%
Surgical approach
 Anterior only                        12%
 Anterior/posterior                   88%
Instrumentation
 Pedicle screws                       83%
 Cortical pedicle screws              5%
 Pelvic fixation                      6%
Anatomical variances
 Coronal deformity (>10°)             26%
 Transitional lumbosacral anatomy     14%
 Hip arthroplasty                     3%
PI-LL
 ± 10° (match)                        67%
 Mismatch                             33%

The statistical measures for comparison between rater 1 and rater 2 are shown in Table 2. The ICC values for interrater reliability between human raters were smallest for SS (ICCpreoperative = 0.92, ICCpostoperative = 0.90) and largest for PI (ICCpreoperative = 0.96, ICCpostoperative = 0.95). The mean difference between rater 1 and rater 2 did not exceed 1.0° in magnitude for any parameter.

TABLE 2.

Interrater reliability of rater 1 versus rater 2, in 100 radiographs

Statistical Method        SS                      PT                      PI                      LL
Preop
 ICC (95% CI)             0.92 (0.86 to 0.95)     0.93 (0.88 to 0.96)     0.96 (0.92 to 0.97)     0.92 (0.86 to 0.95)
 Mean error (95% CI)      0.3° (−0.7° to 1.3°)    0.2° (−0.7° to 1.0°)    0.2° (−0.8° to 1.2°)    −0.9° (−2.3° to 0.5°)
 SD                       3.5°                    3.0°                    3.5°                    5.0°
 RMSE                     2.7°                    1.9°                    2.7°                    3.2°
 Pearson correlation (r)  0.92                    0.93                    0.96                    0.92
Postop
 ICC (95% CI)             0.90 (0.83 to 0.94)     0.94 (0.89 to 0.97)     0.95 (0.91 to 0.97)     0.94 (0.90 to 0.97)
 Mean error (95% CI)      0.8° (−0.4° to 1.9°)    0.9° (0.2° to 1.6°)     1.0° (0.1° to 2.0°)     0.7° (−0.4° to 1.9°)
 SD                       4.0°                    2.3°                    3.4°                    3.9°
 RMSE                     2.6°                    1.5°                    2.7°                    2.7°
 Pearson correlation (r)  0.91                    0.95                    0.96                    0.94

The AI algorithm detected all relevant anatomical structures needed for the computation of parameters for sagittal balance in 98% of preoperative and 95% of postoperative images. The reasons for failed parameter detection were as follows: 1) in the preoperative images, there was 1 case in which L1 and 1 in which the femoral heads were not detected; and 2) in the postoperative images, there were 2 cases with transitional anatomy, 1 in which the sacrum was detected at a wrong position and hence automatically discarded in the sanity check, and 2 in which the femoral heads were not detected.

The statistical measures for comparison between AI and rater 1 are shown in Table 3 and between AI and rater 2 in Table 4. The ICC values for interrater reliability between AI and both human raters ranged from 0.85 to 0.92 preoperatively and from 0.75 to 0.91 postoperatively (all excellent). The smallest ICC value overall (0.75) was for LL in the postoperative evaluation of AI versus rater 2 (Table 4); the remaining postoperative ICC values were greater than 0.80. For the preoperative evaluation of AI versus rater 1, the mean error was smallest for PI (−0.5°) and largest for SS (−2.2°). For the postoperative evaluation, the mean error was smallest for PI (0.0°, AI versus both raters 1 and 2) and largest in magnitude for LL (−4.1°, AI vs rater 2). Figures 3 and 4 depict scatterplots showing the correlation between AI and rater 1 for pre- and postoperative images, respectively.

TABLE 3.

Interrater reliability of AI method versus rater 1, in 200 radiographs

Statistical Method        SS                      PT                      PI                      LL
Preop
 ICC (95% CI)             0.88 (0.73 to 0.94)     0.85 (0.72 to 0.91)     0.88 (0.82 to 0.92)     0.92 (0.87 to 0.94)
 Mean error (95% CI)      −2.2° (−3.0° to −1.5°)  1.9° (1.2° to 2.6°)     −0.5° (−1.5° to 0.6°)   1.3° (0.3° to 2.4°)
 SD                       3.8°                    3.7°                    5.4°                    5.3°
 RMSE                     4.3°                    4.2°                    5.4°                    5.4°
 Pearson correlation (r)  0.91                    0.88                    0.88                    0.92
Postop
 ICC (95% CI)             0.83 (0.75 to 0.89)     0.87 (0.76 to 0.92)     0.85 (0.79 to 0.90)     0.81 (0.59 to 0.90)
 Mean error (95% CI)      −1.4° (−2.4° to −0.5°)  1.6° (0.9° to 2.2°)     0.0° (−1.1° to 1.2°)    3.8° (2.5° to 5.0°)
 SD                       4.9°                    3.3°                    5.7°                    6.1°
 RMSE                     5.1°                    3.6°                    5.6°                    7.2°
 Pearson correlation (r)  0.85                    0.89                    0.85                    0.86

TABLE 4.

Interrater reliability between AI method and rater 2, in 100 radiographs

Statistical Method        SS                      PT                      PI                      LL
Preop
 ICC (95% CI)             0.85 (0.71 to 0.92)     0.85 (0.75 to 0.91)     0.85 (0.75 to 0.91)     0.92 (0.83 to 0.96)
 Mean error (95% CI)      1.9° (0.8° to 3.1°)     −1.2° (−2.5° to 0.1°)   0.7° (−1.1° to 2.4°)    −2.0° (−3.3° to −0.7°)
 SD                       4.2°                    4.4°                    6.2°                    4.4°
 RMSE                     3.3°                    2.6°                    3.8°                    4.0°
 Pearson correlation (r)  0.88                    0.86                    0.85                    0.93
Postop
 ICC (95% CI)             0.89 (0.80 to 0.94)     0.91 (0.84 to 0.95)     0.87 (0.77 to 0.92)     0.75 (0.51 to 0.87)
 Mean error (95% CI)      1.4° (0.3° to 2.4°)     −0.9° (−1.7° to −0.1°)  0.0° (−1.6° to 1.6°)    −4.1° (−6.3° to −1.9°)
 SD                       3.6°                    2.8°                    5.4°                    7.5°
 RMSE                     3.0°                    1.6°                    3.5°                    6.1°
 Pearson correlation (r)  0.90                    0.92                    0.87                    0.80

FIG. 3.

Scatterplots showing correlation between the measurements of rater 1 and the AI method (in degrees) for 100 preoperative images. Figure is available in color online only.

FIG. 4.

Scatterplots showing correlation between the measurements of rater 1 and AI method (in degrees) for 100 postoperative images. Figure is available in color online only.

Discussion

This study developed an AI algorithm to measure spinopelvic parameters and compared the AI-generated measurements with expert human measurements. Excellent agreement was demonstrated between human- and AI-generated measurements for all 4 parameters tested: LL, PI, PT, and SS.

Spinal implants have the potential to obscure the landmarks needed for AI to extract measurements, yet this algorithm achieved excellent agreement for all 4 parameters in the postoperative cohort. To our knowledge, this is the first study to compare the performance of the algorithm between pre- and postoperative images, while also providing surgical details to give the reader a better understanding of the types of fusion procedures included in the validation. In the current literature, Galbusera et al., Weng et al., and Schwartz et al. included images with spinal instrumentation (n = 48, n = 225, and n = 50, respectively) in their validation studies; however, they did not separately analyze their AI’s performance on these images, making the performance comparison difficult.30–32 Cho et al. excluded images with instrumentation from their study, which leaves open the question of how the AI that they used would perform under these more difficult circumstances.33 We believe this to be crucial information, given the necessity to assess sagittal alignment postoperatively.

Of the studies evaluating the accuracy and reliability of AI to measure sagittal parameters on radiographs, the study by Weng et al. had the largest training set of 990 images.31 They reported that 68 images were misread due to either anatomical variances or landmarks obscured by surrounding anatomy. Despite this, their ICC values ranged from 0.95 to 0.99 when comparing AI to each rater; however, the only parameter measured was sagittal vertical axis. They expect their AI to handle images with anatomical variances better by further training with similar images. In our study, we specifically reported on the number of images with anatomical variances, which included transitional lumbosacral anatomy (defined as 4 or 6 lumbar vertebrae and uni- or bilateral Bertolotti’s pseudoarticulation) and hip prostheses. Although the AI algorithm in this study handled the presence of spinal implants well, some difficulty was seen with anatomical variances, which is similar to human raters.34 The outliers that can be seen for PT in the scatterplots (Figs. 3 and 4) were caused by either anatomical variations or detection of only one femoral head due to overlapping structures. We agree that with further training, this AI algorithm is expected to become familiar with these variations and to handle them comparably to human raters.

Although the breadth of knowledge in this field is growing, this study adds a more comprehensive analysis. Galbusera et al. measured the most parameters (n = 5) but had few images with instrumentation (n = 48) and did not account for anatomical variances.30 Weng et al. had the largest number of training and validated images (n = 990) and of images with instrumentation (n = 225), and noted anatomical variances, but only measured sagittal vertical axis, as described above.31 Schwartz et al. had a large number of training images (n = 652) and the largest number of images with hip prostheses (n = 15), but few with spinal instrumentation (n = 50) and few that were validated by human raters (n = 40).32 Cho et al. had a large number of training images (n = 629) but only measured LL, did not account for anatomical variances, excluded patients with instrumentation, and validated measurements by human raters on a small number of images (n = 50).33 In contrast, this paper had 600 training images, combined a large number of patients with instrumentation (n = 100) by comparing pre- and postoperative images (n = 200) for validation, used human raters to validate all 200 radiographs, accounted for anatomical variances, included surgical details, and measured 4 parameters. These studies are all similar in that they demonstrated that AI accurately measured their tested parameter(s).30–33

There are limitations to this study. Intrarater reliability was not assessed. This comparison was not believed to be necessary given the expert training of the raters and the excellent agreement found in the literature between multiple measurements made by 1 rater.35 All of the validation radiographs came from a single center, representing a monocentric approach to validation. Despite this approach, the presented algorithm was trained on 600 anonymized radiographs taken from 18 different centers, which increases our confidence that the agreement demonstrated in this study would be reproducible at other centers with experienced physicians. However, a multicenter validation would be needed to confirm the generalizability of our results.

There were cases in which the algorithm could not assess some parameters. The factors that contribute to the inability to make measurements or that result in inferior agreement with human raters are as follows: transitional anatomy, presence of new implants on which the model has not been trained, detection of only one femoral head due to occlusion, no detection of femoral heads due to the presence of total hip arthroplasty, and low-quality landmark placement due to the out-of-plane effect in the acquired image. With the inclusion of more training data, the algorithm can be improved and made more robust for challenging cases. These limitations demonstrate that in a clinical setting, clinicians should be able to review and, if necessary, correct AI-based assessments to ensure high measurement quality and avoid outliers in challenging cases. Nonetheless, the high agreement between AI and human measurements demonstrated in this study suggests that the algorithm can substantially reduce the time required for manual measurements and thus increase efficiency in routine clinical practice.

Patients with severe coronal deformities were not well represented in this cohort and would conceivably pose a challenge to AI’s ability to identify the landmarks needed for the algorithm to generate measurements. A future direction is continued training of the algorithm with radiographs showing severe coronal deformities and the aforementioned anatomical variances. Likewise, expanding the measurement output to all sagittal parameters, segmental angles, and coronal measurements will be the focus of future validation studies.

Conclusions

This study demonstrates an accurate, independent, and reliable method for extracting spinopelvic parameters from spine radiographs by using an AI algorithm. When applied to clinical practice, this solution may save time and limit errors.

Disclosures

Dr. Good has received consulting fees from Medtronic, K2M, and Stryker; is on advisory boards for Medtronic, Stryker, and Augmedics; has received royalties from Medtronic and K2M; and has stock in Augmedics and NSite. Dr. Jazini has received consulting fees from Medtronic, Stryker, Innovasis, and Precision Spine. Dr. Haines has received consulting fees from Medtronic, Globus Medical, Innovasis, and Spineart.

Author Contributions

Conception and design: Orosz, Jazini, Dreischarf, Haines. Acquisition of data: Orosz, Jazini, Schuler, Good, Haines. Analysis and interpretation of data: Orosz, Dreischarf, Grover, Haines. Drafting the article: Orosz, Dreischarf, Grover. Critically revising the article: Orosz, Jazini, Dreischarf, Grover, Good, Haines. Reviewed submitted version of manuscript: all authors. Approved the final version of the manuscript on behalf of all authors: Orosz. Statistical analysis: Dreischarf, Grover. Administrative/technical/material support: Orosz, Bhatt, Dreischarf, Grover, Grigorian, Roy, Haines. Study supervision: Orosz, Jazini, Dreischarf, Roy, Good, Haines.

Supplemental Information

Previous Presentations

1) International Meeting on Advanced Spine Techniques (IMAST), April 21–24, 2021—podium (virtual meeting); 2) Lumbar Spine Research Society (LSRS), April 8–10, 2021—podium (virtual meeting); 3) International Society for the Advancement of Spine Surgery (ISASS), May 13–15, 2021—podium and best paper nominee (Miami, FL); and 4) North American Spine Society (NASS), September 29–October 2, 2021—podium (Boston, MA).

References

  • 1

    Le Huec JC, Thompson W, Mohsinaly Y, Barrey C, Faundez A. Sagittal balance of the spine. Eur Spine J. 2019;28(9):18891905.

  • 2

    Jackson RP, McManus AC. Radiographic analysis of sagittal plane alignment and balance in standing volunteers and patients with low back pain matched for age, sex, and size. A prospective controlled clinical study. Spine (Phila Pa 1976). 1994;19(14):16111618.

  • 3

    Protopsaltis TS, Soroceanu A, Tishelman JC, et al. Should sagittal spinal alignment targets for adult spinal deformity correction depend on pelvic incidence and age? Spine (Phila Pa 1976). 2020;45(4):250–257.

  • 4

    Schwab FJ, Blondel B, Bess S, et al. Radiographical spinopelvic parameters and disability in the setting of adult spinal deformity: a prospective multicenter analysis. Spine (Phila Pa 1976). 2013;38(13):E803–E812.

  • 5

    Glassman SD, Berven S, Bridwell K, Horton W, Dimar JR. Correlation of radiographic parameters and clinical symptoms in adult scoliosis. Spine (Phila Pa 1976). 2005;30(6):682–688.

  • 6

    Glassman SD, Bridwell K, Dimar JR, Horton W, Berven S, Schwab F. The impact of positive sagittal balance in adult spinal deformity. Spine (Phila Pa 1976). 2005;30(18):2024–2029.

  • 7

    Alshabab BS, Gupta MC, Lafage R, et al. Does achieving global spinal alignment lead to higher patient satisfaction and lower disability in adult spinal deformity? Spine (Phila Pa 1976). 2021;46(16):1105–1110.

  • 8

    Tempel ZJ, Gandhoke GS, Bolinger BD, et al. The influence of pelvic incidence and lumbar lordosis mismatch on development of symptomatic transforaminal lumbar interbody fusion. Neurosurgery. 2017;80(6):880–886.

  • 9

    Schwab F, Patel A, Ungar B, Farcy JP, Lafage V. Adult spinal deformity—postoperative standing imbalance: how much can you tolerate? An overview of key parameters in assessing alignment and planning corrective surgery. Spine (Phila Pa 1976). 2010;35(25):2224–2231.

  • 10

    Merrill RK, Kim JS, Leven DM, Kim JH, Cho SK. Beyond pelvic incidence-lumbar lordosis mismatch: the importance of assessing the entire spine to achieve global sagittal alignment. Global Spine J. 2017;7(6):536–542.

  • 11

    Rothenfluh DA, Mueller DA, Rothenfluh E, Min K. Pelvic incidence-lumbar lordosis mismatch predisposes to adjacent segment disease after lumbar spinal fusion. Eur Spine J. 2015;24(6):1251–1258.

  • 12

    Vila-Casademunt A, Pellisé F, Acaroglu E, et al. The reliability of sagittal pelvic parameters: the effect of lumbosacral instrumentation and measurement experience. Spine (Phila Pa 1976). 2015;40(4):E253–E258.

  • 13

    Klineberg E, Schwab F, Smith JS, Gupta MC, Lafage V, Bess S. Sagittal spinal pelvic alignment. Neurosurg Clin N Am. 2013;24(2):157–162.

  • 14

    Vrtovec T, Janssen MMA, Likar B, Castelein RM, Viergever MA, Pernuš F. A review of methods for evaluating the quantitative parameters of sagittal pelvic alignment. Spine J. 2012;12(5):433–446.

  • 15

    Diebo BG, Varghese JJ, Lafage R, Schwab FJ, Lafage V. Sagittal alignment of the spine: what do you need to know? Clin Neurol Neurosurg. 2015;139:295–301.

  • 16

    Segev E, Hemo Y, Wientroub S, et al. Intra- and interobserver reliability analysis of digital radiographic measurements for pediatric orthopedic parameters using a novel PACS integrated computer software program. J Child Orthop. 2010;4(4):331–341.

  • 17

    Gupta M, Henry JK, Schwab F, et al. Dedicated spine measurement software quantifies key spino-pelvic parameters more reliably than traditional picture archiving and communication systems tools. Spine (Phila Pa 1976). 2016;41(1):E22–E27.

  • 18

    Wu J, Wei F, Ma L, et al. Accuracy and reliability of standing lateral lumbar radiographs for measurements of spinopelvic parameters. Spine (Phila Pa 1976). 2021;46(15):1033–1038.

  • 19

    Lafage R, Ferrero E, Henry JK, et al. Validation of a new computer-assisted tool to measure spino-pelvic parameters. Spine J. 2015;15(12):2493–2502.

  • 20

    Galbusera F, Casaroli G, Bassani T. Artificial intelligence and machine learning in spine research. JOR Spine. 2019;2(1):e1044.

  • 21

    Watanabe K, Aoki Y, Matsumoto M. An application of artificial intelligence to diagnostic imaging of spine disease: estimating spinal alignment from moiré images. Neurospine. 2019;16(4):697–702.

  • 22

    Cheung KM. Commentary on an application of artificial intelligence to diagnostic imaging of spine disease: estimating spinal alignment from moiré images. Neurospine. 2019;16(4):703–704.

  • 23

    Cai Z, Vasconcelos N. Cascade R-CNN: high quality object detection and instance segmentation. IEEE Trans Pattern Anal Mach Intell. 2021;43(5):1483–1498.

  • 24

    Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. Int Conf Med Image Comput Comput Interv. 2015;9351:12–20.

  • 25

    Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15(2):155–163.

  • 26

    McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychol Methods. 1996;1(1):30–46.

  • 27

    Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86(2):420–428.

  • 28

    Cicchetti DV. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess. 1994;6(4):284–290.

  • 29

    Portney LG, Watkins MP. Foundations of Clinical Research: Applications to Practice. 3rd ed. Pearson/Prentice Hall; 2009.

  • 30

    Galbusera F, Niemeyer F, Wilke HJ, et al. Fully automated radiological analysis of spinal disorders and deformities: a deep learning approach. Eur Spine J. 2019;28(5):951–960.

  • 31

    Weng CH, Wang CL, Huang YJ, et al. Artificial intelligence for automatic measurement of sagittal vertical axis using ResUNet framework. J Clin Med. 2019;8(11):1826.

  • 32

    Schwartz JT, Cho BH, Tang P, et al. Deep learning automates measurement of spinopelvic parameters on lateral lumbar radiographs. Spine (Phila Pa 1976). 2021;46(12):E671–E678.

  • 33

    Cho BH, Kaji D, Cheung ZB, et al. Automated measurement of lumbar lordosis on radiographs using machine learning and computer vision. Global Spine J. 2020;10(5):611–618.

  • 34

    Khalsa AS, Mundis GM Jr, Yagi M, et al. Variability in assessing spinopelvic parameters with lumbosacral transitional vertebrae: inter- and intraobserver reliability among spine surgeons. Spine (Phila Pa 1976). 2018;43(12):813–816.

  • 35

    Somoskeöy S, Tunyogi-Csapó M, Bogyó C, Illés T. Accuracy and reliability of coronal and sagittal spinal curvature data based on patient-specific three-dimensional models created by the EOS 2D/3D imaging system. Spine J. 2012;12(11):1052–1059.

  • FIG. 1.

    Flowchart demonstrating the AI training and validation phases.

  • FIG. 2.

    Workflow of the presented method on a preoperative image (A), depicting input lateral radiographic lumbar spine image (left panel); automatic segmentation of relevant anatomical entities and automatic placement of reference landmarks (center panels); and computation of spinopelvic parameters (right panel), and a postoperative image (B), depicting automatic segmentation of relevant anatomical entities and spinal implants (left panel); automatic placement of reference landmarks on L1 and the sacrum (center panel); and computation of spinopelvic parameters (right panel).

  • FIG. 3.

    Scatterplots showing correlation between the measurements of rater 1 and the AI method (in degrees) for 100 preoperative images. Figure is available in color online only.

  • FIG. 4.

    Scatterplots showing correlation between the measurements of rater 1 and the AI method (in degrees) for 100 postoperative images. Figure is available in color online only.
