There has been increasing interest in prediction modeling to determine the prognosis of patients with spine metastases,1–11 postsurgical discharge disposition,12–15 chances of returning to work after surgery,16,17 and overall postoperative clinical outcomes after spine surgery. Early scoring systems1,2 were often derived from small, highly selected patient populations and are consequently limited in their generalizability. More recently, machine learning and artificial intelligence techniques have been applied to large data sets in an effort to minimize bias and provide potentially superior modeling.18,19 There has also been a push toward presenting these models as web-based calculators for ease of use by patients and providers.20
As there are currently multiple proposed prediction models for various spinal pathologies (e.g., spine metastasis, degenerative spine disease), it is important to systematically compare their relative strengths and weaknesses in a standardized fashion. There is substantial heterogeneity in the design and assessment of these models, including differences in statistical analysis and in how discriminative performance is assessed.15,21 There is also heterogeneity in how models are validated, as well as a wide range of discriminative abilities, with reported areas under the curve (AUCs) ranging from 0.60 to 0.80.22–24
Given the heterogeneous nature of these tools, the wide range of reported AUCs, and the need for comprehensive assessment of external validation performance, we sought to develop a scoring system with which these models can be evaluated and compared in a standardized fashion. To address this need, we sought to accomplish two goals: 1) propose a grading system that can be applied to any predictive model for a specific clinical outcome, and 2) apply this grading system to contemporary clinical predictive models that are widely used in spine surgery.
Methods
Literature Search
Most clinical prediction models in the field of spine surgery address one of two areas: spine metastasis or cervical/lumbar degenerative disease. Therefore, two literature searches were performed on March 29, 2020. The following search strings were used, modified as needed for entry into PubMed and Embase:
For spine metastasis studies: (“spine” OR “spinal” OR “vertebrae” OR “vertebral”) AND (“metastasis” OR “metastatic”) AND (“survival” OR “mortality”) AND (“surgery” OR “surgical”) AND (“prognostic” OR “prognosis” OR “predict” OR “prediction” OR “model” OR “nomogram” OR “score” OR “scoring system”).
For degenerative spine studies: (“spine” OR “spinal” OR “vertebrae” OR “vertebral”) AND (“prediction model” OR “predictive model” OR “scoring system” OR “nomogram”) AND (“cervical” OR “lumbar”).
Inclusion Criteria
Articles included in the spine metastasis section of this study must 1) include the development or external validation of a clinical prediction model in the field of spine surgery, 2) be published after the year 2000, 3) include adult patients (≥ 18 years old) with spine metastasis, 4) include a survival endpoint of 1 year, and 5) include multiple types of primary histology in the development study.
Articles included in the degenerative spine section of this study must 1) include the development or external validation of a clinical prediction model in the field of spine surgery, 2) be published after the year 2000, 3) include adult patients (≥ 18 years old) who underwent surgery for degenerative spine conditions of the cervical or lumbar spine, and 4) include a primary outcome endpoint of 1 year.
A 1-year primary outcome endpoint was chosen to standardize the included models, as the included studies were meant to be a representative, rather than comprehensive, sample of the clinical prediction models in spine surgery. Furthermore, prediction models were included only if an AUC for the 1-year outcome was reported in the original study or an external validation study.
Development of Grading System
Each clinical prediction model included in this study was evaluated based on the design and sample size of the original study; whether the development data were single-, bi-, or multi-institutional; the discrimination (AUC) of the model at a 1-year endpoint; whether the model was internally validated (bootstrapping or training/validation method); whether the prediction model was externally validated; the weighted AUC from external validations; and how the model was deployed (i.e., web-based calculator, nomogram, scoring system, etc.). If a model included multiple outcomes with multiple AUC values, the outcome with the highest AUC was included in this study. The weighted AUC was calculated by multiplying each external validation study’s AUC by its sample size, summing these products, and dividing by the total number of patients included in the validation studies. Points were assigned to each category, with the greatest number of possible points awarded to high discriminative performance in external validations (Table 1). Thresholds for these points were assigned through multiple rounds of expert discussion until consensus was reached and were largely based on the characteristics of the included studies (sample size, AUC cutoffs). The resulting grading system was termed the utility of prediction model (UPM) score.
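Expressed as a formula (our notation; AUC_i and n_i denote the reported AUC and sample size of the i-th external validation study):

$$\text{weighted external AUC} = \frac{\sum_{i} n_i \cdot \mathrm{AUC}_i}{\sum_{i} n_i}$$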
Proposed scoring system for the UPM score
| Predictive Model Characteristics | Points (16 total) |
|---|---|
| Original study sample size | |
| <150 patients | 0 |
| 150–500 patients | 1 |
| >500 patients | 2 |
| Original study population | |
| Single institution | 0 |
| Bi-institutional | 1 |
| Multi-institutional | 2 |
| Original study design | |
| Retrospective | 0 |
| Combined (retro- and prospective) | 1 |
| Prospective | 2 |
| Original study AUC | |
| <0.70 or not provided | 0 |
| 0.70–0.80 | 1 |
| >0.80 | 2 |
| Internal validation | |
| None | 0 |
| Bootstrapping or training/validation | 2 |
| Calibration assessment | |
| None | 0 |
| Calibration plot/Hosmer-Lemeshow test | 2 |
| Weighted external validation AUC | |
| <0.70 or no external validation | 0 |
| 0.70–0.80 | 2 |
| >0.80 | 3 |
| Clinical usability | |
| No WBC | 0 |
| WBC | 1 |
WBC = web-based calculator.
UPM scoring system: excellent = 12–16 points, good = 7–11 points, fair = 3–6 points, poor = 0–2 points.
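For illustration, the point assignments in Table 1 can be expressed as a simple scoring function. The sketch below is ours; the function and parameter names are illustrative assumptions rather than part of the published score, and handling of the category boundaries follows Table 1 as written.

```python
# Illustrative sketch of the UPM point assignments in Table 1.
# Function and parameter names are hypothetical, not from the original article.

def upm_score(sample_size, institutions, design, original_auc,
              internal_validation, calibration_assessed,
              weighted_external_auc, web_calculator):
    points = 0
    # Original study sample size: <150 = 0, 150-500 = 1, >500 = 2
    if sample_size > 500:
        points += 2
    elif sample_size >= 150:
        points += 1
    # Original study population: single = 0, bi = 1, multi = 2
    points += {"single": 0, "bi": 1, "multi": 2}[institutions]
    # Original study design: retrospective = 0, combined = 1, prospective = 2
    points += {"retrospective": 0, "combined": 1, "prospective": 2}[design]
    # Original study AUC: <0.70 or not provided = 0, 0.70-0.80 = 1, >0.80 = 2
    if original_auc is not None:
        if original_auc > 0.80:
            points += 2
        elif original_auc >= 0.70:
            points += 1
    # Internal validation (bootstrapping or training/validation split) = 2
    if internal_validation:
        points += 2
    # Calibration assessment (calibration plot or Hosmer-Lemeshow test) = 2
    if calibration_assessed:
        points += 2
    # Weighted external validation AUC: <0.70 or none = 0, 0.70-0.80 = 2, >0.80 = 3
    if weighted_external_auc is not None:
        if weighted_external_auc > 0.80:
            points += 3
        elif weighted_external_auc >= 0.70:
            points += 2
    # Clinical usability: web-based calculator = 1
    if web_calculator:
        points += 1
    return points

def upm_grade(points):
    # Grade cutoffs as stated for the UPM scoring system
    if points >= 12:
        return "excellent"
    if points >= 7:
        return "good"
    if points >= 3:
        return "fair"
    return "poor"

# Example using the characteristics reported for Karhade et al. (Table 2)
score = upm_score(sample_size=732, institutions="bi", design="retrospective",
                  original_auc=0.85, internal_validation=True,
                  calibration_assessed=True, weighted_external_auc=0.77,
                  web_calculator=True)
print(score, upm_grade(score))  # 12 excellent
```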
Results
Included Studies
In the literature search of articles in the spine metastasis category, 795 unique articles were identified (Fig. 1). Of these, 42 underwent full-text review and 16 were included in the final analysis. Reasons for exclusion during full-text review were the lack of a 1-year time point (n = 16), the lack of a clinical prediction model (n = 7), the lack of a reported AUC (n = 1), and development of the prediction model on only one type of tumor histology (n = 2). In the literature search of articles in the degenerative spine disease category, 1109 unique articles were identified (Fig. 2). Of these, 34 underwent full-text review and 6 were included in the final analysis. Reasons for exclusion during full-text review were the lack of a 1-year time point (n = 16), the lack of a clinical prediction model (n = 6), the lack of a reported AUC (n = 1), and availability of the study as an abstract only (n = 5).

PRISMA flow diagram for articles with spine metastasis prediction models with 1-year survival outcomes. Figure is available in color online only.

PRISMA flow diagram for articles with degenerative spine disease prediction models with 1-year outcomes after surgery. Figure is available in color online only.
UPM Scores: Metastatic Spine Disease
Of the 9 studies that provided 1-year survival findings in patients with metastatic spine disease (Table 2), 2 studies developed their models using bi-institutional data,10,21 5 used data from a single institution,1,4,8,9,25,26 and 2 used data from multiple institutions,7,27 including national registries. Five of the 9 studies used retrospective data,8,10,21,25,27 3 had a combined design (retrospective assessment of a prospective database),1,4,7 and 1 used prospective data.9 The mean sample size was 375 ± 226 patients, and the mean internal AUC of the 4 original studies10,21,25,27 that reported this value was 0.77 ± 0.06. Two of the studies10,21 internally validated their prediction models using the split training/validation cohort method, and 2 studies21,27 included an assessment of model calibration. All 9 models have been externally validated, with a mean weighted external AUC of 0.74 ± 0.04. Only 1 prediction model included a web-based calculator.21 The models’ assigned UPM scores ranged from 0 to 12 points (mean 6 points).
Prediction models for 1-year survival of patients with spinal metastasis
| Authors & Year | Single, Bi-, or Multi-institutional | Design | Sample Size | Internal AUC (1-yr survival) | Internal Validation | Calibration Assessment | External Validation | Weighted External AUC | WBC | UPM Score |
|---|---|---|---|---|---|---|---|---|---|---|
| Karhade et al., 201921 | Bi | Retro | 732 | 0.85 | Training/validation | Yes | Yes28 | 0.77 | Yes | 12 (excellent) |
| Paulino Pereira et al., 201610 | Bi | Retro | 649 | 0.77 | Training/validation | No | Yes11,29 | 0.78 | No | 8 (good) |
| Tokuhashi et al., 20054 | Single | Combined | 246 | None | Not performed | No | Yes25,29,38 | 0.71 | No | 4 (poor) |
| Morgen et al., 201825 | Single | Retro | 544 | 0.72 | Not performed | No | Yes38 | 0.76 | No | 5 (fair) |
| Tomita et al., 20011 | Single | Combined | 128 | None | Not performed | No | Yes11,29,38,39 | 0.71 | No | 3 (poor) |
| Leithner et al., 20088 | Single | Retro | 69 | None | Not performed | No | Yes11,27,29,38 | 0.67 | No | 0 (poor) |
| Katagiri et al., 20059 | Single | Pro | 350 | None | Not performed | No | Yes29 | 0.78 | No | 5 (fair) |
| van der Linden et al., 20057 | Multi | Combined | 342 | None | Not performed | No | Yes29,38 | 0.70 | No | 6 (fair) |
| Ghori et al., 201527 | Multi | Retro | 318 | 0.74 | Not performed | Yes | Yes11,23 | 0.77 | No | 8 (good) |
Pro = prospective; Retro = retrospective.
The spinal metastasis survival prediction model with the highest overall UPM score was that of Karhade et al.21 from the Skeletal Oncology Research Group (SORG), with a total of 12 points on the UPM scale. This predictive model was created using retrospective data from 2 institutions with a relatively large sample size (n = 732), and it achieved a high AUC (0.85), which was internally validated using the split training/validation cohort method. Furthermore, the authors assessed the calibration of the model in the original study and included a web-based calculator to simplify use of the model. This prediction model was later validated by the same group in an external cohort of 176 patients, which again demonstrated relatively high performance (AUC 0.77).28
The prediction models achieving the next-highest UPM scores were those of Paulino Pereira et al.10 and Ghori et al.,27 both scoring 8 points. The model of Paulino Pereira et al. was developed using bi-institutional retrospective data and a relatively large sample (n = 649). This model achieved moderately high discrimination (AUC 0.77), which was internally validated using the split training/validation cohort method; however, model calibration was not assessed. This model has been externally validated by 2 studies,11,29 achieving a relatively high weighted external AUC of 0.78.
The prediction model of Ghori et al. was created on a cohort of 318 patients from 4 institutions.27 This model achieved a moderate discrimination (AUC 0.74). While this study did not include an internal validation, calibration was assessed using the Hosmer-Lemeshow goodness-of-fit test. This model has been externally validated by 2 studies,11,23 demonstrating a relatively high weighted external AUC of 0.77.
UPM Scores: Degenerative Spine Disease
Of the 6 studies that provided 1-year outcomes in patients with degenerative spine disease (Table 3), 2 studies30,31 used data from a single institution and 4 studies32–35 used data from multiple institutions. One of the 6 studies used retrospective data,30 while 1 study used prospective data32 and 4 studies31,33–35 retrospectively analyzed a prospective cohort (i.e., combined design). The overall mean sample size was 3314 ± 3044 patients, and the mean internal AUC of the 6 studies was 0.72 ± 0.06. Three of the 6 studies internally validated the models using the bootstrapping method,32,33,35 while 2 studies31,34 used the training/validation method. Only 1 of the 6 studies was externally validated,34 possibly due to the recency of the developed prediction models, with the earliest-included model being proposed in 2017.32 Two of the 6 models include a web-based calculator,32,34 and the models’ assigned UPM scores ranged from 0 to 14 points (mean 8 points).
Prediction models developed for 1-year outcomes after cervical or lumbar degenerative spine surgery
| Authors & Year | Single, Bi-, or Multi-institutional | Design | Sample Size | Internal AUC (1-yr outcome) | Internal Validation | Calibration Assessment | External Validation | Weighted External AUC | WBC | UPM Score |
|---|---|---|---|---|---|---|---|---|---|---|
| De Silva et al., 202030 | Single | Retro | 64 | mJOA: 0.69 | Not performed | No | No | NA | No | 0 (poor) |
| McGirt et al., 201732 | Multi | Pro | 7618 | ODI: 0.69, EQ-5D: 0.69 | Bootstrapping | No | No | NA | Yes | 9 (good) |
| Rundell et al., 202033 | Multi | Combined | 5840 | PSI: 0.81* | Bootstrapping | Yes | No | NA | No | 11 (good) |
| Siccoli et al., 201931 | Single | Combined | 635 | NRS-BP: 0.75 | Training/validation | Yes | No | NA | No | 8 (good) |
| Khor et al., 201834 | Multi | Combined | 1583 | NRS-LP: 0.75 | Training/validation | Yes | Yes36 | Leg pain: 0.83 | Yes | 14 (excellent) |
| Asher et al., 201935 | Multi | Combined | 4148 | NASS satisfaction: 0.64 | Bootstrapping | No | No | NA | No | 7 (fair) |
mJOA = modified Japanese Orthopaedic Association; NA = not applicable; NASS = North American Spine Society.
* This value reflects the highest AUC in the authors’ overall study, which was from the laminectomy group. The AUCs in the other two groups (microdiscectomy and laminectomy with fusion) were lower and are therefore not listed in the table.
The degenerative spine prediction model with the highest overall UPM score was that of Khor et al., which scored 14 points, giving a grade of “excellent.”34 This study utilized data from 1583 patients within the multi-institutional Spine Surgical Care and Outcomes Assessment Program (SCOAP) registry to create a model to predict 1-year function using the Oswestry Disability Index (ODI) and back and leg pain using the numeric rating scale (NRS-BP and NRS-LP, respectively). This model predicted both NRS-LP and NRS-BP at 1 year with a relatively high discrimination (AUC 0.75) and was internally validated through the training/validation split cohort method. Calibration was also assessed using a calibration plot. This prediction model was then transformed into a freely available web-based calculator and externally validated, demonstrating an excellent external AUC of 0.83 for NRS-LP.36
The next-highest scoring prediction models in the degenerative disease group were those of Rundell et al.33 and McGirt et al.,32 scoring 11 and 9 points, respectively, and each achieving a grade of “good.” Rundell et al. utilized retrospective data from the prospective Quality Outcomes Database (QOD) to create a model to predict disability based on the ODI, back and leg pain based on the NRS, and the Patient Satisfaction Index (PSI).33 The overall study included a sample size of 5840 patients, and their model had the highest performance in predicting PSI in patients undergoing laminectomy without fusion (AUC 0.81), which was internally validated using the bootstrapping technique. Calibration of the model was also assessed using the optimism-corrected calibration slope method. A web-based calculator was not created for this prediction model, and the model has not yet been externally validated.33
McGirt et al. prospectively assessed the QOD registry to create a model to predict postoperative disability (ODI), EQ-5D quality-of-life scores, and pain severity (NRS) for 7618 patients undergoing elective lumbar spine surgery for degenerative disease.32 Model discrimination was found to be highest with 1-year ODI and EQ-5D scores (AUC 0.69 for both). The authors internally validated this model using the bootstrapping technique; however, calibration was not assessed. The authors also put forth a web-based calculator for the predictive model, but this model has not yet been externally validated.32
Discussion
This study compared prediction models in the metastatic spine disease and degenerative spine disease literature to extract model characteristics and propose an objective prediction model scoring scale, i.e., the UPM score. Of the 9 models in the spine metastasis literature, only 1 reached the grade of excellent,21 while 2 were graded as good,10,27 3 were graded as fair,7,9,25 and 3 were graded as poor.1,4,8 Of the 6 models in the degenerative spine literature, 1 study achieved the excellent grade,34 while 3 studies were graded as good,31–33 1 as fair,35 and 1 as poor.30
The establishment of a prediction model grading system serves three main purposes: 1) it provides a much-needed tool to directly compare prediction models while taking into account subsequent external validation results; 2) it places pressure on researchers to use high-quality patient cohorts and statistical methods when developing new prediction models; and 3) it emphasizes the need to demonstrate that a prediction model performs well in external validation studies.
Since the early model published by Tokuhashi et al.2 in 1990 to predict the survival of patients with spine metastases, many new models have been proposed in both the metastatic spine and degenerative spine surgery literature. Because of the large number of models, it has become increasingly difficult to determine their relative strengths and weaknesses and to identify an optimal model for use in clinical practice.37 Furthermore, the revised Tokuhashi4 and Tomita1 scores continue to be the most widely cited and utilized predictive tools,22 despite their relatively low discriminative abilities in external validation studies.11,25,29,38,39 The proposed scale allows clinicians and researchers to systematically evaluate existing prediction models, which may in turn help optimize treatment planning and patient education.
Newer prediction models often take advantage of multi-institutional data, which facilitates analysis of larger and more heterogeneous patient cohorts.37 While models developed using multi-institutional data may have lower discrimination owing to the greater variability of cohort data, these models often have greater generalizability when tested on outside cohorts.40 Therefore, a greater number of points were given to studies developed using multi-institutional data sets (2 points maximum) with larger cohorts (2 points maximum) that were prospectively gathered (2 points maximum).
In contrast to many of the older prediction models, more contemporary publications include internal validation of their models, by either bootstrapping or splitting the cohort into training and validation sets, and some also include external validation. The inclusion of internal validation in contemporary studies provides the clinician with a greater degree of confidence when applying the models to patients outside of the original development cohorts.41 The proposed UPM score therefore highlights the importance of internal validation (2 points maximum), in addition to the quality of the cohort. For example, a model developed using a single-institution retrospective cohort of 100 patients without internal validation would earn 0 points across these categories, whereas a model developed using a multi-institutional prospective cohort of 1000 patients that was then internally validated would earn 8 points. By rewarding methodological robustness, the authors believe the UPM score will provide the necessary impetus to use both high-quality patient cohorts and rigorous statistical methods. This will also enable readers with less of a statistical background to better understand the differences between these models.
The performance of a prediction model can be separated into discrimination and calibration.42 Discrimination is the accuracy with which a model can predict a categorical/binary outcome (e.g., survival at 1 year, adequate symptom improvement at 1 year) and is often reported as the model’s AUC (or c-statistic/c-index). Calibration refers to the agreement between the model’s predicted risk and the actual observed risk and can be demonstrated with a calibration plot or a Hosmer-Lemeshow test.42 To deem a prediction model worthy of further study, the authors believe it is essential that both measures of performance are provided, which is why the proposed UPM score awards points for a higher internal AUC (2 points maximum) and the inclusion of a calibration assessment (2 points maximum). However, before a prediction model can be applied to actual patient care, it is imperative to further demonstrate the model’s performance on outside cohorts.43 In an analysis of 127 new prediction models, Siontis et al. found that only 25% had been externally validated, with only a 16% chance of external validation within 5 years of the original model’s proposal.44 Attesting to the importance of external validation, the same study found that validation studies reported significantly worse AUC estimates than the corresponding development studies.44 For this reason, the proposed UPM score grants the greatest number of points for any single category (3 points maximum) to “weighted external validation AUC,” as the authors believe a model’s readiness for clinical application greatly depends on its external performance.
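As a minimal illustration of these two performance measures (a sketch assuming scikit-learn and hypothetical predicted 1-year risks, not any of the models discussed here):

```python
# Minimal sketch: assessing discrimination (AUC) and calibration for a
# hypothetical set of predicted 1-year risks; not any specific published model.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
y_prob = rng.uniform(0.0, 1.0, 500)   # hypothetical predicted 1-year risks
y_true = rng.binomial(1, y_prob)      # hypothetical observed binary outcomes

# Discrimination: AUC (c-statistic)
auc = roc_auc_score(y_true, y_prob)
print(f"AUC = {auc:.2f}")

# Calibration: observed vs mean predicted risk per bin (points of a calibration plot)
obs_frac, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
for pred, obs in zip(mean_pred, obs_frac):
    print(f"mean predicted {pred:.2f} -> observed {obs:.2f}")
```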
The last component of the proposed grading system is clinical usability. In recent years, there has been an impetus to transform scoring systems into web-based calculators that are freely available to providers.45 This eliminates the need to perform tedious calculations for each patient, and instead allows providers to simply enter patient data and receive the estimated risk of the desired outcome.45,46 Therefore, although inclusion of a web-based calculator does not add to the validity of a model, the proposed scale awards an additional point to prediction models that were converted into freely available web-based calculators to aid usability.
Similar Prediction Model Assessment Tools
Although the authors believe that the UPM score is the simplest measure with which to assess prediction models, similar systems have been previously established. One example is the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) Statement.47 This is a 22-item checklist that guides authors in reporting prediction model development and validation studies and gives readers better insight into the process and the quality of the data behind a model. Another tool used to assess risk of bias in prediction model studies is the Prediction Model Risk of Bias Assessment Tool (PROBAST).24 This domain-based tool allows users to assess both the risk of bias and the applicability of proposed models, taking into account the predictors and outcomes of each study. One example of this tool’s usefulness is the review of patient-centered outcome assessment tools by White et al.,48 which demonstrated that PROBAST can effectively separate prediction model studies into high- and low-quality evidence categories. While these prediction model–specific tools are effective in making transparent reporting and high-quality studies a priority, they can be applied only to a prediction model’s original study or to an external validation study separately. In contrast, we believe the greatest strength of the proposed UPM score is the inclusion of the “weighted external validation AUC” component, as it allows a model’s external discriminatory performance to be weighed concurrently with the quality of the original study. As prediction models change in prognostic strength due to advances in treatment, the usefulness of the models will also change, and the UPM score can account for these changes by being updated with data from future validation studies. However, as the TRIPOD and PROBAST tools are more extensive than the proposed UPM score, the authors believe these tools should be utilized concurrently to promote the highest-quality prediction models.
Limitations
One limitation of the current study is that the point values for each category in the UPM score were assigned through expert consensus. This was done because the primary outcome measure for a prediction model is often its performance in external validation studies, and this itself was included as a category of the proposed scoring system. However, as this study stresses the importance of external validation, and external validation is not possible for the proposed scoring scale itself, this remains a limitation. Another limitation of this study is the inclusion of the AUC as the primary measure of a model’s performance, rather than the model’s correlation with a continuous outcome. This led to the exclusion of studies that used linear rather than logistic regression to assess the primary outcome. The authors believe, however, that use of the AUC allows the reader to grasp the simple question, “What is the chance that this model will correctly predict the outcome?” In contrast, it is more difficult for a clinician to explain to a patient that a model predicting their clinical outcome had a correlation of 0.65, for example. This is similar to the recent push to dichotomize continuous outcomes by reporting the proportion of patients achieving a minimal clinically important difference.49 It is also important to note that although the UPM score focuses on the AUC as the measure of a prediction model’s performance, other measures, such as interrater reliability, sensitivity, and specificity, are also important when choosing a prediction model. Furthermore, the original patient populations of the prediction models being compared often differ. For example, one degenerative spine model may be built using only patients undergoing lumbar fusion, while another includes patients undergoing laminectomy, and readers must be diligent in understanding these populations before utilizing a model. Along these lines, to simplify the proposed scoring system, several other important aspects of model development were not included, such as the authors’ approach to missing data and the method used to select or reduce variables (e.g., a stepwise approach vs the least absolute shrinkage and selection operator). It must also be noted that the cutoffs for AUC and sample size were derived from expert consensus, taking the current studies into consideration; these thresholds may therefore not be appropriate when applied to outcomes that are very rare, even in a relatively large cohort. Lastly, a potential limitation of the proposed score is the emphasis it places on robust statistical analyses that have only recently become popular in the spine surgery literature, such as internal validation techniques and calibration assessment. Although this may be seen as favoring more recently proposed models, the authors believe the merit of promoting statistical robustness outweighs this potential bias.
Conclusions
As the field continues to expand the role of predictive analytics in spine surgery, it is increasingly important to systematically analyze the relative strengths and weaknesses of proposed models. In this study, we created the UPM score, whereby one can grade a prediction model based on characteristics related to its initial development, its internal and external validity, and its usability. This score will hopefully serve as the first step toward better understanding of the many existing prediction models as they become increasingly used in patient care.
Disclosures
Dr. Buser reports being a consultant for Cerapedics, Xenco Medical (past), AO Spine (past), and Scripps Research; receiving clinical or research support for the study from SeaSpine (past), Next Science, and Motion Metrics; being a committee member for the North American Spine Society and the AOSNA Research Committee; being a co-chair of the research committee for the Lumbar Spine Society; and being an associate member of the AO Spine Knowledge Forum Degenerative. Dr. Wilson reports being a consultant for Stryker Canada and Bioventus. Dr. Sciubba reports being a consultant for Baxter, DePuy Synthes, Globus Medical, K2M, Medtronic, NuVasive, and Stryker, and unrelated grant support from Baxter Medical, the North American Spine Society, and Stryker.
Author Contributions
Conception and design: Sciubba, Ehresman, Lubelski, Ahmed. Acquisition of data: Ehresman, Lubelski, Pennington, Hung, Ahmed. Analysis and interpretation of data: Ehresman, Lubelski, Pennington, Hung, Ahmed. Drafting the article: Ehresman, Lubelski, Pennington, Azad, Feghali. Critically revising the article: Sciubba, Ehresman, Lubelski, Azad, Lehner, Feghali, Buser, Harrop, Wilson, Kurpad, Ghogawala. Reviewed submitted version of manuscript: Sciubba, Ehresman, Lehner, Buser, Harrop, Wilson, Kurpad, Ghogawala. Approved the final version of the manuscript on behalf of all authors: Sciubba. Statistical analysis: Ehresman.
References
1. Tomita K, Kawahara N, Kobayashi T, et al. Surgical strategy for spinal metastases. Spine (Phila Pa 1976). 2001;26(3):298–306.
2. Tokuhashi Y, Matsuzaki H, Toriyama S, et al. Scoring system for the preoperative evaluation of metastatic spine tumor prognosis. Spine (Phila Pa 1976). 1990;15(11):1110–1113.
3. Bauer HCF, Wedin R. Survival after surgery for spinal and extremity metastases. Prognostication in 241 patients. Acta Orthop Scand. 1995;66(2):143–146.
4. Tokuhashi Y, Matsuzaki H, Oda H, et al. A revised scoring system for preoperative evaluation of metastatic spine tumor prognosis. Spine (Phila Pa 1976). 2005;30(19):2186–2191.
5. Sioutos PJ, Arbit E, Meshulam CF, Galicich JH. Spinal metastases from solid tumors. Analysis of factors affecting survival. Cancer. 1995;76(8):1453–1459.
6. Rades D, Dunst J, Schild SE. The first score predicting overall survival in patients with metastatic spinal cord compression. Cancer. 2008;112(1):157–161.
7. van der Linden YM, Dijkstra SPDS, Vonk EJA, et al. Prediction of survival in patients with metastases in the spinal column: results based on a randomized trial of radiotherapy. Cancer. 2005;103(2):320–328.
8. Leithner A, Radl R, Gruber G, et al. Predictive value of seven preoperative prognostic scoring systems for spinal metastases. Eur Spine J. 2008;17(11):1488–1495.
9. Katagiri H, Takahashi M, Wakai K, et al. Prognostic factors and a scoring system for patients with skeletal metastasis. J Bone Joint Surg Br. 2005;87(5):698–703.
10. Paulino Pereira NR, Janssen SJ, van Dijk E, et al. Development of a prognostic survival algorithm for patients with metastatic spine disease. J Bone Joint Surg Am. 2016;98(21):1767–1776.
11. Paulino Pereira NR, McLaughlin L, Janssen SJ, et al. The SORG nomogram accurately predicts 3- and 12-months survival for operable spine metastatic disease: external validation. J Surg Oncol. 2017;115(8):1019–1027.
12. Berger I, Piazza M, Sharma N, et al. Evaluation of the risk assessment and prediction tool for postoperative disposition needs after cervical spine surgery. Neurosurgery. 2019;85(5):E902–E909.
13. Stopa BM, Robertson FC, Karhade AV, et al. Predicting nonroutine discharge after elective spine surgery: external validation of machine learning algorithms. J Neurosurg Spine. 2019;31(5):742–747.
14. Goyal A, Ngufor C, Kerezoudis P, et al. Can machine learning algorithms accurately predict discharge to nonhome facility and early unplanned readmissions following spinal fusion? Analysis of a national surgical registry. J Neurosurg Spine. 2019;31(4):568–578.
15. Ogink PT, Karhade AV, Thio QCBS, et al. Predicting discharge placement after elective surgery for lumbar spinal stenosis using machine learning methods. Eur Spine J. 2019;28(6):1433–1440.
16. Devin CJ, Bydon M, Alvi MA, et al. A predictive model and nomogram for predicting return to work at 3 months after cervical spine surgery: an analysis from the Quality Outcomes Database. Neurosurg Focus. 2018;45(5):E9.
17. Asher AL, Devin CJ, Archer KR, et al. An analysis from the Quality Outcomes Database, Part 2. Predictive model for return to work after elective surgery for lumbar degenerative disease. J Neurosurg Spine. 2017;27(4):370–381.
18. Song X, Mitnitski A, Cox J, Rockwood K. Comparison of machine learning techniques with classical statistical models in predicting health outcomes. Stud Health Technol Inform. 2004;107(Pt 1):736–740.
19. Senders JT, Staples PC, Karhade AV, et al. Machine learning and neurosurgical outcome prediction: a systematic review. World Neurosurg. 2018;109:476–486.e1.
20. Karhade AV, Thio QCBS, Ogink PT, et al. Development of machine learning algorithms for prediction of 30-day mortality after surgery for spinal metastasis. Neurosurgery. 2019;85(1):E83–E91.
21. Karhade AV, Thio QCBS, Ogink PT, et al. Predicting 90-day and 1-year mortality in spinal metastatic disease: development and internal validation. Neurosurgery. 2019;85(4):E671–E681.
22. Choi D, Pavlou M, Omar R, et al. A novel risk calculator to predict outcome after surgery for symptomatic spinal metastases; use of a large prospective patient database to personalise surgical management. Eur J Cancer. 2019;107:28–36.
23. Goodwin CR, Schoenfeld AJ, Abu-Bonsrah NA, et al. Reliability of a spinal metastasis prognostic score to model 1-year survival. Spine J. 2016;16(9):1102–1108.
24. Wolff RF, Moons KGM, Riley RD, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019;170(1):51–58.
25. Morgen SS, Fruergaard S, Gehrchen M, et al. A revision of the Tokuhashi revised score improves the prognostic ability in patients with metastatic spinal cord compression. J Cancer Res Clin Oncol. 2018;144(1):33–38.
26. Katagiri H, Okada R, Takagi T, et al. New prognostic factors and scoring system for patients with skeletal metastasis. Cancer Med. 2014;3(5):1359–1367.
27. Ghori AK, Leonard DA, Schoenfeld AJ, et al. Modeling 1-year survival after surgery on the metastatic spine. Spine J. 2015;15(11):2345–2350.
28. Karhade AV, Ahmed AK, Pennington Z, et al. External validation of the SORG 90-day and 1-year machine learning algorithms for survival in spinal metastatic disease. Spine J. 2020;20(1):14–21.
29. Ahmed AK, Goodwin CR, Heravi A, et al. Predicting survival for metastatic spine disease: a comparison of nine scoring systems. Spine J. 2018;18(10):1804–1814.
30. De Silva T, Vedula SS, Perdomo-Pantoja A, et al. SpineCloud: image analytics for predictive modeling of spine surgery outcomes. J Med Imaging (Bellingham). 2020;7(3):031502.
31. Siccoli A, de Wispelaere MP, Schröder ML, Staartjes VE. Machine learning-based preoperative predictive analytics for lumbar spinal stenosis. Neurosurg Focus. 2019;46(5):E5.
32. McGirt MJ, Bydon M, Archer KR, et al. An analysis from the Quality Outcomes Database, Part 1. Disability, quality of life, and pain outcomes following lumbar spine surgery: predicting likely individual patient outcomes for shared decision-making. J Neurosurg Spine. 2017;27(4):357–369.
33. Rundell SD, Pennings JS, Nian H, et al. Adding 3-month patient data improves prognostic models of 12-month disability, pain, and satisfaction after specific lumbar spine surgical procedures: development and validation of a prediction model. Spine J. 2020;20(4):600–613.
34. Khor S, Lavallee D, Cizik AM, et al. Development and validation of a prediction model for pain and functional outcomes after lumbar spine surgery. JAMA Surg. 2018;153(7):634–642.
35. Asher AL, Devin CJ, Kerezoudis P, et al. Predictors of patient satisfaction following 1- or 2-level anterior cervical discectomy and fusion: insights from the Quality Outcomes Database. J Neurosurg Spine. 2019;31(6):835–843.
36. Quddusi A, Eversdijk HAJ, Klukowska AM, et al. External validation of a prediction model for pain and functional outcome after elective lumbar spinal fusion. Eur Spine J. 2020;29(2):374–383.
37. Massaad E, Fatima N, Hadzipasic M, et al. Predictive analytics in spine oncology research: first steps, limitations, and future directions. Neurospine. 2019;16(4):669–677.
38. Westermann L, Olivier AC, Samel C, et al. Analysis of seven prognostic scores in patients with surgically treated epidural metastatic spine disease. Acta Neurochir (Wien). 2020;162(1):109–119.
39. Liu Y, Yang M, Li B, et al. Development of a novel model for predicting survival of patients with spine metastasis from colorectal cancer. Eur Spine J. 2019;28(6):1491–1501.
40. Singleton KW, Hsu W, Bui AAT. Comparing predictive models of glioblastoma multiforme built using multi-institutional and local data sources. AMIA Annu Symp Proc. 2012;2012:1385–1392.
41. Steyerberg EW, Harrell FE Jr. Prediction models need appropriate internal, internal-external, and external validation. J Clin Epidemiol. 2016;69:245–247.
42. Collins GS, de Groot JA, Dutton S, et al. External validation of multivariable prediction models: a systematic review of methodological conduct and reporting. BMC Med Res Methodol. 2014;14(1):40.
43. Steyerberg EW, Moons KGM, van der Windt DA, et al. Prognosis research strategy (PROGRESS) 3: prognostic model research. PLoS Med. 2013;10(2):e1001381.
44. Siontis GCM, Tzoulaki I, Castaldi PJ, Ioannidis JPA. External validation of new risk prediction models is infrequent and reveals worse prognostic discrimination. J Clin Epidemiol. 2015;68(1):25–34.
45. Lee YH, Bang H, Kim DJ. How to establish clinical prediction models. Endocrinol Metab (Seoul). 2016;31(1):38–44.
46. Steyerberg EW, Vergouwe Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J. 2014;35(29):1925–1931.
47. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. BMJ. 2015;350:g7594.
48. White HJ, Bradley J, Hadgis N, et al. Predicting patient-centered outcomes from spine surgery using risk assessment tools: a systematic review. Curr Rev Musculoskelet Med. 2020;13(3):247–263.
49. Chung AS, Copay AG, Olmscheid N, et al. Minimum clinically important difference: current trends in the spine literature. Spine (Phila Pa 1976). 2017;42(14):1096–1105.