Letter to the Editor. Class imbalance in machine learning for neurosurgical outcome prediction: are our models valid?

  • Bergman Clinics, Amsterdam, the Netherlands

TO THE EDITOR: The article by Scheer and colleagues3 on predicting major complications in adult spinal deformity surgery was very much appreciated (Scheer JK, Smith JS, Schwab F, et al: Development of a preoperative predictive model for major complications following adult spinal deformity surgery. J Neurosurg Spine 26:736–743, June 2017).

Dr. Scheer and colleagues trained an accurate machine learning (ML) model to predict complications using a range of preoperatively available patient and surgical features. We applaud the sound methodology that was implemented. Using multiple bootstrapped decision trees, the authors trained a highly effective predictive model that achieved an area under the curve (AUC) of 0.89 and accuracy of 87% at internal validation. However, no sensitivity or specificity was reported. We believe that, due to the rigorous methodology that Dr. Scheer and colleagues applied, the reported AUC and accuracy probably give a valid impression of their powerful ML model. However, we would like to stress the general importance of considering class imbalance in this context.

Class imbalance is present when one class, the minority class, is much rarer than the other, the majority class. ML models extract features best and are most robust when all classes are approximately equally represented. If considerable class imbalance is present, ML models often become “lazy”: instead of learning to discriminate between classes, they simply vote for the majority class. This bias yields artificially high AUC, accuracy, and specificity, but unusably low sensitivity. The “accuracy paradox” denotes the situation in which an apparently high accuracy merely reflects the underlying class distribution of unbalanced data.
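The accuracy paradox can be made concrete with a small numerical sketch (the 90/10 split below is hypothetical illustration, not data from the study under discussion):

```python
# Hypothetical 90/10 class split: 0 = no complication, 1 = complication.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100  # "lazy" model: always votes for the majority class

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)               # 0.90 -- looks impressive
sensitivity = tp / sum(y_true)                   # 0.00 -- clinically useless
specificity = tn / (len(y_true) - sum(y_true))   # 1.00 -- inflated by imbalance

print(accuracy, sensitivity, specificity)
```

The model detects not a single complication, yet reports 90% accuracy, which is why accuracy alone cannot certify an ML model trained on unbalanced data.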

As an example, one might want to predict complications from a registry in which 90% of patients have no complications. By largely voting for the majority class (no complication), the model would achieve accuracy and specificity of around 90% and very low sensitivity, without actually learning from the data. This can be countered by adjusting class weights within the model, by undersampling and thus removing observations from the majority class, or by oversampling the minority class.1 Specifically, the synthetic minority oversampling technique (SMOTE) has been validated, shows robust performance, and is easy to employ.2 SMOTE simulates new observations for the minority class by interpolating between existing minority-class observations and their k nearest neighbors.
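The core idea of SMOTE can be sketched in a few lines. The function below is a deliberately simplified toy on one-dimensional data, not the validated implementation (in practice, a maintained library such as imbalanced-learn provides SMOTE); it shows only the interpolation step: each synthetic observation lies on the segment between a minority-class point and one of its k nearest minority-class neighbors.

```python
import random

def smote_1d(minority, k=2, n_new=4, seed=0):
    """Toy SMOTE-style oversampling for 1-D features (illustrative only).

    Each synthetic sample is interpolated between a randomly chosen
    minority observation and a random one of its k nearest neighbors
    within the minority class.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbors of x (excluding x itself)
        neighbors = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda v: abs(v - x),
        )[:k]
        nb = rng.choice(neighbors)
        # Place the synthetic point a random fraction of the way to the neighbor
        synthetic.append(x + rng.random() * (nb - x))
    return synthetic
```

Because every synthetic point is an interpolation, it always falls within the range spanned by the existing minority observations, which is what distinguishes SMOTE from naive duplication of minority samples.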

Neurosurgical data are often prone to class imbalance. With the emergence of many studies that aim to predict neurosurgical outcomes using ML, it is crucial to ensure methodological quality. The study by Scheer et al. showed only moderate class imbalance and probably represents a valid predictive model. In general, if class imbalance is present, care should be taken to weight classes or to under- or oversample using data science techniques like SMOTE. Accuracy and AUC alone do not always give a full representation of an ML model’s performance. In our view, additionally reporting the sensitivity and specificity is central.

Disclosures

The authors report no conflicts of interest.

References

  • 1

    Batista GEAPA, Prati RC, Monard MC: A study of the behavior of several methods for balancing machine learning training data. SIGKDD Explor Newsl 6:20–29, 2004

  • 2

    Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP: SMOTE: Synthetic Minority Over-sampling Technique. J Artif Intell Res 16:321–357, 2002

  • 3

    Scheer JK, Smith JS, Schwab F, Lafage V, Shaffrey CI, Bess S, et al: Development of a preoperative predictive model for major complications following adult spinal deformity surgery. J Neurosurg Spine 26:736–743, 2017


Response

No response was received from the authors of the original article.


Contributor Notes

Correspondence Victor E. Staartjes: victor.staartjes@gmail.com.

INCLUDE WHEN CITING Published online August 17, 2018; DOI: 10.3171/2018.5.SPINE18543.

