Letter to the Editor. Inter-rater agreement: a methodological issue

  • 1 Imam Ali Heart Hospital, Kermanshah University of Medical Sciences, Kermanshah, I.R. Iran; and
  • 2 School of Allied Medical Sciences, Kermanshah University of Medical Sciences, Kermanshah, I.R. Iran

TO THE EDITOR: We read with interest the article by Rinaldo et al.4 (Rinaldo L, Johnson DM, Vine RL, et al: Differences between patient- and professional-reported modified Rankin Scale score in patients with unruptured aneurysms. J Neurosurg [epub ahead of print August 10, 2018. DOI: 10.3171/2018.3.JNS18247]). The authors aimed to compare patient- and nurse-reported modified Rankin Scale (mRS) scores in a consecutive series of patients with unruptured intracranial aneurysm (UIA). A nurse assigned an mRS grade based on their assessment of each patient’s degree of disability. Patients, who were blinded to the nurses’ assessment, were subsequently instructed to assign themselves an mRS score with the aid of a standard form providing definitions of the mRS scores. Agreement between patients’ and nurses’ scores was assessed using a kappa statistic. According to their results, the kappa coefficient for agreement between patient- and nurse-reported mRS scores was 0.58 (95% CI 0.49–0.67).4

Using the kappa coefficient to assess agreement on a qualitative variable is not always appropriate and can be unreliable in some circumstances. Cohen’s kappa is designed for nominal variables, and its value also depends on the number of categories; applied to an ordinal scale such as the mRS, it treats every disagreement as equally serious, regardless of how far apart the two scores lie. In such a situation, our suggestion for obtaining unbiased estimates is to apply a weighted kappa (or, when there are more than two raters, the Fleiss kappa).1–3 Tables 1 and 2 illustrate the point: for the same set of ratings, the unweighted kappa (0.43) and the weighted kappa (0.63) take clearly different values (a brief computational sketch reproducing these figures follows Table 2). In this instance, the authors should interpret their results with caution.

TABLE 1. Grade assignments by two raters

                     Rater 1
Rater 2     Grade 1   Grade 2   Grade 3   Sum
Grade 1        60        20         1      81
Grade 2         2        12         4      18
Grade 3         3        11        11      25
Sum            65        43        16     124
TABLE 2. Kappa and weighted kappa values for calculating agreement between two raters for more than two categories

Statistic         Estimate   Lower 95% CI   Upper 95% CI
Kappa               0.43         0.31           0.55
Weighted kappa      0.63         0.52           0.74
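For readers who wish to check these figures, the following minimal sketch recomputes them from the Table 1 counts using Python and scikit-learn. It is illustrative only: the weighting scheme is our assumption, since Table 2 does not specify one; quadratic weights reproduce the reported weighted kappa of 0.63, whereas linear weights give roughly 0.53.

# Minimal sketch: recomputing the Table 2 estimates from the Table 1 counts.
# The weighting scheme is an assumption; quadratic weights are shown because
# they reproduce the reported weighted kappa of 0.63 (linear weights give ~0.53).
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Confusion matrix from Table 1 (rows: Rater 2, columns: Rater 1).
counts = np.array([[60, 20,  1],
                   [ 2, 12,  4],
                   [ 3, 11, 11]])

# Expand the cell counts into paired gradings for the 124 subjects.
rater1, rater2 = [], []
for i in range(3):            # Rater 2 grade (row index)
    for j in range(3):        # Rater 1 grade (column index)
        rater2 += [i + 1] * int(counts[i, j])
        rater1 += [j + 1] * int(counts[i, j])

print(cohen_kappa_score(rater1, rater2))                       # ~0.43, unweighted
print(cohen_kappa_score(rater1, rater2, weights="quadratic"))  # ~0.63, weighted

The 95% confidence limits in Table 2 are not recomputed in this sketch; they are typically obtained from the standard error of kappa or by bootstrapping.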

Disclosures

The authors report no conflict of interest.

References

  • 1 Gisev N, Bell JS, Chen TF: Interrater agreement and interrater reliability: key concepts, approaches, and applications. Res Social Adm Pharm 9:330–338, 2013
  • 2 Naderi M: Letter to the editor on “Is regular knee radiograph reliable enough to assess the knee prosthesis position?” J Arthroplasty [epub ahead of print], 2018 (Letter)
  • 3 Naderi M, Sabour S: Methodological issues on reliability of interpretation of neurologic examination findings for the localization of vestibular dysfunction in dogs. J Am Vet Med Assoc 252:1460–1462, 2018
  • 4 Rinaldo L, Johnson DM, Vine RL, Rabinstein AA, Lanzino G: Differences between patient- and professional-reported modified Rankin Scale score in patients with unruptured aneurysms. J Neurosurg [epub ahead of print August 10, 2018. DOI: 10.3171/2018.3.JNS18247]
  • Mayo Clinic, Rochester, MN

Response

We would like to thank Shahsavari et al. for their thorough review and critique of our paper, and we hope the following addresses their concerns regarding its statistical methodology. Shahsavari et al. point out that when assessing inter-rater agreement on a non-binary categorical variable, use of an unweighted kappa statistic can produce biased estimates. Our understanding is that the weighted kappa is most helpful in accounting for the magnitude of the difference between individual rater scores. For example, there is clearly more disagreement between mRS scores 0 and 4 than between scores 1 and 2. In our data set, there was only a single instance in which the patient- and nurse-reported mRS scores differed by more than one category. While our numerical results may have differed had a weighted kappa been applied, we suspect that these differences would not have changed our relatively conservative interpretation that there is a moderate amount of disagreement between patient- and nurse-reported mRS scores in this patient population. Moreover, we do not think the readers’ criticism bears on the main finding of our study, namely that certain patient characteristics were associated with higher patient- than nurse-reported mRS scores. Nevertheless, Shahsavari et al.’s point is well taken, and we will consider their remarks in future work. We sincerely thank them for their interest in our paper.
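The intuition about the magnitude of disagreement can be made concrete with a small sketch. It is illustrative only: linear weights over the 7-point mRS are an arbitrary choice for demonstration, not the scheme used in our study, and the penalty matrix shown is simply the complement of the agreement weights a weighted kappa would apply.

# Minimal sketch of how a weighted kappa grades partial disagreement: the penalty
# grows with the distance between the two raters' scores. Linear weights over the
# 7-point mRS (0-6) are an illustrative choice, not the scheme used in the study.
import numpy as np

k = 7                                    # mRS categories 0 through 6
i, j = np.meshgrid(np.arange(k), np.arange(k), indexing="ij")
penalty = np.abs(i - j) / (k - 1)        # 0 on the diagonal, 1 for a 0-vs-6 split

print(penalty[0, 4])   # mRS 0 vs 4 -> ~0.67 (heavily penalized)
print(penalty[1, 2])   # mRS 1 vs 2 -> ~0.17 (lightly penalized)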


Contributor Notes

Correspondence Soodeh Shahsavari: s_shahsavari@kums.ac.ir.

INCLUDE WHEN CITING Published online November 2, 2018; DOI: 10.3171/2018.8.JNS182356.

Disclosures The authors report no conflict of interest.
