Comparison between raters – agreement measures
The six raters were asked to assess movement on a 5-point scale, ranging
from no movement to fully mobile. Inter-rater specific agreement was
less than 60% for 4 of the 5 categories: immobile, slightly reduced
movement, minimal residual mobility and paresis. Only the fully mobile
category had high inter-rater specific agreement, at 83.04%. This may
simply be because a fully mobile vocal cord is what clinicians see most
commonly when performing flexible nasendoscopy, so the high agreement
may reflect pattern recognition. In addition, because the dataset was
formed from routine clinical cases, about 70% of the cases show fully
mobile vocal cords.
Given this high prevalence, the clinicians' positive predictive value
for this score category would also be expected to be high [11]. When
each rating on the five-point scale was assessed individually, the
combined agreement measure varied considerably between categories, from
only 16.6% for score 1 (minimal residual mobility) to 83% for score 4
(fully mobile). This wide range highlights how difficult it is to assess
vocal cord mobility. When the options were reduced to three categories,
inter-rater specific agreement improved, reaching 96.11% for fully
mobile and 75.11% for no mobility.
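The link between prevalence and positive predictive value (PPV) can be illustrated with Bayes' rule. The following is a hypothetical sketch, not part of the study's analysis: the sensitivity and specificity values (0.90 and 0.80) are assumptions chosen only for illustration, and only the 70% prevalence figure comes from the dataset described here. Holding rater accuracy fixed, PPV rises as prevalence rises.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence                 # P(rated positive and truly positive)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # P(rated positive but truly negative)
    return true_pos / (true_pos + false_pos)


# Hypothetical rater accuracy for the "fully mobile" call (assumed, not from this study).
SENS, SPEC = 0.90, 0.80

# 0.70 mirrors the ~70% prevalence of fully mobile cords; 0.30 and 0.10 are contrasts.
for prev in (0.70, 0.30, 0.10):
    print(f"prevalence {prev:.0%}: PPV = {ppv(SENS, SPEC, prev):.3f}")
```

With these assumed values the PPV is about 0.91 at 70% prevalence but only about 0.33 at 10% prevalence, even though the raters' accuracy is unchanged.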
Analysis of the specific agreement scores gives insight into the
categories in which the consultants agreed most, and into why agreement
improved with the 3-category scale. Much of the variability in scoring
between clinicians lies in categories 1 (minimal residual mobility),
2 (paresis) and 3 (slightly reduced mobility) of the 5-category scale,
where agreement was below 31% in every session.
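How merging categories raises specific agreement can be sketched with a pairwise, multi-rater formula. This is a minimal sketch, assuming the formulation commonly attributed to Uebersax (ordered rater pairs that both chose a category, divided by all ordered pairs in which the first rater chose it); the exact computation used in the study is not described in this section. The ratings are hypothetical toy data for six raters, and `FIVE_TO_THREE` is a hypothetical mapping that assumes scores 1 to 3 are pooled into a single "reduced" category on the 3-category scale.

```python
from collections import Counter

# Hypothetical mapping (an assumption, not from the study): scores 1-3 of the
# 5-point scale are pooled into one "reduced" category on the 3-category scale.
FIVE_TO_THREE = {0: "immobile", 1: "reduced", 2: "reduced",
                 3: "reduced", 4: "fully mobile"}


def specific_agreement(cases, category):
    """Multi-rater specific agreement for one category.

    `cases` is a list of per-case rating lists (one rating per rater).
    Counts ordered rater pairs that both chose `category`, divided by
    ordered pairs in which the first rater chose it.
    """
    agree = possible = 0
    for ratings in cases:
        n_k = len(ratings)                 # number of raters for this case
        n_jk = Counter(ratings)[category]  # raters who chose `category`
        agree += n_jk * (n_jk - 1)
        possible += n_jk * (n_k - 1)
    return agree / possible if possible else float("nan")


# Hypothetical toy ratings: 5 cases, 6 raters, scores 0 (immobile) to 4 (fully mobile).
cases5 = [
    [4, 4, 4, 4, 4, 4],
    [4, 4, 4, 3, 4, 4],
    [1, 2, 3, 2, 1, 3],
    [0, 0, 1, 0, 0, 0],
    [2, 3, 3, 1, 2, 2],
]
cases3 = [[FIVE_TO_THREE[s] for s in case] for case in cases5]

for score in range(5):
    print(f"5-point score {score}: {specific_agreement(cases5, score):.2f}")
for label in ("immobile", "reduced", "fully mobile"):
    print(f"3-category {label}: {specific_agreement(cases3, label):.2f}")
```

On this toy data, agreement for scores 1, 2 and 3 is 0.10, 0.32 and 0.16 on the 5-point scale, whereas the pooled "reduced" category reaches 0.86, because disagreements among scores 1 to 3 become agreements once those scores share a label. Agreement for fully mobile (0.91) and immobile (0.80) is unchanged, since those categories are not merged.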