Comparison between raters – agreement measures
The six raters were asked to assess movement on a 5-point scale, ranging from no movement to fully mobile. Inter-rater specific agreement was less than 60% for 4 of the 5 categories: immobile, minimal residual mobility, paresis and slightly reduced movement. The only category with high inter-rater specific agreement, at 83.04%, was the fully mobile category. This may simply be because a fully mobile vocal cord is what clinicians see most commonly when performing flexible nasendoscopy, with the high agreement being a reflection of pattern recognition. In addition, since the dataset was formed from routine clinical cases, about 70% of them are of fully mobile vocal cords. Because of this high prevalence, the positive predictive value of the clinicians for this score category would be high [11]. When each individual rating in the five-point scale was assessed, the combined agreement measure varied considerably between categories, ranging from only 16.6% for score 1 (minimal residual mobility) to 83% for score 4 (fully mobile). This wide range in agreement highlights the difficulty of assessing vocal cord mobility. When the options were limited to three categories, inter-rater specific agreement improved, with agreement at 96.11% for fully mobile and 75.11% for no mobility.
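The link between prevalence and positive predictive value noted in this paragraph follows from Bayes' theorem, and can be illustrated with a minimal sketch. The sensitivity and specificity values used here are hypothetical and chosen only for illustration; they are not estimates from this study. The function name `ppv` is likewise an assumption.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical rater performance, held fixed while prevalence changes.
SENS, SPEC = 0.90, 0.90

ppv_common = ppv(SENS, SPEC, prevalence=0.70)  # e.g. a category seen in ~70% of cases
ppv_rare = ppv(SENS, SPEC, prevalence=0.20)    # a much less common category

print(f"PPV at 70% prevalence: {ppv_common:.3f}")
print(f"PPV at 20% prevalence: {ppv_rare:.3f}")
```

With the same assumed sensitivity and specificity, the predictive value of a positive rating is higher for the common category than for the rare one, which is the sense in which high prevalence favours the fully mobile category.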
Analysis of the specific agreement scores provides insight into the categories in which the consultants were in greater agreement, and into the reason for the improved scores with the 3-category scale. Much of the variability in scoring between clinicians lies in categories 1 (minimal residual mobility), 2 (paresis) and 3 (slightly reduced mobility) of the 5-category scale. Agreement in these categories was less than 31% in every session.
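As an illustration of how category-specific agreement can be computed for several raters, the sketch below uses a common pairwise formulation: for category k, the number of agreeing rater pairs on k divided by the number of pairs in which at least one rating could have been matched on k. This may not be the exact estimator used in this study. The toy ratings are invented, the coding of immobile as score 0 is assumed, and the mapping from five to three categories (scores 1 to 3 merged into a single "reduced" category) is an assumption for illustration only.

```python
from collections import Counter


def specific_agreement(ratings, categories):
    """Pairwise specific agreement per category.

    ratings: list of cases; each case is the list of scores given by the raters.
    Returns {category: agreement in [0, 1]} (NaN if the category was never used).
    """
    agree = {k: 0 for k in categories}
    possible = {k: 0 for k in categories}
    for case in ratings:
        n_raters = len(case)
        counts = Counter(case)
        for k in categories:
            n_k = counts.get(k, 0)
            agree[k] += n_k * (n_k - 1)           # ordered rater pairs agreeing on k
            possible[k] += n_k * (n_raters - 1)   # pairs in which one rater chose k
    return {
        k: (agree[k] / possible[k]) if possible[k] else float("nan")
        for k in categories
    }


# Invented toy data: 4 cases, 6 raters, scores 0 (immobile) to 4 (fully mobile).
toy = [
    [4, 4, 4, 4, 4, 4],
    [4, 4, 4, 4, 3, 4],
    [0, 0, 1, 0, 0, 2],
    [1, 2, 3, 2, 3, 1],
]

# Assumed collapse: 0 -> immobile, 1-3 -> reduced, 4 -> fully mobile.
collapse = {0: "immobile", 1: "reduced", 2: "reduced", 3: "reduced", 4: "mobile"}
toy3 = [[collapse[s] for s in case] for case in toy]

sa5 = specific_agreement(toy, categories=[0, 1, 2, 3, 4])
sa3 = specific_agreement(toy3, categories=["immobile", "reduced", "mobile"])

for k, v in sa5.items():
    print(f"5-category, score {k}: {v:.1%}")
for k, v in sa3.items():
    print(f"3-category, {k}: {v:.1%}")
```

In this toy data, disagreements among scores 1, 2 and 3 count against each of those categories on the 5-point scale but become agreements once the three are merged, which is one way the coarser scale can raise specific agreement.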