Reliability measures
The consistency with which vocal cord motion was discriminated between
consultants (inter-rater) and between sessions for a given consultant
(intra-rater) is provided in Table 4. Kappa values were consistent across
sessions, and the reported inter-rater reliability is the mean
reliability across all sessions. Discriminating vocal cord motion with the
5-category scale was less reliable (κ = 0.52) than with the 3-category
scale (κ = 0.68).
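
As an illustration of how collapsing a rating scale can change agreement, the following is a minimal sketch of Cohen's kappa for two raters, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The text does not state which kappa variant was used or how the five categories were merged into three, so the two-rater formulation, the ratings, and the mapping `FIVE_TO_THREE` are all hypothetical and are not the study's data or method.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: proportion of items given the same category.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical mapping from a 5-category to a 3-category scale
# (an assumption for illustration; the source does not define it).
FIVE_TO_THREE = {1: 1, 2: 2, 3: 2, 4: 2, 5: 3}


def collapse(ratings, mapping=FIVE_TO_THREE):
    """Recode each rating onto the coarser scale."""
    return [mapping[r] for r in ratings]


# Made-up ratings of the same eight recordings by two raters.
rater_a = [1, 2, 3, 4, 5, 1, 2, 5]
rater_b = [1, 3, 3, 4, 5, 1, 2, 4]

kappa_5 = cohen_kappa(rater_a, rater_b)
kappa_3 = cohen_kappa(collapse(rater_a), collapse(rater_b))
print(f"5-category kappa: {kappa_5:.2f}")
print(f"3-category kappa: {kappa_3:.2f}")
```

In this toy example, disagreements between adjacent middle categories disappear once those categories are merged, which is one way a coarser scale can yield a higher kappa.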
The intra-rater or test-retest reliability is the mean reliability of
each consultant over the three sessions. With the 5-category scale,
intra-rater reliability ranged from 0.55 (fair) to 0.82 (excellent),
with a mean of 0.69. Kappa values were higher with the 3-category
scale, ranging from 0.64 to 0.87 with a mean of 0.75. Two of the
six consultants had excellent reliability (0.78 and 0.82) with the
5-category scale, and three had excellent reliability with
the 3-category scale (0.78, 0.87 and 0.87).
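
One plausible way to obtain a single intra-rater value per consultant from three sessions is to average the kappa over every pair of sessions. The sketch below assumes that pairwise-mean approach and Cohen's kappa, neither of which is confirmed by the text. The function `mean_pairwise_kappa` and the session ratings are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def cohen_kappa(x, y):
    """Cohen's kappa for two equal-length sequences of categorical ratings."""
    n = len(x)
    p_o = sum(a == b for a, b in zip(x, y)) / n      # observed agreement
    cx, cy = Counter(x), Counter(y)
    p_e = sum(cx[c] * cy[c] for c in cx) / (n * n)   # chance agreement
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(sessions):
    """Average kappa over every pair of rating sessions (assumed method)."""
    return mean(cohen_kappa(a, b) for a, b in combinations(sessions, 2))


# Made-up ratings by one consultant of the same six recordings in three sessions.
sessions = [
    [1, 2, 2, 3, 1, 3],
    [1, 2, 2, 3, 1, 3],
    [1, 2, 3, 3, 1, 3],
]

intra_rater_kappa = mean_pairwise_kappa(sessions)
print(f"mean intra-rater kappa: {intra_rater_kappa:.2f}")
```

Averaging these per-consultant values across all consultants would then give a summary figure analogous to the mean kappa reported above.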