Reliability measures
The consistency with which vocal cord motion was discriminated, both between consultants (inter-rater) and between sessions for a given consultant (intra-rater), is given in Table 4. Kappa values were consistent across sessions, and the reported inter-rater reliability is the mean reliability over all sessions. Discriminating vocal cord motion was less reliable with the 5-category scale (κ = 0.52) than with the 3-category scale (κ = 0.68).
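For reference, kappa statistics share the general chance-corrected form below. The text does not state which variant (for example Cohen's or Fleiss') was used, so the symbols are generic: $p_o$ is the observed proportion of agreement and $p_e$ is the proportion of agreement expected by chance.

\[
\kappa = \frac{p_o - p_e}{1 - p_e}
\]

A value of 1 indicates perfect agreement and a value of 0 indicates agreement no better than chance.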
The intra-rater (test-retest) reliability is the mean reliability of each consultant over the three sessions. With the 5-category scale, intra-rater kappa values ranged from 0.55 (fair) to 0.82 (excellent), with a mean of 0.69. With the 3-category scale, the kappa values were higher, ranging from 0.64 to 0.87, with a mean of 0.75. Two of the six consultants had excellent reliability with the 5-category scale (0.78 and 0.82), and three had excellent reliability with the 3-category scale (0.78, 0.87 and 0.87).
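As an illustration of how a per-consultant intra-rater value might be obtained, the sketch below computes Cohen's kappa for each pair of sessions and averages the results. This is an assumption for illustration only: the text does not specify the kappa variant or the averaging procedure, and the function names and example ratings are hypothetical, not study data.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical ratings."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("rating sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same category twice.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the marginal proportions, summed over categories.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Both sequences use one identical category throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(sessions):
    """Mean Cohen's kappa over all pairs of sessions (assumed averaging scheme)."""
    pairs = list(combinations(sessions, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical ratings by one consultant for the same four recordings
# in three sessions (category labels are placeholders).
session_1 = [1, 1, 2, 2]
session_2 = [1, 1, 2, 2]
session_3 = [1, 2, 1, 2]

intra_rater = mean_pairwise_kappa([session_1, session_2, session_3])
```

The same `cohen_kappa` function could be applied to pairs of consultants within a session to approximate an inter-rater value, although a multi-rater statistic such as Fleiss' kappa may be what the study used.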