Measurement of Concordance and Congruence
All measurements of Concordance and Congruence as defined below were obtained in the context of a triple-blinded study in which neither participants, treating physicians, nor raters were aware of treatment group assignment. Moreover, site raters were further blinded to the study participant's clinical chart, previous rating scores, compliance with study drug, and any other clinical characteristics. Sponsor master raters had no participant contact or clinical information and assigned rating scores solely from audio files of site rating sessions.
Three master raters independently evaluated MADRS interviews completed by site raters. At a clinical trial visit, a total MADRS score was obtained from the site rater and from the Sponsor rater to create a pair of ratings. If the site-rater-assigned MADRS score was within 3 points (higher or lower) of the Sponsor-rater-assigned score, the pair was deemed concordant; if the two scores differed by 4 points or more, the pair was considered discordant. This 3-point criterion for Concordance is a stricter standard than has recently been advocated by others (see Discussion).
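For illustration only, this concordance rule reduces to a check on the absolute difference between the two total scores. The brief Python sketch below is not part of the study procedures, and the function name is hypothetical.

```python
def classify_pair(site_score: int, sponsor_score: int) -> str:
    """Classify a pair of total MADRS scores as concordant or discordant.

    A pair is concordant when the site-rater score is within 3 points
    (higher or lower) of the Sponsor-rater score; a difference of 4 or
    more points is discordant.
    """
    return "concordant" if abs(site_score - sponsor_score) <= 3 else "discordant"


# Example: scores of 28 and 31 differ by 3 points and are concordant;
# scores of 27 and 32 differ by 5 points and are discordant.
assert classify_pair(28, 31) == "concordant"
assert classify_pair(27, 32) == "discordant"
```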
“Congruence” or inter-rater reliability (IRR) was defined as the percentage of sampled ratings that were concordant. A Congruence or IRR standard of 90% was established by the sponsor for ongoing participation of a study site in the clinical trial. If the sponsor rater manager’s review of a site rater’s scores did not meet the above criterion for IRR, the reviewer contacted the site rater for a consultation on the interview and scores. This consultation provided an opportunity to resolve discrepancies and, where needed, to provide site rater training or remediation. Additionally, the sponsor rater manager could contact a site rater to discuss any remediation triggers, specifically observed interviews that raised concerns over scale administration, e.g., lack of adherence to the structured interview guide, numerous leading questions, or unusually brief interview duration. If a lack of agreement or other issues with scale administration were identified, the sponsor rater manager worked with the rater to remediate performance.
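A minimal sketch of the site-level Congruence/IRR check, assuming the 90% standard described above; the function and constant names are illustrative and do not reflect the sponsor's actual review software.

```python
IRR_STANDARD = 90.0  # sponsor-established Congruence/IRR standard, in percent


def site_irr(pairs: list[tuple[int, int]]) -> float:
    """Percent of sampled (site, Sponsor) rating pairs that are concordant."""
    concordant = sum(1 for site, sponsor in pairs if abs(site - sponsor) <= 3)
    return 100.0 * concordant / len(pairs)


def needs_consultation(pairs: list[tuple[int, int]]) -> bool:
    """Flag a site whose sampled ratings fall below the 90% IRR standard."""
    return site_irr(pairs) < IRR_STANDARD


# Example: four of five sampled pairs are concordant (80% IRR), so this
# hypothetical site would be flagged for a consultation.
sampled = [(24, 26), (30, 29), (28, 33), (35, 34), (22, 25)]
assert needs_consultation(sampled)
```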
Congruence or inter-rater reliability (IRR) was calculated as the total number of subjects in concordance divided by the number of subjects assessed, multiplied by 100. The intraclass correlation was calculated (VassarStats) to assess absolute agreement between the raters within the same patient population.13 A within-subjects ANOVA was used to assess Type I error. The mean absolute difference between site and Sponsor rater scores was calculated, along with a 95% confidence interval.
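The summary statistics described here could be reproduced along the following lines. This is only a sketch that substitutes standard Python libraries (NumPy/SciPy, with pingouin noted for the intraclass correlation) for the VassarStats calculator actually used in the study; the score values are invented to make the example runnable.

```python
import numpy as np
from scipy import stats

# Hypothetical paired total MADRS scores (site rater vs. Sponsor master rater).
site = np.array([24, 30, 28, 35, 22, 31])
sponsor = np.array([26, 29, 33, 34, 25, 30])

# Mean absolute difference between site and Sponsor scores, with a
# t-based 95% confidence interval (n - 1 degrees of freedom).
abs_diff = np.abs(site - sponsor)
mean_diff = abs_diff.mean()
ci_low, ci_high = stats.t.interval(
    0.95, df=len(abs_diff) - 1, loc=mean_diff, scale=stats.sem(abs_diff)
)
print(f"mean |difference| = {mean_diff:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")

# An absolute-agreement intraclass correlation could be obtained from a
# long-format table with, e.g., pingouin.intraclass_corr(data=df,
# targets="subject", raters="rater", ratings="score"); the study itself
# used the VassarStats online calculator.
```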
If rating concordance could not be brought within the required 3 points, the protocol directed referral of the rating to an external adjudicating rater who had been trained and standardized with the Sponsor master raters on a common rating training set.