Measurement of Concordance and Congruence
All measurements of Concordance and Congruence as defined below were
ascertained in the context of a triple-blinded study in which
participants, treating physicians, and raters were all unaware of
treatment group assignment. Moreover, the site raters were further
blinded to the clinical chart of the study participant, previous rating
scores, compliance with study drug, and any other clinical
characteristics.
Sponsor master raters had no participant contact or clinical information
and assigned rating scores solely from audio files of site rating
sessions.
Three master raters were responsible for independently evaluating MADRS
interviews completed by site raters. At a clinical trial visit, a total
MADRS score was obtained from the site rater and the Sponsor rater to
create a pair of ratings. If the site-rater-assigned MADRS score was
within 3 points, higher or lower, of the Sponsor-rater-assigned score,
the pair was deemed concordant. If the pair of MADRS scores differed by
4 points or more, it was considered discordant. This 3-point measure of
Concordance is a stricter standard than has recently been advocated by
others (see Discussion).
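As a minimal sketch of this rule (the scores and function name below are hypothetical and not part of the study protocol):

```python
def classify_pair(site_score: int, sponsor_score: int) -> str:
    """Classify one site/Sponsor MADRS pair under the 3-point Concordance rule."""
    # Concordant when the two totals differ by 3 points or fewer in either
    # direction; discordant when they differ by 4 points or more.
    return "concordant" if abs(site_score - sponsor_score) <= 3 else "discordant"

classify_pair(24, 26)  # -> "concordant" (difference of 2)
classify_pair(24, 29)  # -> "discordant" (difference of 5)
```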
“Congruence,” or inter-rater reliability (IRR), was defined as the
percentage of sampled ratings that were concordant. A Congruence (IRR)
standard of 90% was established by the Sponsor for ongoing
participation of a study site in the clinical trial. If, upon the
Sponsor rater manager’s review, a site rater’s ratings did not meet the
above criterion for IRR, the reviewer contacted the site rater for a
consultation on the interview and scores. This consultation provided an
opportunity for the resolution of discrepancies and, potentially, site
rater training or remediation.
remediation. Additionally, the sponsor rater manager may contact a site
rater to discuss any remediation triggers, specifically observed
interviews that led to concerns over scale administration, e.g., lack of
adherence to the structured interview guide, numerous leading questions,
unusually brief interview duration, etc. If a lack of agreement or other
issues with scale administration were identified, the SRMS worked with
the rater to remediate performance.
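A minimal sketch of this site-level Congruence check, assuming the paired site/Sponsor totals are available as tuples (the 3-point margin and 90% standard come from the protocol; the function and variable names are illustrative):

```python
from typing import List, Tuple

CONCORDANCE_MARGIN = 3   # maximum |site - Sponsor| difference allowed for concordance
IRR_STANDARD = 90.0      # Congruence (IRR), in percent, required for ongoing participation

def congruence(pairs: List[Tuple[int, int]]) -> float:
    """Percentage of sampled rating pairs that are concordant."""
    concordant = sum(1 for site, sponsor in pairs
                     if abs(site - sponsor) <= CONCORDANCE_MARGIN)
    return 100.0 * concordant / len(pairs)

def needs_consultation(pairs: List[Tuple[int, int]]) -> bool:
    """Flag a site rater for consultation when Congruence falls below the standard."""
    return congruence(pairs) < IRR_STANDARD
```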
Congruence (IRR) was calculated as the total number of subjects in
concordance divided by the number of subjects assessed, multiplied by
100. An intraclass correlation was calculated (VassarStats) to assess
the absolute agreement between the raters within the same patient
population.13 A within-subjects ANOVA was used to determine the type I
error probability. The mean of the absolute difference between site and
Sponsor raters was calculated along with a 95% confidence interval.
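These calculations might be reproduced along the following lines. The MADRS totals below are invented for illustration, and the intraclass correlation (computed in VassarStats in the original analysis) is not recomputed here; with only two raters per subject, a within-subjects ANOVA on rater is equivalent to a paired t-test, which is what this sketch uses.

```python
import numpy as np
from scipy import stats

# Hypothetical paired MADRS totals (site rater, Sponsor master rater); not study data.
site    = np.array([22, 30, 18, 25, 27, 33, 20, 24])
sponsor = np.array([24, 29, 21, 25, 26, 38, 19, 23])

# Congruence / IRR: concordant pairs divided by pairs assessed, times 100.
concordant = np.abs(site - sponsor) <= 3
irr = 100.0 * concordant.sum() / concordant.size

# Mean absolute difference between raters with a t-based 95% confidence interval.
abs_diff = np.abs(site - sponsor)
mean_abs_diff = abs_diff.mean()
ci_low, ci_high = stats.t.interval(
    0.95, df=abs_diff.size - 1, loc=mean_abs_diff, scale=stats.sem(abs_diff)
)

# With two raters, a within-subjects ANOVA on rater reduces to a paired t-test.
t_stat, p_value = stats.ttest_rel(site, sponsor)

print(f"IRR = {irr:.1f}%")
print(f"Mean |difference| = {mean_abs_diff:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
print(f"Paired comparison of raters: t = {t_stat:.2f}, p = {p_value:.3f}")
```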
If rating concordance could not be brought within the required 3 points,
the protocol directed referral of the rating to an external adjudicating
rater who was trained and standardized with the Sponsor master raters on
a common rating training set.