Discussion
The high IRR observed in this trial, specifically 93.4%, supports the
utility of “in-sourced” psychometric review. To our knowledge, this
result provides the first evidence that the method is practical and
implementable in complex psychiatric patients with bipolar depression
and subacute suicidal ideation or behavior. This result also replicates
and extends the findings of Targum and Catania (2019), who examined
concordance between site and site-independent raters using digital audio
recording of 3,736 MADRS interviews. They report concordance rates
between 89.5% and 95.8% with lower concordance occurring during
earlier visits and higher concordance occurring at later visits. The
average concordance across all visits was 93.3%. In a separate paper,
Targum et al (2014) report a concordance rate of 93.8% between site and
site-independent raters; however, the discordance cutoff was 6 or more
points on the total MADRS score.15 Similarly, Targum and Catania defined
discordance as a deviation of greater than 6 points on the MADRS, which
was equal to one standard deviation of the mean total MADRS score.14
In contrast to Targum and colleagues, the SRMS method used a more
rigorous discordance cutoff of 3 points and achieved a similar
concordance rate of 93.4%. If the SRMS criterion were relaxed to a
6-point cutoff, only 3 discordant pairs would occur out of 133
assessments, yielding an IRR of 97.7%. Importantly, we did not include
Screening visits in this analysis; per the Study Protocol, Screening
visit MADRS data are used to confirm participant inclusion, not to
compute IRR. Targum and Catania, however, report their highest
discordance rate at Screening visits (11.5%), so this exclusion may
have favored our concordance rate. The
ICC was also very high, consistent with or exceeding results published
in similar studies.16
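The effect of the discordance cutoff on the concordance rate can be illustrated with a minimal sketch. The function names and the four example score pairs below are hypothetical and are not trial data, and the sketch assumes a pair is discordant when the absolute difference in total MADRS score is at least the cutoff, which may differ from the exact rule applied in the trial. Only the final calculation uses counts reported above (3 discordant pairs out of 133 assessments).

```python
def concordance_rate(site_scores, reviewer_scores, cutoff):
    """Percent of paired total scores whose absolute difference is below `cutoff`.

    Assumption: a difference equal to the cutoff counts as discordant.
    """
    if len(site_scores) != len(reviewer_scores):
        raise ValueError("score lists must be the same length")
    if not site_scores:
        raise ValueError("at least one pair is required")
    discordant = sum(
        1 for s, r in zip(site_scores, reviewer_scores) if abs(s - r) >= cutoff
    )
    return 100.0 * (len(site_scores) - discordant) / len(site_scores)


def rate_from_counts(n_pairs, n_discordant):
    """Concordance rate (percent) from a count of discordant pairs."""
    return 100.0 * (n_pairs - n_discordant) / n_pairs


# Hypothetical paired total MADRS scores (illustrative only, not trial data)
site = [32, 28, 35, 18]
reviewer = [30, 28, 31, 19]
print(concordance_rate(site, reviewer, cutoff=3))  # 75.0
print(concordance_rate(site, reviewer, cutoff=6))  # 100.0

# Counts reported in the text for the relaxed 6-point cutoff
print(f"{rate_from_counts(133, 3):.1f}%")  # 97.7%
```

As the hypothetical pairs show, the same set of ratings yields a higher concordance rate under a wider cutoff, so rates from studies using different cutoffs are not directly comparable.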
Targum and Catania reported that for MADRS scores equal to or greater
than 30, site raters tend to assign higher (more severe) scores than
site-independent raters.14 The converse is true for
MADRS scores less than 20. Our results are consistent with this finding.
In most (6 out of 7) of our discordant pairs, the site rater score was
higher than the Sponsor rater score, and in all but one of the pairs,
the Sponsor rater-assigned MADRS score was greater than 30.
Additionally, when examining interview length, Targum and Catania noted
that MADRS interviews lasting 12 minutes or less were associated with
significantly higher rates of scoring discordance.
The clinical trial from which these concordance data were collected
comprised 12 sites with a target enrollment of 72 participants, which is
typical of a Phase 2 trial. Thus, the SRMS system is likely
applicable to most Phase 2 and smaller Phase 3 psychiatric trials. The
SRMS method, with its focus on real-time review, worked well with
a limited number (n=12) of experienced clinical trial sites where
participants were interviewed in their primary language. At one site,
participants whose primary language was not English were interviewed in
English; the resulting ratings were technically uninterpretable and
fell below the allowable IRR. The clinical trial sites
were selected, among other things, for rater experience, particularly
with respect to MADRS administration. High concordance rates are likely
due, at least in part, to working with experienced site raters (minimum
5 years of experience) who were willing to engage in initial and ongoing
training during the trial, if needed. It remains unclear how well this
approach would scale to larger trials. A study with more sites and a
larger number of participants would likely require more than the three
Master Raters to ensure 100% assessment review at all sites. At the
same time, if the SRMS approach results in reduced variance associated
with the primary and secondary study endpoints, study sample sizes can
be reduced without sacrificing study power.
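The relationship between endpoint variance and sample size can be sketched with the standard normal-approximation formula for a two-arm comparison of means, n per arm = 2(z_{1-alpha/2} + z_{power})^2 SD^2 / delta^2. This is a generic illustration and not the power calculation used for this trial; the function name, the standard deviations (10 and 8 points), and the treatment difference (4 points) are hypothetical values chosen only to show the direction of the effect.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(sd, delta, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided, two-sample
    comparison of means (normal approximation, equal variances)."""
    if sd <= 0 or delta == 0:
        raise ValueError("sd must be positive and delta non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


# Hypothetical values, not taken from the trial
for sd in (10.0, 8.0):
    print(f"SD = {sd}: about {n_per_arm(sd, delta=4.0)} participants per arm")
```

Because the required sample size scales with the square of the standard deviation, even a modest reduction in rating variability would translate into a proportionally larger reduction in the number of participants needed for the same power.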