Discussion

The high IRR observed in this trial (93.4%) supports the utility of “in-sourced” psychometric review. To our knowledge, this result provides the first evidence that the method is practical and implementable in complex psychiatric patients with bipolar depression and subacute suicidal ideation or behavior. It also replicates and extends the findings of Targum and Catania (2019), who examined concordance between site and site-independent raters using digital audio recordings of 3,736 MADRS interviews. They report concordance rates between 89.5% and 95.8%, with lower concordance at earlier visits and higher concordance at later visits; the average concordance across all visits was 93.3%. In a separate paper, Targum et al. (2014) report a concordance rate of 93.8% between site and site-independent raters; however, the discordance cutoff was 6 or more points on the total MADRS score.15 Targum and Catania, for their part, defined discordance as a deviation of greater than 6 points on the MADRS, which was equal to one standard deviation of the mean total MADRS score.14
In contrast to Targum, the SRMS method used a more rigorous discordance definition of 3 points and still achieved a similar concordance rate of 93.4%. If the SRMS criterion were relaxed to a 6-point discordance cutoff, only 3 discordant pairs would occur out of 133 assessments, yielding an IRR of 97.7%. Importantly, we did not include Screening visits in this analysis; under the Study Protocol, Screening visit MADRS data are used to confirm participant inclusion, not to compute IRR. Targum and Catania, by contrast, report their highest discordance rate at Screening visits (11.5%). The ICC was also very high, consistent with or exceeding results published in similar studies.16
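The effect of the discordance cutoff on the concordance rate can be illustrated with a minimal Python sketch. The function names, the paired scores, and the choice to treat a pair as discordant when the absolute difference is at or above the cutoff are all assumptions made for illustration; they are not taken from the SRMS procedures or from the trial data. Only the final line uses figures reported above (3 discordant pairs out of 133 assessments).

```python
def concordance_rate(site_scores, reviewer_scores, cutoff):
    """Fraction of rating pairs that are concordant.

    Assumption: a pair is discordant when the absolute difference in
    total MADRS score is at or above `cutoff`.
    """
    if len(site_scores) != len(reviewer_scores):
        raise ValueError("score lists must be the same length")
    if not site_scores:
        raise ValueError("at least one rating pair is required")
    discordant = sum(
        1 for s, r in zip(site_scores, reviewer_scores) if abs(s - r) >= cutoff
    )
    return (len(site_scores) - discordant) / len(site_scores)


def concordance_from_counts(n_pairs, n_discordant):
    """Concordance rate from a count of discordant pairs."""
    return (n_pairs - n_discordant) / n_pairs


# Hypothetical paired total scores, for illustration only.
site = [34, 28, 31, 22, 36]
reviewer = [33, 28, 27, 22, 29]

# A stricter cutoff flags more pairs, so concordance can only fall or stay equal.
print(concordance_rate(site, reviewer, cutoff=3))
print(concordance_rate(site, reviewer, cutoff=6))

# Figures from the text: 3 discordant pairs out of 133 assessments.
print(round(100 * concordance_from_counts(133, 3), 1))  # 97.7
```

Because every pair flagged at a 6-point cutoff is also flagged at a 3-point cutoff, concordance rates computed under different cutoffs are not directly comparable.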
Targum and Catania reported that for MADRS scores equal to or greater than 30, site raters tend to assign higher (more severe) scores than site-independent raters.14 The converse is true for MADRS scores less than 20. Our results are consistent with this finding. In most (6 out of 7) of our discordant pairs, the site rater's score was higher than the Sponsor rater's, and in all but one of the pairs, the Sponsor rater-assigned MADRS score was greater than 30. Additionally, when examining interview length, Targum and Catania noted that MADRS interviews lasting 12 minutes or less were associated with significantly higher rates of scoring discordance.
The clinical trial from which these concordance data were collected comprised 12 sites with a target enrollment of 72 participants, a typical size for a Phase 2 trial. Thus, the SRMS system is likely applicable to most Phase 2 and smaller Phase 3 psychiatric trials. The SRMS method, with its focus on real-time review, worked well with a limited number (n=12) of experienced clinical trial sites where participants were interviewed in their primary language. At one site, where participants whose primary language was not English were interviewed in English, the resulting ratings were technically uninterpretable and the IRR fell below the allowable level. The clinical trial sites were selected, among other criteria, for rater experience, particularly with respect to MADRS administration. The high concordance rates are likely due, at least in part, to working with experienced site raters (minimum 5 years of experience) who were willing to engage in initial training and, if needed, ongoing training during the trial. It is unclear how well this approach would scale to larger trials. A study with more sites and a larger number of participants would likely require more than the three Master Raters to ensure 100% assessment review at all sites. At the same time, if the SRMS approach reduces the variance associated with the primary and secondary study endpoints, study sample sizes could be reduced without sacrificing study power.
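The link between endpoint variance and sample size can be sketched with the standard normal-approximation formula for a two-arm comparison of means with equal allocation, n per arm = 2(z_{1-α/2} + z_{power})² (σ/δ)². This is a generic textbook approximation, not the power calculation used in this trial, and the standard deviations and treatment difference below are hypothetical values chosen only to show the direction of the effect.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(sd, delta, alpha=0.05, power=0.80):
    """Approximate participants per arm for a two-sided, two-arm comparison.

    Normal approximation with equal allocation and a common standard
    deviation `sd`; `delta` is the treatment difference to be detected.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 * (sd / delta) ** 2)


# Hypothetical values: the same 4-point difference, detected with a
# larger and a smaller endpoint standard deviation.
print(n_per_arm(sd=10, delta=4))
print(n_per_arm(sd=8, delta=4))
```

Under this approximation the required sample size scales with the square of the standard deviation, so even a modest reduction in rating noise lowers the number of participants needed at a fixed power.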