Competing Interests: MK, CK, IRS, MTS, and JCJ are compensated by NRx Pharmaceuticals, Inc. Lavin Statistical Associates is paid for independent statistical analysis by NRx Pharmaceuticals, Inc.
Introduction
Clinician-administered rating scales are universally required by
regulators around the world for ascertainment of the primary endpoint in
psychiatric clinical trials. Signal detection in multi-site trials
requires strong inter-rater reliability on these instruments; poor
inter-rater reliability is associated with increased error variance,
reduced study power,2 and, ultimately, failed trials.
Poor inter-rater reliability, or unreliability, in psychometric rating
scales has many sources, including a lack of adherence to structured and
semi-structured interviews, rater scoring differences, and inconsistent
interview duration.3 Williams & Kobak correctly
state “The importance of reliability of assessments in a clinical trial
cannot be overestimated. Without good interrater agreement the chances
of detecting a difference in effect between drug and placebo are
significantly reduced.”4 Commonly used methods for
establishing and maintaining strong inter-rater reliability include
site-rater training, external evaluation and monitoring of site-raters,
and centralized rating.
Monitoring of endpoint ascertainment in clinical trials is routinely
outsourced to Clinical Research Organizations (CROs) and to central
laboratories. While psychometric assessments are often monitored by
specialized CROs, this may not always be the best choice for a clinical
trial. The unique rigor required to ensure valid and reliable clinical
scale ratings means CROs must employ enough expert psychometricians who
are familiar with both the rating instruments and the specific aspects of
the disease and drug being studied. CRO raters must review site
assessments within a day of completion to ensure rater quality and
accuracy and provide remediation in a timely manner, if needed. Because
personnel turnover at CROs may be as high as 20% per year,5 outsourcing
the day-to-day management of highly nuanced psychometric ratings becomes
impractical when that turnover introduces inter-rater variation among
the “master raters.”
The Sponsor Rating Monitoring System (SRMS) was developed as a
pre-defined, protocol-specific, data-driven method to optimize
psychometric training, data validity and reliability in the context of a
clinical trial of a novel antidepressant targeting bipolar depression
with suicidality. In this system, the Sponsor employs expert raters with
extensive experience in conducting, analyzing, and training others in
the rating scales used to ascertain primary and secondary endpoints. In
SRMS, these master raters help the clinical operations team select
suitable clinical trial sites, document site rater qualifications,
oversee rater training and qualification, and confirm that all data
management conforms to the Study Protocol and GDP & GCP guidelines.
Most importantly, the Sponsor “master raters” review psychometric
assessments within 24 to 48 hours and provide corrective feedback, as
needed. This approach further allows for referral of an aberrant rating
to an adjudicating rater in real time, prior to data unblinding. The
centralized SRMS model does not transfer regulatory obligations to an
outside CRO or engage multiple data quality systems, which minimizes the
oversight burden and subsequent audit responsibilities.
We examined the inter-rater reliability (IRR), i.e., the concordance
between site raters and Sponsor “master raters” on Montgomery–Åsberg
Depression Rating Scale (MADRS) scores for patients participating in the
Phase 2b/3 clinical trial “NRX101 for Suicidal Treatment Resistant
Bipolar Depression” (ClinicalTrials.gov Identifier: NCT03395392), to
assess the potential efficacy of the SRMS.
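The specific concordance statistic applied to these paired ratings is
described in the Methods; as an illustrative sketch only (the estimator
named here is an assumption, not a statement of this study's analysis
plan), agreement between two raters scoring the same patients is often
summarized with a two-way random-effects intraclass correlation
coefficient, ICC(2,1), computed from the mean squares of a two-way
ANOVA:

$$\mathrm{ICC}(2,1) = \frac{MS_R - MS_E}{MS_R + (k-1)\,MS_E + \frac{k}{n}\left(MS_C - MS_E\right)}$$

where $MS_R$, $MS_C$, and $MS_E$ are the subject, rater, and residual
mean squares, $k$ is the number of raters per patient (here, two: the
site rater and the master rater), and $n$ is the number of patients
rated.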