1. Introduction
One-sample inference for binary data is one of the most common tasks in
epidemiology and medical statistics (Bland, 2015). One-sided testing is
used more often than two-sided testing, for example when assessing
freedom from disease or approving diagnostic tests, as well as in
industrial quality control, the evaluation of medical devices, and
clinical trials of rare diseases (Cameron & Baldock, 1998; Cheng & Zhen,
2021; Feld et al., 2015; Khan, Sarker, & Hackshaw, 2012; Lu, Li, & Xu,
2020). The
left-sided alternative
($H_0: p = p_0$ against $H_a: p < p_0$)
is applied for example in proving freedom from disease, while the
right-sided one
($H_0: p = p_0$ against $H_a: p > p_0$)
is used if one wants to prove that a particular exposure increases the
probability of getting a disease. There exist several different tests,
exact as well as asymptotic, for all testing scenarios. Sample size
calculation is available for each method; it requires the prescribed
significance level $\alpha$ and power, the null proportion $p_0$, and
the assumed true proportion $p_a$ for which the
prescribed power should be reached (Chow, Shao, & Wang, 2008; Suresh &
Chandrashekara, 2012).
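
As an illustration of how these inputs combine, the following minimal sketch uses the textbook normal-approximation formula for a one-sided, one-sample proportion test. It is only one of the available methods (exact tests generally give somewhat different sample sizes), and the function name and default values are hypothetical choices made for this example.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_one_sided(p0: float, pa: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n for a one-sided, one-sample proportion test.

    Normal-approximation formula (an assumption of this sketch):
        n = ((z_{1-alpha} * sqrt(p0 (1 - p0)) + z_{power} * sqrt(pa (1 - pa)))
             / (pa - p0)) ** 2
    The same expression serves the left- and right-sided alternatives,
    because the difference pa - p0 is squared.
    """
    if not (0 < p0 < 1 and 0 < pa < 1) or p0 == pa:
        raise ValueError("p0 and pa must lie in (0, 1) and differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)       # quantile for the target power
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(pa * (1 - pa))
    return ceil((numerator / (pa - p0)) ** 2)


if __name__ == "__main__":
    # Left-sided example: null proportion 10%, assumed true proportion 5%.
    print(sample_size_one_sided(p0=0.10, pa=0.05))
```

The result is rounded up because a sample size must be an integer and rounding down would fall short of the prescribed power under this approximation.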
In many cases, the outcome may be wrongly classified. When the outcome
is disease status and a diagnostic test is applied, the two usual
measures of test quality are sensitivity, the proportion of correct
diagnoses among subjects who have the disease, and specificity, the
proportion of correct diagnoses among subjects who do not have the
disease (Yerushalmy, 1947). Usually, a diagnostic test has less than
100% sensitivity and specificity, which must be accounted for in both
the design and analysis of a study. There are analysis methods
accounting for misclassification (Lang & Reiczigel, 2014; Reiczigel,
Földi, & Ózsvári, 2010; Hársfalvi & Reiczigel, 2023), but the sample
size needed for the same power is higher than it would be without
misclassification. Thus, ignoring the possibility of misclassification
in the sample size calculation may result in an underpowered,
inconclusive study, causing considerable financial loss and raising
ethical concerns.
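
A standard identity makes it clear why misclassification raises the required sample size. The symbols $Se$ (sensitivity), $Sp$ (specificity), and $q$ (the proportion of subjects classified as positive) are notation assumed here for illustration. Both are treated as known constants, which is a simplification.

```latex
q = Se \cdot p + (1 - Sp)\,(1 - p),
\qquad
q_0 - q_a = (Se + Sp - 1)\,(p_0 - p_a)
```

For example, with $Se = Sp = 0.95$, $p_0 = 0.10$, and $p_a = 0.05$, the observable proportions are $q_0 = 0.14$ and $q_a = 0.095$. The detectable difference shrinks from 0.05 to 0.045, while both proportions move closer to 0.5, where the binomial variance is larger.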
Most books on sample size calculation do not mention misclassification
at all. Others include a short note stating that it is a problem that
should be accounted for, but none of them gives clear instructions for
researchers (Chow et al., 2008; Julious, 2009; Kieser, 2020; Ryan,
2013). Intuition may suggest that if the probability of
misclassification is as low as a few percent in both directions, the
increase in sample size is negligible, but this is not true. To show
this, we develop a sample size calculation
for the one-sample proportion test under misclassification and
investigate how the necessary sample size depends on the sensitivity,
specificity, and effect size.
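
To give a rough feel for the size of the effect, the sketch below applies the textbook normal-approximation sample size formula to the proportions that would actually be observed under misclassification. This naive plug-in calculation is not the method developed in this paper. It assumes that sensitivity and specificity are known exactly, and all function and parameter names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def apparent(p: float, se: float, sp: float) -> float:
    """Probability of a positive classification: true plus false positives."""
    return se * p + (1 - sp) * (1 - p)


def n_with_misclassification(p0: float, pa: float, se: float, sp: float,
                             alpha: float = 0.05, power: float = 0.80) -> int:
    """Naive plug-in sample size: the normal-approximation formula for a
    one-sided, one-sample proportion test, applied to apparent proportions."""
    if se + sp <= 1:
        raise ValueError("se + sp must exceed 1 for an informative test")
    q0, qa = apparent(p0, se, sp), apparent(pa, se, sp)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    numerator = z_alpha * sqrt(q0 * (1 - q0)) + z_beta * sqrt(qa * (1 - qa))
    return ceil((numerator / (qa - q0)) ** 2)


if __name__ == "__main__":
    # Same hypotheses throughout (p0 = 10%, pa = 5%); only test quality varies.
    for se, sp in [(1.0, 1.0), (0.99, 0.99), (0.95, 0.95)]:
        print(f"Se={se:.2f} Sp={sp:.2f} n={n_with_misclassification(0.10, 0.05, se, sp)}")
```

Under these assumptions the required sample size grows noticeably even when both error rates are only a few percent, because the observable difference shrinks by the factor Se + Sp - 1 while the variance of the observed proportion increases.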