1. Introduction
One-sample inference for binary data is one of the most common tasks in epidemiology and medical statistics (Bland, 2015). One-sided tests are used more often than two-sided ones, for example when assessing freedom from disease or approving diagnostic tests, as well as in industrial quality control, the evaluation of medical devices, and clinical trials of rare diseases (Cameron & Baldock, 1998; Cheng & Zhen, 2021; Feld et al., 2015; Khan, Sarker, & Hackshaw, 2012; Lu, Li, & Xu, 2020). The left-sided alternative ($H_0: p \ge p_0$ against $H_a: p < p_0$) is applied, for example, in proving freedom from disease, while the right-sided one ($H_0: p \le p_0$ against $H_a: p > p_0$) is used if one wants to prove that a particular exposure increases the probability of getting a disease. Several different tests exist, exact as well as asymptotic, for all testing scenarios. Sample size calculation is available for each method; it requires the prescribed significance level alpha and power, the null proportion $p_0$, and the assumed true proportion $p_a$ at which the prescribed power should be reached (Chow, Shao, & Wang, 2008; Suresh & Chandrashekara, 2012).
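To make the inputs of such a calculation concrete, the following minimal sketch uses the textbook normal-approximation formula for a one-sided one-sample proportion test without misclassification. It is only one of the available methods (exact tests generally give somewhat different numbers), and the function name `sample_size_one_sided` and the example values are illustrative assumptions rather than anything prescribed by the cited references.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_one_sided(p0, pa, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a one-sided one-sample
    proportion test (hypothetical helper, no misclassification).

    p0    : null proportion
    pa    : assumed true proportion at which the power should be reached
    alpha : one-sided significance level
    power : prescribed power at pa
    """
    if not (0 < p0 < 1 and 0 < pa < 1):
        raise ValueError("p0 and pa must lie strictly between 0 and 1")
    if p0 == pa:
        raise ValueError("pa must differ from p0")
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)       # quantile matching the power
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(pa * (1 - pa))
    # Squaring makes the formula valid for both left- and right-sided tests.
    return ceil((numerator / (pa - p0)) ** 2)


# Illustrative values only: null proportion 10%, assumed true proportion 5%.
print(sample_size_one_sided(p0=0.10, pa=0.05, alpha=0.05, power=0.80))
```

The required sample size grows quickly as $p_a$ approaches $p_0$, because the difference between the two proportions enters the formula squared in the denominator.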
In many cases, the outcome may be misclassified. When the outcome is disease status and a diagnostic test is applied, the two usual measures of test quality are sensitivity, the proportion of correct diagnoses among subjects who have the disease, and specificity, the proportion of correct diagnoses among subjects who do not have the disease (Yerushalmy, 1947). Usually, a diagnostic test has less than 100% sensitivity and specificity, which must be accounted for in both the design and the analysis of a study. Analysis methods that account for misclassification exist (Lang & Reiczigel, 2014; Reiczigel, Földi, & Ózsvári, 2010; Hársfalvi & Reiczigel, 2023), but the sample size needed for the same power is higher than it would be without misclassification. Thus, ignoring the possibility of misclassification in the sample size calculation may result in an underpowered, inconclusive study, causing considerable financial loss and raising ethical concerns.
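The reason misclassification inflates the sample size can be illustrated with a rough sketch. With sensitivity Se and specificity Sp, a true proportion p is observed as the apparent proportion q = Se·p + (1 − Sp)·(1 − p), so the null and alternative proportions move closer together on the observed scale. The code below simply plugs the apparent proportions into the textbook normal-approximation formula. This is an assumption made for illustration, not the method developed in this paper, and all function names and numeric values are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def apparent_proportion(p, se, sp):
    """Proportion of positive test results when the true proportion is p."""
    return se * p + (1 - sp) * (1 - p)


def normal_approx_n(q0, qa, alpha=0.05, power=0.80):
    """Textbook normal-approximation sample size for a one-sided
    one-sample proportion test of q0 against qa."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    numerator = z_alpha * sqrt(q0 * (1 - q0)) + z_beta * sqrt(qa * (1 - qa))
    return ceil((numerator / (qa - q0)) ** 2)


def n_under_misclassification(p0, pa, se, sp, alpha=0.05, power=0.80):
    """Naive sketch: run the calculation on the apparent proportions.
    Assumes se and sp are known exactly and that se + sp > 1."""
    if se + sp <= 1:
        raise ValueError("the diagnostic test must satisfy se + sp > 1")
    q0 = apparent_proportion(p0, se, sp)
    qa = apparent_proportion(pa, se, sp)
    return normal_approx_n(q0, qa, alpha, power)


# Hypothetical example: p0 = 10%, pa = 5%, 95% sensitivity and specificity.
print("perfect test:  ", normal_approx_n(0.10, 0.05))
print("imperfect test:", n_under_misclassification(0.10, 0.05, 0.95, 0.95))
```

With these illustrative values the apparent proportions are 0.14 and 0.095, so the difference to be detected shrinks from 0.05 to 0.045 while the binomial variances increase. Both effects push the required sample size upward.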
Most books on sample size calculation do not mention misclassification at all. Others include a short note stating that it is a problem that should be accounted for, but none of them gives clear instructions for researchers (Chow et al., 2008; Julious, 2009; Kieser, 2020; Ryan, 2013). Intuition may suggest that if the probability of misclassification is as low as a few percent in both directions, the increase in sample size is negligible, but this is not true. To show this, we develop a sample size calculation for the one-sample proportion test under misclassification and investigate how the necessary sample size depends on the sensitivity, specificity, and effect size.