2. Methods
In this study we focus on the exact binomial test (also known as the Clopper and Pearson test) (Chow et al., 2008; Clopper & Pearson, 1934). To describe the test briefly, let X denote a random variable with a binomial(n, p) distribution and x its observed value, where n is the sample size. Let \(b_{n,p}(x) = P(X = x)\) and \(B_{n,p}(x) = P(X \leq x)\) denote the probability mass function and the cumulative distribution function of the binomial distribution with parameters n and p. Assume that, based on the observed x, we test \(H_0: p = p_0\) against \(H_a: p \neq p_0\) (two-tailed test), or against \(H_a: p < p_0\) or \(H_a: p > p_0\) (one-tailed tests). The level-\(\alpha\) critical region of the test, obtained by inverting the Clopper-Pearson exact confidence interval, is as follows:
\(\left\{x:\ \sum_{i=0}^{x} b_{n,p_0}(i) \leq \alpha\right\}\) (left-tailed test)
\(\left\{x:\ \sum_{i=x}^{n} b_{n,p_0}(i) \leq \alpha\right\}\) (right-tailed test)
\(\left\{x:\ \sum_{i=0}^{x} b_{n,p_0}(i) \leq \alpha/2\right\} \cup \left\{x:\ \sum_{i=x}^{n} b_{n,p_0}(i) \leq \alpha/2\right\}\) (two-tailed test)
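The three critical regions above can be computed directly from the binomial distribution under the null hypothesis. The following is a minimal sketch using only the Python standard library; the function names `binom_pmf` and `critical_region` and the `tail` argument are illustrative assumptions, not part of the original study.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] for X ~ Binomial(n, p), with 0 < p < 1.

    Evaluated in log space so that it stays stable for n in the hundreds.
    """
    return [
        exp(lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
            + x * log(p) + (n - x) * log1p(-p))
        for x in range(n + 1)
    ]


def critical_region(n, p0, alpha=0.05, tail="two"):
    """Outcomes x for which H0: p = p0 is rejected (hypothetical helper)."""
    if tail not in ("left", "right", "two"):
        raise ValueError("tail must be 'left', 'right' or 'two'")
    pmf = binom_pmf(n, p0)
    level = alpha / 2 if tail == "two" else alpha
    region = set()
    if tail in ("left", "two"):
        cum = 0.0  # running P(X <= x) under H0
        for x in range(n + 1):
            cum += pmf[x]
            if cum > level:
                break
            region.add(x)
    if tail in ("right", "two"):
        cum = 0.0  # running P(X >= x) under H0
        for x in range(n, -1, -1):
            cum += pmf[x]
            if cum > level:
                break
            region.add(x)
    return region
```

For n = 10 and a null proportion of 0.5, the two-tailed region at the 5% level is {0, 1, 9, 10}, because P(X ≤ 1) = 11/1024 ≈ 0.011 does not exceed 0.025 whereas P(X ≤ 2) = 56/1024 ≈ 0.055 does.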
If the outcome is subject to misclassification with known sensitivity and specificity, the Rogan and Gladen formula can be applied to calculate the true proportion from the observed one (Rogan & Gladen, 1978). The adjustment is
\(p_{adj} = \frac{p_{obs} + Sp - 1}{Se + Sp - 1}\)
where Se and Sp denote the sensitivity and specificity of the diagnostic test, and \(p_{adj}\) and \(p_{obs}\) denote the adjusted and observed proportions.
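As a small illustration of the adjustment, the sketch below implements the formula together with its inverse, which gives the proportion of positive results expected from a given true proportion. The function names are hypothetical, and the requirement Se + Sp > 1 is an assumption added here so that the denominator is positive. No truncation of the result to [0, 1] is applied, since the text does not mention one.

```python
def rogan_gladen(p_obs, se, sp):
    """Adjusted proportion from the observed one (Rogan-Gladen formula)."""
    denom = se + sp - 1
    if denom <= 0:
        # Assumption of this sketch: the diagnostic test is informative.
        raise ValueError("requires Se + Sp > 1")
    return (p_obs + sp - 1) / denom


def observed_from_true(p_true, se, sp):
    """Expected observed proportion: true positives plus false positives."""
    return p_true * se + (1 - p_true) * (1 - sp)


# Round trip: a true proportion of 0.10 observed with Se = 0.95, Sp = 0.90
p_obs = observed_from_true(0.10, 0.95, 0.90)   # 0.095 + 0.090 = 0.185
p_adj = rogan_gladen(p_obs, 0.95, 0.90)        # recovers 0.10 up to rounding
```

With Se = Sp = 1 both functions reduce to the identity, which corresponds to the case of no misclassification.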
Reiczigel et al. (2010) showed that applying the Rogan and Gladen formula to the endpoints of a confidence interval for the observed proportion yields a valid confidence interval for the true proportion. Furthermore, the adjustment preserves the exactness of the CI. These properties of the CIs carry over to the corresponding tests.
We carried out the investigations with the alpha error rate set to 5% and the power to 80%. For selected null proportions \(p_0\) in \(H_0\) and assumed true proportions \(p_a\) (Table 1), we determined the necessary sample size n by exact power calculation. For each n, we calculated the power by determining the alpha-level critical region C of the test and then computing the probability of C under a binomial distribution with \(p = p_a\).
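The exact power calculation described above can be sketched as follows: build the critical region under the null proportion, then add up the probabilities of its elements under the assumed true proportion. This is a minimal standard-library sketch without misclassification; the names `binom_pmf` and `exact_power` and the `tail` argument are assumptions made for illustration.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(n, p):
    """Binomial(n, p) probabilities for x = 0..n, in log space (0 < p < 1)."""
    return [
        exp(lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
            + x * log(p) + (n - x) * log1p(-p))
        for x in range(n + 1)
    ]


def exact_power(n, p0, pa, alpha=0.05, tail="two"):
    """P(X in C) for X ~ Binomial(n, pa), C being the critical region under p0."""
    if tail not in ("left", "right", "two"):
        raise ValueError("tail must be 'left', 'right' or 'two'")
    null, alt = binom_pmf(n, p0), binom_pmf(n, pa)
    level = alpha / 2 if tail == "two" else alpha
    power = 0.0
    if tail in ("left", "two"):
        cum = 0.0  # P(X <= x) under H0
        for x in range(n + 1):
            cum += null[x]
            if cum > level:
                break
            power += alt[x]  # x belongs to the lower part of C
    if tail in ("right", "two"):
        cum = 0.0  # P(X >= x) under H0
        for x in range(n, -1, -1):
            cum += null[x]
            if cum > level:
                break
            power += alt[x]  # x belongs to the upper part of C
    return power
```

Evaluating `exact_power(n, p0, p0)` gives the actual size of the test, which for an exact test never exceeds alpha; for n = 10 and a null proportion of 0.5 it equals 22/1024 ≈ 0.021.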
We calculated sample sizes for sensitivity and specificity values of 1, 0.99, 0.98, 0.95, and 0.90. Because we suspected that the increase in the necessary sample size may differ between the two one-tailed tests (and, for the two-tailed test, depending on whether \(p_a\) lies to the left or to the right of \(p_0\)), we investigated each case separately. Thus, we set up two values of \(p_a\) for each \(p_0\), one to the left and one to the right of \(p_0\) (see Table 1). These were selected so that the sample size in the absence of misclassification is a few hundred. We did not include \(p_0\) values above 0.5 because results for \(p_0 > 0.5\) are mirror images of those for \(p_0 < 0.5\). For example, the power of the test for \(p_0 = 0.9\) with \(p_a = 0.96\), Se = 0.99, and Sp = 0.95 is the same as that for \(p_0 = 0.1\) with \(p_a = 0.04\), Se = 0.95, and Sp = 0.99.
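The mirror-image property can be checked numerically. The sketch below rests on one plausible reading of the procedure, which is an assumption here rather than something the text spells out: the observed count is binomial with proportion p·Se + (1 − p)(1 − Sp), and the test of \(p = p_0\) is carried out on that observed scale against the correspondingly transformed null proportion. All function names are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(n, p):
    """Binomial(n, p) probabilities for x = 0..n, in log space (0 < p < 1)."""
    return [
        exp(lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
            + x * log(p) + (n - x) * log1p(-p))
        for x in range(n + 1)
    ]


def two_tailed_power(n, q0, qa, alpha=0.05):
    """Exact power of the two-tailed binomial test of q = q0 when q = qa."""
    null, alt = binom_pmf(n, q0), binom_pmf(n, qa)
    power = 0.0
    for order in (range(n + 1), range(n, -1, -1)):  # lower tail, then upper
        cum = 0.0
        for x in order:
            cum += null[x]
            if cum > alpha / 2:
                break
            power += alt[x]
    return power


def observed_proportion(p, se, sp):
    """Proportion of positive results expected for true proportion p."""
    return p * se + (1 - p) * (1 - sp)


def power_misclassified(n, p0, pa, se, sp, alpha=0.05):
    """Power on the observed scale (assumed reading of the adjusted test)."""
    return two_tailed_power(
        n,
        observed_proportion(p0, se, sp),
        observed_proportion(pa, se, sp),
        alpha,
    )


# The two mirrored settings from the example, at an arbitrary n = 300
left = power_misclassified(300, 0.1, 0.04, se=0.95, sp=0.99)
right = power_misclassified(300, 0.9, 0.96, se=0.99, sp=0.95)
```

Under this reading, the two settings map to observed null proportions 0.104 and 0.896 and observed alternatives 0.0476 and 0.9524, which are complements of each other, so `left` and `right` agree up to floating-point rounding. For the one-tailed tests the same symmetry would hold with the left and right tails exchanged.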
It is known that the power of the binomial test is not a monotone function of the sample size but displays a saw-tooth pattern (Chernick & Liu, 2002). Thus, the power may exceed 80% for some n and yet fall below 80% again for a larger sample size. An example of this is shown in Figure 1.
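The saw-tooth behaviour has a practical consequence for sample size determination: the smallest n that reaches the target power need not be the smallest n from which the power stays above the target. The sketch below computes both over a finite search range. It is an illustration only; the text does not state which of the two rules was used, and the function names and the search range are assumptions.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(n, p):
    """Binomial(n, p) probabilities for x = 0..n, in log space (0 < p < 1)."""
    return [
        exp(lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
            + x * log(p) + (n - x) * log1p(-p))
        for x in range(n + 1)
    ]


def two_tailed_power(n, p0, pa, alpha=0.05):
    """Exact power of the two-tailed binomial test of p = p0 when p = pa."""
    null, alt = binom_pmf(n, p0), binom_pmf(n, pa)
    power = 0.0
    for order in (range(n + 1), range(n, -1, -1)):  # lower tail, then upper
        cum = 0.0
        for x in order:
            cum += null[x]
            if cum > alpha / 2:
                break
            power += alt[x]
    return power


def power_curve(p0, pa, n_min, n_max, alpha=0.05):
    """Map each n in [n_min, n_max] to the exact power at that n."""
    return {n: two_tailed_power(n, p0, pa, alpha)
            for n in range(n_min, n_max + 1)}


def first_n(curve, target=0.8):
    """Smallest n in the range whose power reaches the target, or None."""
    for n in sorted(curve):
        if curve[n] >= target:
            return n
    return None


def stable_n(curve, target=0.8):
    """Smallest n such that every larger n in the range also reaches the target."""
    best = None
    for n in sorted(curve, reverse=True):
        if curve[n] < target:
            break
        best = n
    return best


curve = power_curve(0.5, 0.9, 10, 60)
```

In this toy setting the power already drops between n = 10 and n = 11 (from about 0.74 to about 0.70), because the critical region changes from {0, 1, 9, 10} to {0, 1, 10, 11}. Note that `stable_n` only guarantees the target within the searched range, so the upper bound should be chosen generously.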