2. Methods
In this study we focus on the exact binomial test (also known as the
Clopper-Pearson test) (Chow et al., 2008; Clopper & Pearson, 1934). For a
short description of the test, let X denote a random variable from a
binomial(n, p) distribution, where n is the sample size, and let x denote
its observed value. Let \(b_{n,p}(x) = P(X = x)\) and
\(B_{n,p}(x) = P(X \leq x)\)
denote the probability mass function and the cumulative distribution
function of the binomial distribution with parameters n and p. Assume
that, based on the observed x, we test \(H_0: p = p_0\) against
\(H_a: p \neq p_0\) (two-tailed test) or against \(H_a: p < p_0\) or
\(H_a: p > p_0\) (one-tailed tests). The alpha-level critical region of the test,
calculated by inverting the Clopper-Pearson exact confidence interval,
is as follows:
\(\{x:\ \sum_{i=0}^{x} b_{n,p}(i) \leq \alpha\}\) (left-tailed test)
\(\{x:\ \sum_{i=x}^{n} b_{n,p}(i) \leq \alpha\}\) (right-tailed test)
\(\{x:\ \sum_{i=0}^{x} b_{n,p}(i) \leq \alpha/2\} \cup \{x:\ \sum_{i=x}^{n} b_{n,p}(i) \leq \alpha/2\}\) (two-tailed test)
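As an illustration, the three critical regions can be computed directly from the binomial probability mass function, with the tail sums evaluated at the null value p = p0. The following is a minimal Python sketch, not the code used in this study; the names `binom_pmf` and `critical_region` and the `tail` argument are hypothetical and introduced only for illustration.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial pmf P(X = k), computed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))
    return exp(log_pmf)


def critical_region(n, p0, alpha=0.05, tail="two"):
    """Set of outcomes x that lead to rejection of H0: p = p0.

    tail is "left", "right" or "two" (hypothetical argument names).
    """
    if tail not in ("left", "right", "two"):
        raise ValueError("tail must be 'left', 'right' or 'two'")
    level = alpha / 2 if tail == "two" else alpha
    pmf = [binom_pmf(k, n, p0) for k in range(n + 1)]
    region = set()
    if tail in ("left", "two"):
        cum = 0.0
        for x in range(n + 1):          # sum_{i=0}^{x} b(i)
            cum += pmf[x]
            if cum > level:
                break
            region.add(x)
    if tail in ("right", "two"):
        cum = 0.0
        for x in range(n, -1, -1):      # sum_{i=x}^{n} b(i)
            cum += pmf[x]
            if cum > level:
                break
            region.add(x)
    return region
```

For n = 10, p0 = 0.5 and alpha = 0.05, for instance, the two-tailed region is {0, 1, 9, 10}, because P(X ≤ 1) = 11/1024 does not exceed 0.025 while P(X ≤ 2) = 56/1024 does.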
If the outcome is subject to misclassification with known sensitivity
and specificity, the so-called Rogan and Gladen formula can be applied
to calculate the true proportion from the observed one (Rogan & Gladen,
1978). The adjustment is
\(p_{adj} = (p_{obs} + Sp - 1) / (Se + Sp - 1)\)
where Se and Sp denote the sensitivity and specificity of the
diagnostic test, and \(p_{adj}\) and \(p_{obs}\) denote the adjusted and
observed proportions.
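The adjustment and its inverse can be written in a few lines. This is a sketch for illustration only; the function names are hypothetical. The inverse relation, p_obs = Se·p + (1 − Sp)·(1 − p), is the standard expression for the probability of a positive test result, and solving it for p gives the Rogan and Gladen formula.

```python
def rogan_gladen(p_obs, se, sp):
    """Adjusted (true) proportion from the observed one.

    Requires se + sp > 1. The result is not clipped, so it can fall
    outside [0, 1] for some observed proportions.
    """
    denom = se + sp - 1.0
    if denom <= 0.0:
        raise ValueError("se + sp must exceed 1")
    return (p_obs + sp - 1.0) / denom


def observed_proportion(p, se, sp):
    """Expected observed proportion for a true proportion p:
    true positives plus false positives."""
    return se * p + (1.0 - sp) * (1.0 - p)
```

For example, with Se = 0.95 and Sp = 0.99, an observed proportion of 0.104 corresponds to an adjusted proportion of 0.094 / 0.94 = 0.1.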
Reiczigel et al. (2010) showed that applying the Rogan & Gladen formula
to the endpoints of a confidence interval for the
sample proportion results in a valid confidence interval for the true
proportion. Furthermore, the adjustment preserves exactness of the CI.
Because the test is obtained by inverting the confidence interval, these
properties carry over to testing.
We carried out the investigations with the significance level (alpha) set
to 5% and the power to 80%. For selected null proportions \(p_0\) in
\(H_0\) and assumed true proportions \(p_a\) (Table 1), we determined the
necessary sample size n by exact power calculation. For each n, we
calculated the power by determining the alpha-level critical region C of
the test and then calculating the probability of C under a binomial
distribution with \(p = p_a\).
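The two steps of the exact power calculation (determine the critical region C under p0, then sum the binomial probabilities of C under pa) can be sketched as follows. This is an illustrative Python sketch under the assumption that no misclassification is present; the names `binom_pmf`, `critical_region` and `exact_power` are hypothetical and not taken from the study.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial pmf P(X = k), computed in log space."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))


def critical_region(n, p0, alpha=0.05, tail="two"):
    """Step 1: alpha-level critical region C under H0: p = p0."""
    level = alpha / 2 if tail == "two" else alpha
    region = set()
    if tail in ("left", "two"):
        cum = 0.0
        for x in range(n + 1):
            cum += binom_pmf(x, n, p0)
            if cum > level:
                break
            region.add(x)
    if tail in ("right", "two"):
        cum = 0.0
        for x in range(n, -1, -1):
            cum += binom_pmf(x, n, p0)
            if cum > level:
                break
            region.add(x)
    return region


def exact_power(n, p0, pa, alpha=0.05, tail="two"):
    """Step 2: probability of C when the true proportion is pa."""
    region = critical_region(n, p0, alpha, tail)
    return sum(binom_pmf(x, n, pa) for x in region)
```

Setting pa equal to p0 returns the actual size of the test, which for an exact test cannot exceed alpha; for n = 10 and p0 = 0.5 (two-tailed, alpha = 0.05) it is 22/1024, or about 0.0215.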
We calculated sample sizes for sensitivity and specificity values of 1,
0.99, 0.98, 0.95, and 0.90. As we suspected that the increase in necessary
sample size may differ between the two one-tailed tests (and, for the
two-tailed test, depending on whether \(p_a\) is
located to the left or right of \(p_0\)), we investigated
each case separately. Thus, we set up two values of \(p_a\) for
each \(p_0\): one to the left and the other to the right of \(p_0\) (see
Table 1). These were selected so that the
sample size in the case of no misclassification is a few hundred. We did
not include \(p_0\) values above 0.5 because results
for \(p_0 > 0.5\) are mirror images of
those for \(p_0 < 0.5\). For example, the power
of the test for \(p_0 = 0.9\) with \(p_a = 0.96\), Se = 0.99, and Sp = 0.95
is the same as that for \(p_0 = 0.1\) with \(p_a = 0.04\), Se = 0.95, and
Sp = 0.99.
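The mirror-image property can be checked numerically. The sketch below rests on an assumption that should be read as one interpretation rather than as the implementation used here: since the Rogan and Gladen formula is monotone increasing when Se + Sp > 1, testing the true proportion by inverting the adjusted interval is treated as equivalent to an exact binomial test on the observed scale, with both p0 and pa mapped to Se·p + (1 − Sp)·(1 − p). All function names are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial pmf P(X = k), computed in log space."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))


def exact_power(n, p0, pa, alpha=0.05, tail="two"):
    """Probability under pa of the alpha-level critical region built under p0."""
    level = alpha / 2 if tail == "two" else alpha
    power = 0.0
    if tail in ("left", "two"):
        cum = 0.0
        for x in range(n + 1):
            cum += binom_pmf(x, n, p0)
            if cum > level:
                break
            power += binom_pmf(x, n, pa)
    if tail in ("right", "two"):
        cum = 0.0
        for x in range(n, -1, -1):
            cum += binom_pmf(x, n, p0)
            if cum > level:
                break
            power += binom_pmf(x, n, pa)
    return power


def observed_proportion(p, se, sp):
    """Expected observed proportion for a true proportion p."""
    return se * p + (1.0 - sp) * (1.0 - p)


def power_with_misclassification(n, p0, pa, se, sp, alpha=0.05, tail="two"):
    """Assumed approach: run the exact test on the observed scale."""
    return exact_power(n,
                       observed_proportion(p0, se, sp),
                       observed_proportion(pa, se, sp),
                       alpha, tail)


# Mirror-image check for the example in the text (n = 300 is arbitrary).
upper = power_with_misclassification(300, 0.9, 0.96, 0.99, 0.95, tail="right")
lower = power_with_misclassification(300, 0.1, 0.04, 0.95, 0.99, tail="left")
print(abs(upper - lower) < 1e-9)
```

In this example the observed-scale proportions are 0.896 and 0.9524 in the first setting and 0.104 and 0.0476 in the second, that is, exact complements, which is why the two powers coincide once the direction of the one-tailed test is reversed.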
It is known that the power of the binomial test does not increase
monotonically with sample size but displays a saw-tooth pattern (Chernick
& Liu, 2002). Thus, the power may be above 80% for some n and yet fall
below 80% again for a greater sample size. An
example of this is shown in Figure 1.
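The consequence of the saw-tooth pattern for sample size determination can be illustrated with a small search. The sketch below reports two candidate sample sizes: the smallest n that reaches the target power, and the smallest n from which the power stays at or above the target for every larger n up to a search limit. The second criterion, the limit `n_max`, and all function names are assumptions made for illustration; the text does not state which rule was applied.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial pmf P(X = k), computed in log space."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))


def exact_power(n, p0, pa, alpha=0.05, tail="two"):
    """Probability under pa of the alpha-level critical region built under p0."""
    level = alpha / 2 if tail == "two" else alpha
    power = 0.0
    if tail in ("left", "two"):
        cum = 0.0
        for x in range(n + 1):
            cum += binom_pmf(x, n, p0)
            if cum > level:
                break
            power += binom_pmf(x, n, pa)
    if tail in ("right", "two"):
        cum = 0.0
        for x in range(n, -1, -1):
            cum += binom_pmf(x, n, p0)
            if cum > level:
                break
            power += binom_pmf(x, n, pa)
    return power


def required_sample_sizes(p0, pa, target=0.80, alpha=0.05, tail="two",
                          n_max=500):
    """Return (first, stable); either is None if not found within n_max.

    first:  smallest n with power >= target
    stable: smallest n such that power >= target for all n..n_max
            (assumed criterion, only meaningful relative to n_max)
    """
    powers = [exact_power(n, p0, pa, alpha, tail) for n in range(1, n_max + 1)]
    first = next((n for n, pw in enumerate(powers, start=1) if pw >= target),
                 None)
    stable = None
    for n in range(n_max, 0, -1):       # walk down from the search limit
        if powers[n - 1] < target:
            break
        stable = n
    return first, stable
```

Even in a very small case the non-monotonicity is visible: for the right-tailed test of p0 = 0.5 against pa = 0.8, the power is 0.8^5 ≈ 0.328 at n = 5 but only 0.8^6 ≈ 0.262 at n = 6, because the critical region moves from {5} to {6}.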