Abstract
Potential misclassification of a binary outcome measure is often ignored
in study design, causing considerable loss of power, and threatening the
quality of research. Although there exist studies taking
misclassification into account in data analysis, we argue that it should
be accounted for already in sample size calculation. We illustrate this
by comparing the sample sizes needed with and without misclassification
for the binomial test. Our sample size procedure, implemented as an
R function, calculates exact power and accounts both for the
non-monotonicity of power as a function of sample size and for potential
drop-out or lack of data in the study. The necessary sample size is
computed from the
null proportion p0, the assumed true proportion pa, and the
probabilities of correct classification, sensitivity (Se) and
specificity (Sp).
Our results show that misclassification may drastically affect the
necessary sample size. For p0 < 0.5, the
effect of specificity is stronger than that of sensitivity, whereas for
p0 > 0.5 it is the other way round.
Effects are strongest when p0 is near 0 or 1,
especially for one-sided tests with pa located
farther from 0.5 than the null value p0. For
example, even with Se = Sp = 99%, p0 = 0.01, and a left-sided
alternative, the required sample size is more than four times that
without misclassification (3-fold if p0 = 0.02; 1.4-fold if
p0 = 0.05).
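
The kind of computation summarized above can be sketched in a few lines. This is a minimal illustration in Python, not the R function referred to in the abstract, and it rests on several assumptions. The test is assumed to be applied to the misclassified outcome, so both p0 and pa are first mapped to the probability of a positive classification, p* = Se·p + (1 − Sp)(1 − p). Only the left-sided test is covered, and drop-out is not handled. Non-monotonicity is dealt with by one possible rule, returning the smallest n from which power stays at or above the target up to a search limit. All function and parameter names (`observed_proportion`, `exact_power_less`, `sample_size_less`, `n_max`) are hypothetical.

```python
from math import exp, lgamma, log


def observed_proportion(p, se, sp):
    """Probability of a positive classification when the true proportion is p."""
    return se * p + (1.0 - sp) * (1.0 - p)


def binom_pmf(k, n, p):
    """Binomial probability P(X = k), computed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))
    return exp(log_pmf)


def exact_power_less(n, p0, pa, alpha=0.05):
    """Exact power of the left-sided binomial test (H1: p < p0) at sample size n."""
    # Critical value c: the largest c with P(X <= c | p0) <= alpha.
    # c = -1 means the test can never reject at this n.
    c, cum = -1, 0.0
    for k in range(n + 1):
        cum += binom_pmf(k, n, p0)
        if cum > alpha:
            break
        c = k
    # Power is the probability of landing in the rejection region under pa.
    return sum(binom_pmf(k, n, pa) for k in range(c + 1))


def sample_size_less(p0, pa, se=1.0, sp=1.0, alpha=0.05, power=0.8, n_max=500):
    """Smallest n such that power >= target for every sample size from n to n_max.

    Assumption: the test is carried out on the observed (misclassified)
    outcome, so p0 and pa are replaced by their observed counterparts.
    """
    q0 = observed_proportion(p0, se, sp)
    qa = observed_proportion(pa, se, sp)
    last_fail = 0
    for n in range(1, n_max + 1):
        # Scan the whole range: power is saw-toothed, so it can dip
        # below the target again after first exceeding it.
        if exact_power_less(n, q0, qa, alpha) < power:
            last_fail = n
    if last_fail == n_max:
        raise ValueError("target power not reached within n_max")
    return last_fail + 1


if __name__ == "__main__":
    # Illustrative values only, not taken from the paper.
    print(sample_size_less(0.5, 0.3, n_max=300))
    print(sample_size_less(0.5, 0.3, se=0.9, sp=0.9, n_max=300))
```

Scanning the full range up to `n_max` rather than stopping at the first n that reaches the target is what guards against the saw-tooth shape of exact power. The price is a longer search, so `n_max` should be chosen with the expected magnitude of n in mind.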