Abstract
Potential misclassification of a binary outcome measure is often ignored
in study design, which can cause a considerable loss of power and threaten the
quality of research. Although some studies take
misclassification into account in the data analysis, we argue that it should
already be accounted for in the sample size calculation. We illustrate this
by comparing sample sizes needed with and without misclassification in
the case of the binomial test. Our sample size procedure, implemented as an
R function, calculates exact power and accounts both for the non-monotonicity
of power as a function of sample size and for potential drop-out or missing
data in the study. The necessary sample size is computed from
the null proportion $p_0$, the assumed true proportion $p_a$, and the probabilities of correct
classification, sensitivity (Se) and specificity (Sp).
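To make the computation concrete, the following is a minimal sketch in R, not the authors' published function: it assumes a right-sided test and illustrative argument names (se, sp, alpha), and uses the standard fact that misclassification shifts the proportion actually observed to the apparent scale $p^{*} = Se \cdot p + (1 - Sp)(1 - p)$, on which the exact binomial power must be evaluated.

```r
# Minimal sketch (illustrative, not the paper's published function):
# exact power of a right-sided binomial test under misclassification.
# Apparent proportion: p* = Se * p + (1 - Sp) * (1 - p).
apparent <- function(p, se, sp) se * p + (1 - sp) * (1 - p)

power_misclass <- function(n, p0, pa, se = 1, sp = 1, alpha = 0.05) {
  p0_star <- apparent(p0, se, sp)  # null proportion on the observed scale
  pa_star <- apparent(pa, se, sp)  # alternative proportion on the observed scale
  # Smallest critical count k with P(X >= k | n, p0*) <= alpha ...
  k <- qbinom(1 - alpha, n, p0_star) + 1
  # ... and the exact power is P(X >= k | n, pa*).
  pbinom(k - 1, n, pa_star, lower.tail = FALSE)
}

# Exact power is non-monotone in n, so the required sample size is found
# by scanning n rather than by inverting a closed-form formula. (The
# procedure in the paper additionally guards against power dipping below
# the target again at larger n; this sketch keeps only the first crossing.)
n_grid <- 100:250
pwr <- sapply(n_grid, power_misclass,
              p0 = 0.01, pa = 0.05, se = 0.99, sp = 0.99)
min(n_grid[pwr >= 0.80])  # smallest n reaching 80% power
```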
results show that misclassification may drastically affect the necessary
sample size. For $p_0 < 0.5$, the effect of
specificity is stronger than that of sensitivity, whereas for $p_0 > 0.5$ it is the other way around.
Effects are strongest when $p_0$ is near 0 or 1,
especially for one-sided tests with $p_a$ located
farther from 0.5 than the null value $p_0$. For
example, even with Se = Sp = 99%, $p_0 = 0.01$, and a left-sided alternative, the required sample
size is more than four times that needed without misclassification (threefold if $p_0 = 0.02$; 1.4-fold if $p_0 = 0.05$).