The classical definition of statistical significance is p <= 0.05, meaning there is a 1-in-20 chance that the observed test statistic arose from normal variation under the null hypothesis. This definition of statistical significance does not represent the likelihood that the alternative hypothesis is true. Hypothesis testing can be evaluated using a 2x2 table (shown below). Box "a" = true positives: p <= 0.05 and the alternative hypothesis is true. This is the study's power. A rule of thumb is that study power should be at least 80% (the statistical test is positive 80% of the time when the alternative hypothesis is true); therefore a = 0.80. Box "b" = false positives: p <= 0.05 but the alternative hypothesis is false. By definition, when p = 0.05 the test statistic has a 5% probability of occurring by chance when the null hypothesis is true; therefore b = 0.05. Box "c" = false negatives: p > 0.05 but the alternative hypothesis is true. This occurs 20% of the time when the study's power is 80%; therefore c = 0.20. Box "d" = true negatives: p > 0.05 and the null hypothesis is true. This occurs 95% of the time when the null hypothesis is true; therefore d = 0.95.

                          Alternative hypothesis true    Null hypothesis true
  Positive (p <= 0.05)    a = 0.80 (true positive)       b = 0.05 (false positive)
  Negative (p > 0.05)     c = 0.20 (false negative)      d = 0.95 (true negative)

From this table we derive: Sensitivity = power = a/(a+c) = 80%. Specificity = (1 - p-value cut-off) = d/(b+d) = 95%. Positive predictive value = power/(power + p-value) = a/(a+b) = 94%. Negative predictive value = d/(c+d) = 83%. The classical definition of statistical significance equals (1 - specificity) and does not take power into consideration. The proposed new definition of statistical significance is a positive predictive value of the test statistic of 95% or greater. To arrive at this, the cut-off p-value representing statistical significance must be corrected for study power so that (p-value)/(p-value + power) <= 0.05. To achieve 95% predictive confidence, it can be derived that statistical significance is a p-value <= power/19.
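As a quick check on the arithmetic above, the short sketch below (not part of the original work; the variable names are illustrative assumptions) reproduces the 2x2-table values and the power-adjusted significance threshold for an assumed 80% power and a 0.05 classical cut-off.

```python
# Sketch (assumed values): 2x2-table metrics and the power-adjusted threshold.
power = 0.80   # a: true positives (study power)
alpha = 0.05   # b: false positives (classical significance cut-off)

a, b = power, alpha
c, d = 1 - power, 1 - alpha          # false negatives, true negatives

sensitivity = a / (a + c)            # 0.80
specificity = d / (b + d)            # 0.95
ppv = a / (a + b)                    # ~0.94
npv = d / (c + d)                    # ~0.83

# Solving PPV = power / (power + p) >= 0.95 for p gives p <= power / 19.
p_threshold = power / 19             # ~0.042 at 80% power

print(f"sensitivity={sensitivity:.2f}  specificity={specificity:.2f}")
print(f"PPV={ppv:.2f}  NPV={npv:.2f}  power-adjusted cut-off p<={p_threshold:.4f}")
```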
Objectives: Statistical significance does not equal clinical significance. This study examined how frequently statistically significant results in the nuclear medicine literature are clinically relevant.
Methods: A MEDLINE search was performed, with results limited to clinical trials or randomized controlled trials published in one of the major nuclear medicine journals. Articles analyzed were limited to those reporting continuous variables where a mean (X) and standard deviation (SD) were reported and the difference was determined to be statistically significant (p < 0.05). A total of 32 test results were evaluated. Clinical relevance was determined in a two-step fashion. First, the crossover point between group 1 (normal) and group 2 (abnormal) was determined. This is the point at which a variable is just as likely to fall in the normal distribution as in the abnormal distribution. Jacobson's test for clinically significant change was used: crossover point = (SD1 * X2 + SD2 * X1) / (SD1 + SD2). It was then determined how many SDs from the mean this crossover point fell. For example, 13.9 +/- 4.5 compared with 9.2 +/- 2.1 was reported as statistically significant (p < 0.05). The crossover point is 10.7, which lies 0.71 SD from each group mean: 13.9 - (0.71 * 4.5) = 9.2 + (0.71 * 2.1).
Results: The average crossover point was 0.66 SD from the mean. The crossover point was within 1 SD of the mean in 26/32 cases, and in these cases averaged 0.45 SD. Thus, for roughly 4 out of 5 statistically significant results, when applied to an individual patient the cut-off between normal and abnormal was, on average, 0.45 SD from the mean. This places roughly a third of normal patients in the abnormal category.
Conclusions: Statistically significant results frequently are not clinically significant. Statistical significance alone does not ensure clinical relevance.
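The following is a minimal sketch (for illustration only, not code from the study) of the two-step calculation described in the Methods: Jacobson's crossover point for the worked example (13.9 +/- 4.5 vs. 9.2 +/- 2.1), its distance from the group means in SD units, and the fraction of normal patients falling beyond a 0.45 SD cut-off.

```python
from statistics import NormalDist

def crossover(x1, sd1, x2, sd2):
    """Jacobson's crossover point: equidistant (in SD units) from both means."""
    return (sd1 * x2 + sd2 * x1) / (sd1 + sd2)

# Worked example from the abstract
x1, sd1 = 13.9, 4.5   # group 1 mean and SD
x2, sd2 = 9.2, 2.1    # group 2 mean and SD

c = crossover(x1, sd1, x2, sd2)        # ~10.7
z = abs(x1 - c) / sd1                  # ~0.71 SD from either mean

# Fraction of the normal distribution lying beyond a one-tailed cut-off of
# 0.45 SD from its mean -- roughly a third, as reported in the Results.
misclassified = 1 - NormalDist().cdf(0.45)   # ~0.33

print(f"crossover={c:.1f}  distance={z:.2f} SD  misclassified={misclassified:.2f}")
```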