Estimating Standard Error from other summary statistics (P, LSD, MSD)

When conducting a meta-analysis that includes previously published data, differences between treatments reported with P-values, least significant differences (LSD), and other statistics provide no direct estimate of the variance.

In the context of the statistical meta-analysis models that we use, overestimating the variance is acceptable: it reduces the weight of a study in the overall analysis relative to an exact estimate, but still provides more information than either excluding the study or excluding any estimate of uncertainty (though there are limits to this assumption such as ...).

Where available, direct estimates of variance are preferred, including Standard Error (SE), sample Standard Deviation (SD), or Mean Squared Error (MSE). SE is usually presented in the format of mean (±SE). MSE is usually presented in a table. When extracting SE or SD from a figure, measure from the mean to the upper or lower bound. This differs from confidence intervals and range statistics (described below), for which the entire range is collected.

If MSE, SD, or SE are not provided, a range statistic may be. The most frequently reported range statistics are a Confidence Interval (95% CI), Fisher’s Least Significant Difference (LSD), Tukey’s Honestly Significant Difference (HSD), and the Minimum Significant Difference (MSD). Fundamentally, these methods calculate a range that indicates whether two means differ, and they take different approaches to penalizing multiple comparisons. The important point is that these are ranges and that we record the entire range.

Another type of statistic is a “test statistic”; most frequently there will be an F-value that can be useful, but this should not be recorded if MSE is available. Only if there is no other information available should you record the P-value.

Solutions

Below is a list of transformations to $$SE$$, where $$SE=\sqrt{MSE/n}$$ (Saville 2003), that I am considering; feedback is appreciated. Below, I assume that $$\alpha=0.05$$, so $$1-\alpha/2=0.975$$, and that variables are normally distributed unless otherwise stated:

• given $$P$$, $$n$$, and treatment means $$\bar X_1$$ and $$\bar X_2$$

$$SE=\frac{\bar X_1-\bar X_2}{t_{(1-\frac{P}{2},2n-2)}\sqrt{2/n}}$$

• given LSD (Rosenberg 2004), $$\alpha$$, $$n$$, $$b$$ where $$b$$ is number of blocks, and $$n=b$$ by default for RCBD

$$SE = \frac{LSD}{t_{(0.975,n)}\sqrt{2bn}}$$

• given MSD (minimum significant difference) (Wang 2000), $$n$$, $$\alpha$$, df = $$2n-2$$

$$SE = \frac{MSD}{t_{(0.975, 2n-2)}\sqrt{2}}$$

• given a 95% Confidence Interval (Saville 2003) (measured from mean to upper or lower confidence limit), $$\alpha$$, and $$n$$

$$SE = \frac{CI}{t_{(1-\alpha/2,n)}}$$

• given Tukey's HSD, $$n$$, where $$q$$ is the 'studentized range statistic',

$$SE = \frac{HSD}{q_{(0.975,n)}}$$
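As a minimal sketch, the P, LSD, MSD, and CI formulas in the list above can be transcribed into R as follows. The function names are hypothetical and this is not the PEcAn `transformstats` implementation; the inputs in the example calls are made-up numbers for illustration. Each function uses the $$1-\alpha/2=0.975$$ quantile assumed above. Tukey's HSD is left out because R's `qtukey` also needs the number of means and the error degrees of freedom, which the formula above does not specify.

```r
# Direct transcriptions of the list formulas above (hypothetical helper names).
# alpha defaults to 0.05, so 1 - alpha / 2 = 0.975 as assumed in the text.

# P-value from comparing two treatment means x1 and x2, each with n replicates
se_from_p <- function(x1, x2, p, n) {
  (x1 - x2) / (qt(1 - p / 2, df = 2 * n - 2) * sqrt(2 / n))
}

# Fisher's LSD; b is the number of blocks (b = n by default, as for RCBD)
se_from_lsd <- function(lsd, n, b = n, alpha = 0.05) {
  lsd / (qt(1 - alpha / 2, df = n) * sqrt(2 * b * n))
}

# Minimum significant difference, with df = 2n - 2
se_from_msd <- function(msd, n, alpha = 0.05) {
  msd / (qt(1 - alpha / 2, df = 2 * n - 2) * sqrt(2))
}

# 95% CI, measured from the mean to the upper or lower confidence limit
se_from_ci <- function(ci, n, alpha = 0.05) {
  ci / qt(1 - alpha / 2, df = n)
}

# Made-up inputs, for illustration only
se_from_p(x1 = 15.2, x2 = 12.1, p = 0.03, n = 4)
se_from_lsd(lsd = 2.5, n = 4)
se_from_msd(msd = 2.5, n = 4)
se_from_ci(ci = 1.8, n = 4)
```

Note that `se_from_p` returns a negative value if `x1 < x2`; passing the larger mean first (or taking the absolute difference) avoids this.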

A summary of equations that can be used to convert statistics from $$P$$, $$LSD$$, or $$MSD$$ to $$SE$$

These are used in the Predictive Ecosystem Analyzer (PEcAn) ecosystem modeling workflow software (LeBauer 2013). Many statistical transformations are implemented in the transformstats function within the PEcAn.utils package. This function is described in more detail below. However, transformations that require detailed knowledge of the experimental design have not yet been automated within PEcAn (and the BETYdb is not designed to handle all of the information required to get the most precise estimate of uncertainty).

| From | To | Conversion | R code | Notes |
|---|---|---|---|---|
| P | SE | $$SE=\frac{\bar X_1-\bar X_2}{t_{1-P/2,2n-2}\sqrt{2/n}}$$ | `(x1-x2)/(qt(1-P/2,2*n-2)*sqrt(2/n))` | $$\bar{X}_{1,2}$$ are the two means being compared. |
| LSD | SE | $$SE = \frac{LSD}{t_{1-\alpha/2,n}\sqrt{2b}}$$ | `LSD/(qt(1-alpha/2,n)*sqrt(2*b))` | $$b$$ is the number of blocks, $$n$$ is the number of replicates, and $$n=b$$ in a Randomized Complete Block Design. |
| MSD | SE | $$SE = \frac{MSD \times n}{t_{1-\alpha/2, 2n-2}\sqrt{2}}$$ | `msd*n/(qt(1-alpha/2,2*n-2)*sqrt(2))` | |

See related questions on Stats.SE 1 and 2

Examples

Calculating $$MSE$$ given $$F$$, $$df_{\text{group}}$$, and $$SS$$

Given:

$$F = MS_g/MS_e$$

Where $$g$$ indicates the group, or treatment. Rearranging this equation gives: $$MS_e=MS_g/F$$

Given

$$MS_x = SS_x/df_x$$

Substitute $$SS_g/df_g$$ for $$MS_g$$ in the first equation

$$F=\frac{SS_g/df_g}{MS_e}$$

Then solve for $$MS_e$$

$$MS_e = \frac{SS_g}{df_g\times F}$$

The total degrees of freedom depend on the experimental design. For factors $$a$$, $$b$$, ... (usually 1 or 2, sometimes 3), where $$n$$ is the number of replicates within each treatment combination:

$$df_{\text{total}}=(df_a+1)\times(df_b+1)\times\ldots\times n-1$$

• One-way anova: $$df_{\text{total}}=an-1$$, where $$a$$ is the number of treatments
• Two-way anova without replication: $$df_{\text{total}}=ab-1$$, where $$b$$ is the number of levels of the second factor; also known as a randomized complete block design (RCBD)
• Two-way anova with $$n$$ replicates: $$df_{\text{total}}=abn-1$$, also known as RCBD with replication
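The general formula for $$df_{\text{total}}$$ can be sketched as a small R helper. The function name `df_total` and its arguments are hypothetical; `levels` holds the number of levels of each factor (that is, $$df+1$$ for each factor), and `n = 1` stands for no replication.

```r
# levels: number of levels of each factor, i.e. df + 1 for each factor
# n: replicates within each treatment combination (1 = no replication)
df_total <- function(levels, n = 1) {
  prod(levels) * n - 1
}

df_total(levels = 3, n = 4)          # one-way: 3 treatments, 4 replicates -> 11
df_total(levels = c(3, 11))          # two-way without replication -> 32
df_total(levels = c(3, 11), n = 2)   # two-way with 2 replicates -> 65
```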

Application

As an example, we will use the first ANOVA from Table 3 in (Starr 2008). The results are from one (two?) factor ANOVA with repeated measures, with treatment and week as the factors and no replication.

We will calculate MSE from the $$SS_{\text{treatment}}$$, $$df_{\text{treatment}}$$, and $$F$$-value given in the table; these are $$109.58$$, $$2$$, and $$0.570$$, respectively; $$df_{\text{weeks}}$$ is given as $$10$$.

For the 1997 Eriophorum vaginatum, the mean $$A_{max}$$ in Table 4 is $$13.49$$.

Calculate $$MS_e$$:

$$MS_e = \frac{109.58}{0.57 \times 2} = 96.12$$
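The same arithmetic can be checked in R. This is a minimal sketch; `mse_from_f` is a hypothetical helper name that implements $$MS_e = SS_g/(df_g\times F)$$ from the derivation above.

```r
# MS_e = SS_g / (df_g * F), with g the group (treatment)
mse_from_f <- function(ss_group, df_group, f) {
  ss_group / (df_group * f)
}

# Values for SS_treatment, df_treatment, and F quoted in the text
mse <- mse_from_f(ss_group = 109.58, df_group = 2, f = 0.570)
round(mse, 2)  # 96.12
```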