Simulation of missing SDs and/or SSs in meta-analysis data sets
We assessed the effects of 14 options to treat increasing proportions of
missing SDs and/or SSs on the grand mean and the corresponding
confidence interval.
Data-generating mechanism: We created two types of meta-analysis
data sets. The first dataset was created to calculate effect sizes that
summarize mean differences between control and treatment groups. The
second dataset was created to analyse effect sizes that summarize mean
correlation coefficients. Each dataset consisted of 100 rows
representing 100 hypothetical studies with separate means, SDs and SSs
for the control and treatment group (for the mean difference data sets)
and separate correlation coefficients and SSs (for the correlation
coefficient data sets). To reduce random noise and obtain more stable
results, we created ten separate mean difference data sets and ten
separate correlation coefficient data sets. Mean difference data sets
were created with the following data-generating mechanisms. Mean values
for the control groups were randomly drawn from a truncated normal
distribution with mean = 1, SD = 0.25 and lower limit = 0.001. Mean
values for the treatment groups were randomly drawn from a truncated
normal distribution with mean = 2, SD = 0.5 and lower limit = 0.001. SD
values for the control groups were randomly drawn from a truncated
normal distribution with mean = 0.25, SD = 0.125, lower limit = 0.01 and
upper limit = 1. SD values for the treatment groups were randomly drawn
from a truncated normal distribution with mean = 0.5, SD = 0.25, lower
limit = 0.01 and upper limit = 1. SS values for the control and the
treatment groups were both drawn from a truncated Poisson distribution
with lambda = 10 and lower limit = 5. Correlation coefficient data sets
were created with the following data-generating mechanisms. Correlation
coefficient values were drawn from a truncated normal distribution with
mean = 0.5, SD = 0.125, lower limit = -1 and upper limit = 1. SS values
were drawn from a truncated Poisson distribution with lambda = 10 and
lower limit = 5.
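The data-generating mechanisms above can be sketched as follows. This is an illustrative Python sketch (the original analyses were conducted in R); the helper names `rtruncnorm` and `rtruncpois` are our own, and the truncated Poisson is implemented by simple rejection sampling, which is one of several ways to honour the lower limit.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
N = 100  # hypothetical studies per data set

def rtruncnorm(n, mean, sd, lower=-np.inf, upper=np.inf):
    """Draw n values from a normal(mean, sd) truncated to [lower, upper]."""
    a, b = (lower - mean) / sd, (upper - mean) / sd
    return stats.truncnorm.rvs(a, b, loc=mean, scale=sd, size=n,
                               random_state=rng)

def rtruncpois(n, lam, lower):
    """Draw n Poisson(lam) values, redrawing any that fall below lower."""
    out = rng.poisson(lam, n)
    while (out < lower).any():
        mask = out < lower
        out[mask] = rng.poisson(lam, mask.sum())
    return out

# One mean-difference data set, parameters as stated in the text
mean_c = rtruncnorm(N, 1.0, 0.25, lower=0.001)
mean_t = rtruncnorm(N, 2.0, 0.50, lower=0.001)
sd_c = rtruncnorm(N, 0.25, 0.125, lower=0.01, upper=1)
sd_t = rtruncnorm(N, 0.50, 0.25, lower=0.01, upper=1)
n_c = rtruncpois(N, 10, lower=5)
n_t = rtruncpois(N, 10, lower=5)
```

A correlation coefficient data set follows the same pattern with a single truncated normal draw for r and one truncated Poisson draw for the SS.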
In all data sets, we simulated missing data by either randomly or
non-randomly deleting between 10% and 90% of the SDs, SSs or both in
the mean difference data sets and between 10% and 90% of the SSs in
the correlation coefficient data sets (in steps of 5%). Within each
dataset row, we thereby deleted the SDs of both the control and the
treatment group, and we independently deleted the SSs of both groups.
With these deletions, we constructed the
following four deletion/correlation scenarios, visualized in Supplement
S3 (Supporting Information):
- SDs and/or SSs were deleted completely at random (MCAR, missing
completely at random), and there were no correlations in the data sets.
- The chance of deleting SDs and/or SSs increased with decreasing effect
size values (MAR, missing at random). All effect sizes were ranked in
decreasing order, and the chance of deletion increased linearly with
the rank position of the effect sizes. No further correlations were
introduced.
- The chance of deleting SDs and/or SSs increased with increasing SDs
and decreasing SSs (MNAR, missing not at random). We ranked the summed
SDs (sd_t + sd_c) in increasing order (corresponding to a lower
precision) and ranked the summed SSs (n_t + n_c) in decreasing order
(corresponding to a lower sample size). The chance of deletion
increased linearly with the rank position of the summed SD and SS
values. Effect sizes with a lower precision or sample size thereby had
a higher chance of their SDs and SSs being deleted. No further
correlations were introduced.
- Effect size values were paired with effect size precision (i.e. sorted
so that larger effect sizes had smaller SDs and larger SSs). SDs
and/or SSs were missing completely at random (corMCAR). This hypothetical scenario might
happen in meta-analyses across different study designs that impact
both the obtained effect size and its precision (e.g. due to the
different possibilities to account for additional drivers of effect
sizes in experimental versus observational studies).
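The rank-based deletion schemes above can be sketched in a few lines. This is an illustrative Python sketch, not the authors' R code; `delete_rows` is a hypothetical helper, and the example data are made up. The key idea is that MCAR uses uniform selection probabilities, while MAR/MNAR make the selection probability proportional to a rank.

```python
import numpy as np

rng = np.random.default_rng(1)

def delete_rows(n_rows, prop, weights=None):
    """Pick the row indices whose values will be blanked out.
    weights=None gives MCAR; weights proportional to a rank position
    give MAR/MNAR-style deletion."""
    k = int(round(prop * n_rows))
    if weights is None:
        return rng.choice(n_rows, size=k, replace=False)
    p = np.asarray(weights, float)
    return rng.choice(n_rows, size=k, replace=False, p=p / p.sum())

# Made-up quantities for 100 studies
es = rng.normal(1.0, 0.3, 100)        # effect sizes
sd_sum = rng.uniform(0.1, 2.0, 100)   # sd_t + sd_c
ss_sum = rng.integers(10, 40, 100)    # n_t + n_c

# MCAR: every row equally likely to lose its SDs/SSs
mcar_idx = delete_rows(100, 0.30)

# MAR: deletion chance rises linearly with the rank of *decreasing*
# effect size (rank 1 = largest effect size, lowest deletion chance... 
# here higher rank value = smaller effect size = higher chance)
rank_mar = np.argsort(np.argsort(-es)) + 1
mar_idx = delete_rows(100, 0.30, weights=rank_mar)

# MNAR: chance rises with the rank of increasing summed SDs (lower
# precision) plus the rank of decreasing summed SSs (lower sample size)
rank_mnar = (np.argsort(np.argsort(sd_sum))
             + np.argsort(np.argsort(-ss_sum)) + 2)
mnar_idx = delete_rows(100, 0.30, weights=rank_mnar)
```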
In total, we created 2,560 data sets: 4 deletion/correlation scenarios,
4 types of deleted data (SDs, SSs, or both for mean difference data sets
and only SSs for correlation coefficient data sets), 10 randomly
generated data sets and 16 deletion steps (10% - 90% of values
deleted).
Handling of missing data: To each of the 2,560 data sets, we
separately applied each of the outlined 14 options to handle missing SDs
and/or SSs in meta-analysis data sets (Table 1). For the
sample-size-weighted meta-analysis, we assigned approximate variance
measures to each effect size, according to eqn 1. Our general workflow
to fill missing values via multiple imputations is illustrated in Figure
2. We generally restricted imputed SDs to range between 0.01 and 1 and
imputed SSs to be ≥ 5. Those restrictions were applied to prevent
implausible (e.g. negative) imputations and guarantee convergence of
subsequent linear mixed-effects models. Data were imputed in the
following order: SDs of the treatment group, SDs of the control group,
SSs of the treatment group and SSs of the control group. Changing this
imputation sequence had virtually no effect on the results. For the
bootstrap expectation maximization imputation, we only imputed data sets
with up to 60% of missing values because the algorithm frequently
crashed above this threshold. Similar to White et al. and Ellington et
al., we repeated all imputation methods 100 times (thus “multiple
imputations”) to obtain 100 imputed data sets.
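The bounded, repeated imputation step can be sketched as follows. This is a deliberately simplified Python stand-in (the actual imputations used dedicated algorithms such as bootstrap expectation maximization in R): missing values are filled by resampling observed values plus noise, clipped to the plausibility bounds stated above, and the whole fill is repeated m = 100 times to yield 100 completed data sets.

```python
import numpy as np

rng = np.random.default_rng(7)

def impute_bounded(values, lower, upper, m=100):
    """Simplified stand-in for one imputation model: fill each missing
    value by resampling observed values plus Gaussian noise, truncated
    to the plausibility bounds (here SDs in [0.01, 1] or SSs >= 5).
    Returns m completed copies of the input ("multiple imputations")."""
    values = np.asarray(values, float)
    miss = np.isnan(values)
    obs = values[~miss]
    completed = []
    for _ in range(m):
        filled = values.copy()
        draws = (rng.choice(obs, size=miss.sum())
                 + rng.normal(0, obs.std(), miss.sum()))
        filled[miss] = np.clip(draws, lower, upper)
        completed.append(filled)
    return completed

# SDs with two missing entries, bounds as in the text
sds = np.array([0.2, np.nan, 0.5, np.nan, 0.35, 0.6])
imputed = impute_bounded(sds, lower=0.01, upper=1.0, m=100)
```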
Effect sizes: After applying the outlined 14 options to handle
missing SDs and/or SSs (Table 1), we calculated the three most prominent
effect size measures in ecological meta-analyses together with their
respective variance estimates where possible/necessary. With the mean
difference data sets, we calculated the small‐sample bias-corrected log
response ratio (hereafter log response ratio) and Hedges’ d . With
the correlation coefficient data sets, we calculated Fisher’s z(see Supplement S3, Supporting Information for the equations applied).
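For orientation, the three effect size measures can be computed with the standard textbook formulas below (a Python sketch; the paper's exact equations are given in Supplement S3, and the lnRR variance shown here is the commonly used first-order approximation).

```python
import numpy as np

def log_response_ratio(m_t, sd_t, n_t, m_c, sd_c, n_c):
    """Small-sample bias-corrected log response ratio and the widely
    used first-order variance approximation."""
    lnrr = np.log(m_t / m_c) + 0.5 * (sd_t**2 / (n_t * m_t**2)
                                      - sd_c**2 / (n_c * m_c**2))
    var = sd_t**2 / (n_t * m_t**2) + sd_c**2 / (n_c * m_c**2)
    return lnrr, var

def hedges_d(m_t, sd_t, n_t, m_c, sd_c, n_c):
    """Standardized mean difference with Hedges' small-sample
    correction factor J."""
    df = n_t + n_c - 2
    s_pooled = np.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    j = 1 - 3 / (4 * df - 1)
    d = j * (m_t - m_c) / s_pooled
    var = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return d, var

def fishers_z(r, n):
    """Fisher's z-transformed correlation and its variance 1/(n - 3)."""
    return np.arctanh(r), 1 / (n - 3)
```

Note that only Fisher's z has a variance that depends on the SS alone, which is why the correlation coefficient data sets only required SSs.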
Grand mean estimates: For every data set (including complete,
unweighted, approximately weighted and imputed data sets), we calculated
the grand mean effect size and its corresponding approximated 95%
confidence interval with a linear intercept-only mixed-effects model.
Thereby, the effect size from each dataset row was assigned a random
effect and weighted by the inverse of its corresponding or approximated
variance estimate (rma function in the metafor package).
For every imputation method and every percentage of missing SDs and/or
SSs, the resulting 100 grand mean and 95% confidence interval estimates
were averaged under consideration of the uncertainty that arose from the
multiple imputations (using Rubin's rules as implemented in the mi.meld
function of the Amelia package).
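Rubin's rules combine the per-imputation estimates as sketched below (a Python illustration of what mi.meld does in R): the pooled estimate is the mean of the m point estimates, and the total variance adds the between-imputation spread, inflated by (1 + 1/m), to the mean within-imputation variance.

```python
import numpy as np

def rubins_rules(estimates, variances):
    """Combine m point estimates and their squared standard errors from
    multiply imputed data sets. Returns the pooled estimate and its
    pooled standard error."""
    q = np.asarray(estimates, float)
    u = np.asarray(variances, float)
    m = len(q)
    q_bar = q.mean()                      # pooled point estimate
    within = u.mean()                     # mean within-imputation variance
    between = q.var(ddof=1)               # between-imputation variance
    total_var = within + (1 + 1 / m) * between
    return q_bar, np.sqrt(total_var)
```

With identical estimates across imputations, the between-imputation term vanishes and the pooled standard error reduces to the within-imputation one.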
Performance measures: We evaluated the effects of the different
options to handle missing SDs and/or SSs in terms of the obtained grand
mean and the width of the corresponding 95% confidence interval against
reference values obtained with a weighted meta-analysis on the complete
data sets (hereafter fully informed weighted meta-analysis). Deviation
in the grand mean was quantified as the obtained grand mean estimate
minus the estimate from the fully informed weighted analysis. Deviation
in the confidence interval was quantified as the obtained width of the
confidence interval minus the width from a fully informed weighted
analysis. We then graphically summarized the trends in the grand mean
and confidence interval from using different options to handle
increasing proportions of missing SDs and/or SSs. We refrained from
using performance measures, such as the root-mean-square error, to
compare the different options to handle missing data because we aimed at
demonstrating general and non-linear trends. Moreover, since some of the
imputation models failed to converge above a threshold of ca. 60% of
missing data, such performance measures would have been infeasible above
this threshold.
All analyses were conducted in R using ggplot2 for
graphical representations. The R scripts used to simulate the data sets
and to delete and impute missing SDs and/or SSs are available at
github.com/StephanKambach/SimulateMissingDataInMeta-Analyses. Script
number three can be used to quickly compare the effects of the 14
options to treat missing SDs and/or SSs on the grand mean of any
supplied meta-analysis dataset that should be summarized with the log
response ratio, Hedges' d or Fisher's z.