The effect size measure:
The calculation of the small-sample bias-corrected log response ratio
and of Hedges’ d both rely on the SDs of the control and
treatment groups. Imputing missing SDs thus affects both effect sizes
and effect size weights. For the simple log response ratio and Fisher’s z, the imputation of missing SDs and/or SSs only affects effect
size weights.
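To make visible where SDs and SSs enter each measure, the sketch below computes the four effect sizes and their sampling variances with commonly used textbook formulas: a delta-method variance for the log response ratio, a second-order small-sample bias correction for the corrected log response ratio, one common large-sample variance approximation for Hedges’ d, and 1/(n − 3) for Fisher’s z. The function names are hypothetical and the exact formulas are assumptions; they are not necessarily the implementation used in our simulations.

```python
import math


def lnrr(mean_t, mean_c):
    """Simple log response ratio: the point estimate uses the means only."""
    return math.log(mean_t / mean_c)


def lnrr_var(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Delta-method sampling variance of the simple lnRR (needs SDs and SSs)."""
    return sd_t**2 / (n_t * mean_t**2) + sd_c**2 / (n_c * mean_c**2)


def lnrr_corrected(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Bias-corrected lnRR and its variance (second-order correction, assumed form).

    Unlike the simple lnRR, the point estimate itself depends on SDs and SSs.
    """
    cv2_t = sd_t**2 / (n_t * mean_t**2)
    cv2_c = sd_c**2 / (n_c * mean_c**2)
    es = lnrr(mean_t, mean_c) + 0.5 * (cv2_t - cv2_c)
    var = cv2_t + cv2_c + 0.5 * (cv2_t**2 + cv2_c**2)
    return es, var


def hedges_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Hedges' d: the pooled SD enters the point estimate, SSs enter J and the variance."""
    df = n_t + n_c - 2
    s_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    d = j * (mean_t - mean_c) / s_pooled
    var = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return d, var


def fisher_z(r, n):
    """Fisher's z: point estimate from r only; the variance depends only on the SS."""
    return math.atanh(r), 1 / (n - 3)


# Doubling both SDs leaves the simple lnRR unchanged but shifts the corrected one.
print(lnrr(12.0, 10.0))
print(lnrr_corrected(12.0, 3.0, 8, 10.0, 2.0, 8)[0])
print(lnrr_corrected(12.0, 6.0, 8, 10.0, 4.0, 8)[0])
```

The three printed values show the point made above: an imputed SD changes the bias-corrected estimate itself, whereas for the simple lnRR it only changes the variance and therefore the weight.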
The type of missing data:
Our simulations show that missing SSs could (and arguably should) routinely
be imputed, albeit with caution when effect sizes and sample sizes are
correlated, as tested in the Fisher’s z dataset. Some studies might not
report their actual SSs but only indicate a lower or upper bound
(e.g. if an unknown number of samples was excluded from the presented
analyses). Such information can be used to constrain the range of imputed
values, which is possible with the following imputation methods: linear
regression, predictive mean matching, classification and regression trees,
random forest, Bayes predictive mean matching and bootstrap expectation
maximization.
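As a rough illustration of how a reported bound can constrain imputed SSs, the following sketch implements a bare-bones predictive mean matching step: an ordinary least-squares fit of the observed SSs on one fully observed covariate, then, for each missing SS, a random draw among the k observed studies with the closest predicted values, restricted to donors whose observed SS respects that study's bound. The covariate, the donor pool size k = 5, the `bounds` dictionary format and the fallback rule are all hypothetical choices; established imputation packages handle bounds differently and in more detail.

```python
import numpy as np


def pmm_impute_bounded(y, x, bounds, k=5, rng=None):
    """Impute NaNs in y by predictive mean matching, honouring per-study bounds.

    y      : 1-D array of sample sizes with np.nan for missing values
    x      : 1-D array of a fully observed covariate (hypothetical predictor)
    bounds : dict {index: (lower, upper)} for studies that report only a bound;
             use None for an open side
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float).copy()
    x = np.asarray(x, dtype=float)
    obs = ~np.isnan(y)

    # Ordinary least-squares fit of observed y on x (intercept + slope).
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    y_hat = X @ beta

    donors_hat = y_hat[obs]
    donors_y = y[obs]  # boolean indexing copies, so later writes to y do not leak in
    for i in np.flatnonzero(~obs):
        lower, upper = bounds.get(i, (None, None))
        lo = -np.inf if lower is None else lower
        hi = np.inf if upper is None else upper
        # Keep only donors whose observed value lies within this study's bounds.
        ok = (donors_y >= lo) & (donors_y <= hi)
        if not ok.any():
            # No admissible donor: fall back to the prediction squeezed into the bounds.
            y[i] = float(np.clip(y_hat[i], lo, hi))
            continue
        cand_hat, cand_y = donors_hat[ok], donors_y[ok]
        nearest = np.argsort(np.abs(cand_hat - y_hat[i]))[:k]
        y[i] = rng.choice(cand_y[nearest])
    return y


ss = np.array([12, 20, np.nan, 35, 40, np.nan, 18, 25])
cov = np.array([1.0, 2.0, 2.5, 3.5, 4.0, 1.5, 1.8, 2.6])
# Hypothetical reporting: study 2 had "at least 30" samples, study 5 "at most 15".
print(pmm_impute_bounded(ss, cov, {2: (30, None), 5: (None, 15)},
                         rng=np.random.default_rng(1)))
```

Drawing donors only from the admissible range keeps every imputed SS consistent with what the primary study did report, while the clipped fallback guarantees a value even when no observed study falls inside the bound.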
For the log response ratio and Hedges’ d, the treatment of missing
SDs has a stronger effect on the grand mean and its confidence
interval than the treatment of missing SSs. Our simulations did not
investigate the effect of the range and distribution of SDs and/or SSs.
Larger ranges and non-uniform distributions of SDs and/or SSs are likely
to increase the variability of imputed values and thus widen confidence
intervals. Meta-analyses that summarize findings across different study
designs (e.g. observational and experimental studies) or across different
organism groups could harbour wider and more uneven distributions of SDs
and/or SSs than those we simulated for this study.
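A toy calculation, not a replication of our simulations, illustrates the expected effect of a wider SD range on the variability of imputed values. Random sample imputation fills each missing SD with a value drawn from the observed SDs, so the spread of imputed values directly mirrors the spread of the observed pool. The narrow, wide and right-skewed SD distributions below are arbitrary assumptions, and the helper name is hypothetical.

```python
import numpy as np


def imputed_sd_spread(observed_sds, n_missing=20, n_reps=1000, seed=0):
    """Average SD of values imputed by random sample imputation.

    In each replicate, n_missing SDs are filled by drawing (with replacement)
    from the observed SDs; the result averages the spread of those imputed
    SDs across replicates.
    """
    rng = np.random.default_rng(seed)
    observed_sds = np.asarray(observed_sds, dtype=float)
    spreads = [np.std(rng.choice(observed_sds, size=n_missing, replace=True))
               for _ in range(n_reps)]
    return float(np.mean(spreads))


rng = np.random.default_rng(42)
narrow = rng.uniform(4.0, 6.0, size=30)               # narrow, uniform SD range
wide = rng.uniform(1.0, 9.0, size=30)                 # wide, uniform SD range
skewed = rng.lognormal(mean=1.5, sigma=0.8, size=30)  # non-uniform, right-skewed
for label, sds in [("narrow", narrow), ("wide", wide), ("skewed", skewed)]:
    print(label, round(imputed_sd_spread(sds), 2))
```

Because every imputed SD feeds into an effect size weight (and, for the bias-corrected log response ratio and Hedges’ d, into the effect size itself), a more variable pool of imputed SDs propagates into more variable weights across imputations.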
The mechanism leading to the observed pattern of missingness:
Following our simulation results, data that are missing completely at
random (MCAR) or missing at random (MAR) could (and arguably should)
routinely be imputed. For Hedges’ d, data that were not missing at
random (MNAR) introduced a deviation in the grand mean (compared with a
fully informed weighted meta-analysis), regardless of how the missing
data were treated. Imputation via bootstrap expectation maximization might
yield a smaller deviation in grand means, but the applied algorithm
frequently failed when more than 60% of SDs and/or SSs were missing.
Manually fine-tuning the algorithm's parameters might increase its
success rate.
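The three missingness mechanisms differ only in what the probability of a value being missing depends on. The sketch below generates each pattern for a vector of SDs; it is a hedged illustration, not our exact simulation settings. The predictors of missingness (sample size for MAR, the SD itself for MNAR) and the probability-proportional weighting are assumptions chosen for clarity, and the function name is hypothetical.

```python
import numpy as np


def make_missing(sd, n, frac, mechanism, rng):
    """Return a copy of `sd` with round(frac * len(sd)) values set to NaN.

    MCAR: every SD has the same chance of being missing.
    MAR : the chance depends on an observed variable (here the sample size n;
          smaller studies are assumed more likely to omit SDs).
    MNAR: the chance depends on the SD itself (larger SDs more likely missing).
    """
    sd = np.asarray(sd, dtype=float).copy()
    k = int(round(frac * sd.size))
    if mechanism == "MCAR":
        p = np.ones(sd.size)
    elif mechanism == "MAR":
        p = 1.0 / np.asarray(n, dtype=float)  # depends on observed n only
    elif mechanism == "MNAR":
        p = sd.copy()                         # depends on the (unobserved) SD
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    idx = rng.choice(sd.size, size=k, replace=False, p=p / p.sum())
    sd[idx] = np.nan
    return sd


rng = np.random.default_rng(7)
n = rng.integers(5, 60, size=1000)
true_sd = rng.lognormal(mean=1.0, sigma=0.5, size=1000)
for mech in ("MCAR", "MAR", "MNAR"):
    sd_obs = make_missing(true_sd, n, 0.4, mech, rng)
    miss = np.isnan(sd_obs)
    # Mean of the true SDs that went missing vs. those that stayed observed.
    print(mech, round(true_sd[miss].mean(), 2), round(true_sd[~miss].mean(), 2))
```

Under MNAR the missing SDs are systematically larger than the observed ones, and nothing in the observed data reveals this; an imputation model fitted to the observed SDs therefore cannot fully recover them, which is consistent with the persistent deviation in the grand mean described above.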
Relationships between effect sizes and SDs:
Imputation methods that applied a predictive model (i.e. all except mean,
median and random sample imputation) could account for a relationship
between effect sizes and their precision. When such a relationship
existed, algorithms based on predictive mean matching tended to yield
grand means closest to those of the fully informed weighted analyses.
When effect sizes and SSs were correlated in the Fisher’s z dataset,
imputing missing data via mean, median, random sample or non-parametric
random forest imputation caused a larger deviation of the grand mean than
omitting the incompletely reported studies.
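The comparison in the last sentence can be set up as follows for Fisher’s z, where the weight of each study is n − 3. The sketch computes the grand mean from the fully informed data, after omitting studies with missing SSs, and after mean imputation of the missing SSs. It uses a fixed-effect (inverse-variance weighted) mean purely for brevity, which is an assumption; the simulated data and helper names are hypothetical, and the printed deviations depend on the random draw rather than reproducing our results.

```python
import numpy as np


def fixed_effect_mean(z, n):
    """Inverse-variance weighted (fixed-effect) mean of Fisher's z; weight = n - 3."""
    w = np.asarray(n, dtype=float) - 3.0
    return float(np.sum(w * np.asarray(z, dtype=float)) / np.sum(w))


def compare_treatments(z, n, missing):
    """Grand means from full data, omission of incomplete studies, and mean imputation of SSs."""
    z = np.asarray(z, dtype=float)
    n = np.asarray(n, dtype=float)
    full = fixed_effect_mean(z, n)
    omitted = fixed_effect_mean(z[~missing], n[~missing])
    n_imp = n.copy()
    n_imp[missing] = n[~missing].mean()  # mean imputation of missing SSs
    imputed = fixed_effect_mean(z, n_imp)
    return full, omitted, imputed


rng = np.random.default_rng(3)
n = rng.integers(10, 200, size=60)
z = 0.1 + 0.002 * n + rng.normal(0, 0.1, size=60)  # effect sizes correlated with SSs
missing = rng.random(60) < 0.3
full, omitted, imputed = compare_treatments(z, n, missing)
print("omission deviation:       ", abs(omitted - full))
print("mean-imputation deviation:", abs(imputed - full))
```

Mean imputation replaces the large weights of large (and, here, high-effect) studies with an average weight, which distorts the weighting wherever SSs and effect sizes covary; omission discards those studies instead but leaves the remaining weights intact.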