Notes. N: Sample size; SD: Standard Deviation.
Next, using Equation 1, we standardise the data by dividing by the pooled SD of the outcome among participants at the post-treatment time point. First, we calculate the pooled sample SDs at both time points (baseline and post-treatment):
\begin{equation} \text{SD}_{\text{pooled}}=\sqrt{\frac{\text{SD}_{t}^{2}\times\left(n_{t}-1\right)+\text{SD}_{c}^{2}\times\left(n_{c}-1\right)}{n_{t}+n_{c}-2}}\nonumber \end{equation}
Equation 2. Calculation of a pooled sample SD. The subscript 't' denotes the treatment arm and the subscript 'c' the control arm.
Pooled sample SD at the baseline time point:
\begin{equation} \text{SD}_{\text{pooled}}=\sqrt{\frac{2.5^{2}\times\left(143-1\right)+3.1^{2}\times\left(125-1\right)}{143+125-2}}=2.796\nonumber \end{equation}
Pooled sample SD at the post-treatment time point:
\begin{equation} \text{SD}_{\text{pooled}}=\sqrt{\frac{2.5^{2}\times\left(125-1\right)+2.9^{2}\times\left(125-1\right)}{125+125-2}}=2.707\nonumber \end{equation}
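These calculations are straightforward to reproduce. Below is a minimal Python sketch of Equation 2; the function name pooled_sd is our own choice, and the snippet is purely illustrative rather than part of the original analysis.

```python
import math

def pooled_sd(sd_t, n_t, sd_c, n_c):
    """Pooled sample SD of two study arms (Equation 2)."""
    pooled_var = (sd_t**2 * (n_t - 1) + sd_c**2 * (n_c - 1)) / (n_t + n_c - 2)
    return math.sqrt(pooled_var)

# Worked values from the illustrative example
print(round(pooled_sd(2.5, 143, 3.1, 125), 3))  # baseline: 2.796
print(round(pooled_sd(2.5, 125, 2.9, 125), 3))  # post-treatment: 2.707
```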
Next, we convert arm-based data into contrast-based data (i.e., a single effect measure that summarises the MD between the two study arms) using Equations 3 and 4.
\begin{equation} \text{MD}=\text{Mean score}_{\text{treatment}}-\text{Mean score}_{\text{control}}\nonumber \end{equation}
Equation 3. Calculation of the MD at the post-treatment time point.
\begin{equation} \text{SE}_{\text{MD}}=\sqrt{\frac{n_{t}+n_{c}}{n_{t}\times n_{c}}\times\frac{\text{SD}_{t}^{2}\times\left(n_{t}-1\right)+\text{SD}_{c}^{2}\times\left(n_{c}-1\right)}{n_{t}+n_{c}-2}}\nonumber \end{equation}
Equation 4. Calculation of the SE of the MD at the post-treatment time point.
\begin{equation} \text{MD}=3.2-3.8=-0.6\nonumber \end{equation}
\begin{equation} \text{SE}_{\text{MD}}=\sqrt{\frac{125+125}{125\times 125}\times\frac{2.5^{2}\times\left(125-1\right)+2.9^{2}\times\left(125-1\right)}{125+125-2}}=0.342\nonumber \end{equation}
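A short Python sketch of Equations 3 and 4, reproducing the worked values above (the function name se_md is our own):

```python
import math

def se_md(sd_t, n_t, sd_c, n_c):
    """Standard error of the mean difference (Equation 4)."""
    pooled_var = (sd_t**2 * (n_t - 1) + sd_c**2 * (n_c - 1)) / (n_t + n_c - 2)
    return math.sqrt((n_t + n_c) / (n_t * n_c) * pooled_var)

md = 3.2 - 3.8                              # Equation 3
print(round(md, 1))                         # -0.6
print(round(se_md(2.5, 125, 2.9, 125), 3))  # 0.342
```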
Then, we standardise our MD and its SE by dividing them by the corresponding pooled sample SD. Methodologists favour the use of pooled SDs at baseline over follow-up SDs, but studies commonly report only follow-up data. We therefore standardise the data under both scenarios: (1) assuming that we have baseline data; and (2) assuming that we only have follow-up data.
Standardised data (SMD and SE) using the pooled sample SD at the baseline time point:
\begin{equation} \text{SMD}=\frac{\text{MD}}{\text{SD}_{\text{pooled}}}=\frac{-0.6}{2.796}=-0.215\nonumber \end{equation}
\begin{equation} \text{SE}_{\text{SMD}}=\frac{\text{SE}_{\text{MD}}}{\text{SD}_{\text{pooled}}}=\frac{0.342}{2.796}=0.122\nonumber \end{equation}
Standardised data (SMD and SE) using the pooled sample SD at the post-treatment time point:
\begin{equation} \text{SMD}=\frac{\text{MD}}{\text{SD}_{\text{pooled}}}=\frac{-0.6}{2.707}=-0.222\nonumber \end{equation}
\begin{equation} \text{SE}_{\text{SMD}}=\frac{\text{SE}_{\text{MD}}}{\text{SD}_{\text{pooled}}}=\frac{0.342}{2.707}=0.126\nonumber \end{equation}
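The standardisation step itself is a simple division; here is a minimal sketch reproducing both scenarios (the helper standardise is our own):

```python
# Standardise the MD and its SE by dividing by a pooled sample SD.
def standardise(md, se, sd_pooled):
    return md / sd_pooled, se / sd_pooled

for sd_pooled in (2.796, 2.707):  # baseline, post-treatment
    smd, se_smd = standardise(-0.6, 0.342, sd_pooled)
    print(round(smd, 3), round(se_smd, 3))
# -0.215 0.122  (baseline SD)
# -0.222 0.126  (post-treatment SD)
```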
Although this method is the most commonly applied in meta-analyses, the use of a fixed scale-specific SD reference is recommended [6,7]. A more in-depth explanation of this method can be found in Gallardo-Gómez et al. (2023) [3] and in the online content.
HOW TO INTERPRET STANDARDISED MEAN DIFFERENCES
SMDs express the size of the treatment effect in each study relative to the variability observed in that study. However, the overall treatment effect can be difficult to interpret because it is reported in units of standard deviation rather than in the original units of measurement. Without guidance, clinicians and patients may have little idea how to interpret results presented as SMDs. There are two possibilities for re-expressing such results in more helpful ways:
Re-expressing SMDs using rules of thumb for effect sizes. One example, based on Cohen (1988) [8], is as follows: 0.2 represents a small effect; 0.5 a moderate effect; and 0.8 a large effect. Nevertheless, some methodologists consider such interpretations problematic because the importance of a finding is context-dependent and not amenable to generic statements [7].
Re-expressing SMDs using a familiar instrument. The second (and recommended) option is to re-express the SMD in the units of one or more of the specific measurement instruments. This is done by multiplying the SMD by a typical among-person SD for a particular scale (e.g., an external SD reference from a large cohort or cross-sectional study that matches the target population, an internal SD reference, or a pooled sample SD), preferably the same SD used for data standardisation [3]. In this way, using the original scale-specific units, the clinical relevance and impact of a pooled treatment effect can be interpreted more easily. In our example, when the authors pooled all effect sizes, they obtained a pooled treatment effect of SMD = 0.40 (95% Confidence Interval 0.02 to 0.77). We then re-express this effect size in SPPB units by multiplying by the external SD reference for the study population (external SD reference = 3.14), obtaining a scale-specific pooled effect of MD = 1.26 (95% CI 0.06 to 2.42). Considering a predefined minimal clinically important difference of 1 point in the SPPB [9], we could support the use of an intervention (physical activity in this case [4]) in a specific population due to its clinically meaningful benefit on the outcome of interest.
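As a quick check of this re-expression, a minimal Python sketch using the values from the example above:

```python
# Re-express a pooled SMD (point estimate and 95% CI) in scale-specific
# units by multiplying by a typical among-person SD for the scale.
sd_ref = 3.14                      # external SD reference for the SPPB
for smd in (0.40, 0.02, 0.77):     # estimate, CI lower bound, CI upper bound
    print(round(smd * sd_ref, 2))  # 1.26, 0.06, 2.42
```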
COMMON PITFALLS USING STANDARDISED MEAN DIFFERENCES
  1. Unnecessary data standardisation. Reviewers do not need to standardise their data when the outcome of interest is measured on the same scale across studies. The belief that 'effect size' is a synonym for 'SMD' can lead authors to report the treatment effect in SMD units when this is not needed. One example is a forest plot containing a single study: an SMD is not needed, and the effect should be reported as an MD.
  2. Use of SEs rather than SDs to calculate SMDs. As we have seen in Equation 1, we use the post-treatment pooled sample SD to calculate SMDs. Nonetheless, primary studies may wrongly report the SE of an assessment as the SD, or may not specify whether they are reporting an SD or an SE. A red flag for this is a very low SD (e.g., < 1), although this is highly dependent on the score range of the specific scale. This mistake can lead to 'effect size inflation': when you use an SE to calculate an SMD, you divide the MD by a value smaller than the true SD, obtaining an inflated estimate. Therefore, if you obtain SMDs greater than one, you should check whether the SD or the SE has been used (see the sketch after this list).
  3. Combination of change from baseline and post-treatment effect measures. Although mixing change-from-baseline and post-treatment outcomes is not a problem in a meta-analysis of MDs [7], they should not in principle be combined using SMDs. This is because the SDs used to standardise post-treatment values reflect between-person variability at a single time point, whereas the SDs used to standardise change scores reflect variation in between-person changes over time, and so depend on both within-person variability (which depends on the length of time between measurements) and between-person variability [7].
  4. Effect size direction. There are scales on which an improvement in the outcome is reflected by a reduction in the score (e.g., in our illustrative example, less time spent walking a given distance indicates better functional capacity). In addition, to interpret the magnitude of an effect, we must consider the specific outcome (e.g., a more negative effect could be positive if the review is investigating depressive symptoms, meaning a reduction in those symptoms). To correct an effect that is not aligned with the direction of our meta-analysis, we should multiply the effect size value by −1 (no modification of the SD is needed), ensuring that all effects point in the same direction.
  5. No interpretation of SMDs. Many meta-analyses leave their effect estimates as SMDs, which can make interpretation difficult. The options available for re-expressing SMDs as more interpretable estimates are discussed above.
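Returning to pitfall 2, below is a minimal Python sketch of the check described there; the reported value of 0.25 is hypothetical, chosen only to illustrate the inflation.

```python
import math

# Pitfall 2: mistaking a reported SE for an SD inflates the SMD, because
# SE = SD / sqrt(n). If a study reports an SE, recover the SD first.
md, n = -0.6, 125
se_reported = 0.25                 # hypothetical value mislabelled as an SD
sd = se_reported * math.sqrt(n)    # recovered SD, approximately 2.795
print(round(md / se_reported, 2))  # -2.4: implausibly large 'SMD'
print(round(md / sd, 3))           # -0.215: plausible SMD
```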