Statistical analysis
The sample size calculation has previously been described in detail (19). At 29–34 weeks of gestation we expected a mean difference of 7.75 (SD=16) in WHO-5 between the intervention group and the control group. Based on a two-sample t-test and a two-sided significance level of 0.05, 91 women were required in each treatment group to obtain a power of 90%. We planned to randomise a total of 300 women, allowing for a 13% dropout as a result of discomfort or complications and a further 30% dropout due to refusal to fill in the questionnaires.
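The figures above can be checked approximately with a short calculation. The sketch below is not the calculation reported in (19); it assumes the usual normal approximation for a two-sample comparison plus a commonly used small-sample correction for the t-test (adding z²/4), and it assumes that the two dropout rates apply one after the other. All function and variable names are illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_normal(delta, sd, alpha=0.05, power=0.90):
    """Per-group sample size for a two-sample comparison (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return 2 * ((z_alpha + z_beta) * sd / delta) ** 2


# Values taken from the text: difference 7.75, SD 16, alpha 0.05, power 90%.
n_normal = n_per_group_normal(delta=7.75, sd=16)

# Assumed small-sample correction for the t-test: add z_{1-alpha/2}^2 / 4.
z_alpha = NormalDist().inv_cdf(0.975)
n_t_approx = n_normal + z_alpha ** 2 / 4

# Assumption: the 13% and 30% dropout rates act sequentially on 300 women.
expected_completers = 300 * (1 - 0.13) * (1 - 0.30)

print(ceil(n_normal), ceil(n_t_approx), round(expected_completers, 1))
```

Under these assumptions the corrected approximation rounds up to 91 women per group, and 300 randomised women leave roughly 182 with complete questionnaires, that is, about 91 per group.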
To account for missing values under the assumption of Missing At Random (MAR) and to adjust for potential baseline imbalances between the two treatment groups, quantitative outcomes were analysed using constrained linear mixed models, with the scores measured at baseline, 29–34 weeks of gestation and eight weeks postpartum as outcomes (24). The fixed part of the models included the interaction between group (intervention and control) and time (baseline/29–34 weeks/eight weeks postpartum), with the constraint that the means in the two groups were assumed equal at baseline due to randomisation. The random part of the model included a random intercept for each patient.
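One way to write such a constrained model is sketched below. The notation is our own and is not taken from (24): Y_ij denotes the score of woman i at time point j (1 = baseline, 2 = 29–34 weeks of gestation, 3 = eight weeks postpartum), G_i equals 1 for the intervention group and 0 for the control group, and b_i is the random intercept.

```latex
Y_{ij} = \beta_0
       + \beta_1 \,\mathbb{1}[j = 2]
       + \beta_2 \,\mathbb{1}[j = 3]
       + \beta_3 \, G_i \,\mathbb{1}[j = 2]
       + \beta_4 \, G_i \,\mathbb{1}[j = 3]
       + b_i + \varepsilon_{ij},
\qquad
b_i \sim N(0, \sigma_b^2), \quad
\varepsilon_{ij} \sim N(0, \sigma^2)
```

In this formulation there is no main effect of G_i, so both groups share the baseline mean β_0, which is the constraint described above. The group differences at 29–34 weeks of gestation and at eight weeks postpartum are then β_3 and β_4, respectively.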
For each group and each time point, the proportion of women with EPDS ≥11 was estimated from a logistic regression model with parameters estimated by weighted Generalised Estimating Equations (GEE) to account for repeated measures and missing data (25). The weights were defined as the inverse probabilities of being observed conditional on previous measurements of EPDS (quantitative), treatment group and previous missing value of EPDS and were estimated from a logistic regression model. An unstructured correlation matrix was used as the working correlation.
As specified in the protocol (19), the analysis of WHO-5 was repeated in the subgroup of women attending ≥75% of the sessions. In this analysis, the linear mixed model was not constrained to assume equal means at baseline, as the randomisation is not valid for this subgroup of women.
Due to the large number of hypotheses tested, correction for multiple testing was applied. The p-value corresponding to the comparison of the secondary outcome (WHO-5 at eight weeks postpartum) was adjusted to account for the test of the primary outcome by multiplying it by 2 (False Discovery Rate method). The remaining secondary outcomes are presented uncorrected for multiple testing.
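As an illustration of how such an adjustment behaves with two tests, the sketch below implements the Benjamini-Hochberg step-up adjustment. That this is the specific False Discovery Rate procedure used is an assumption, since the text does not name it. The function name and the two p-values are hypothetical and are not results from the trial.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value, enforcing monotonicity and a cap at 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for two tests (illustration only).
example = bh_adjust([0.01, 0.04])
print(example)
```

With two hypotheses, this procedure multiplies the smaller p-value by 2 (never exceeding the larger p-value) and leaves the larger one unchanged.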