Statistical methods
Because the data were observational, propensity score methods were chosen to estimate the treatment effect of EA. Prior to any inferential analysis, missing values were handled by multiple imputation with 10 imputed datasets. For each imputed dataset, a propensity score for receiving EA was calculated. The propensity score models included factors that potentially influence the decision of obstetric caretakers to use EA: maternal age, weight and height, gestation week, year, hospital category, and foetal position, gender, length and head circumference. Predicted probabilities of receiving EA from these models were used as propensity scores in all further analyses. Results from the imputed datasets were combined using Rubin’s rules as implemented in the R package mice.16 For every endpoint, only cases with no missing value in that endpoint prior to imputation were used. Linear regression models were used for the continuous endpoints pH, BE and APGAR scores after 1, 5 and 10 minutes. Logistic regression models were used for the binary endpoints admission to NICU, perinatal mortality, and a 5-minute APGAR score below 7 (AS5<7). The covariates in every model were the propensity score and EA (yes/no).
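The pooling step can be illustrated with a minimal Python sketch of Rubin's rules for a single coefficient. This is not the code used in the analysis, which relied on the R package mice. The function name `pool_rubin` and the input numbers are hypothetical, and the degrees of freedom follow the classical large-sample formula without any small-sample adjustment that mice may apply.

```python
from math import inf, sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates of one coefficient with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")

    q_bar = mean(estimates)            # pooled point estimate
    u_bar = mean(variances)            # within-imputation variance
    b = variance(estimates)            # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b    # total variance

    if b == 0:
        df = inf                       # no between-imputation variability
    else:
        r = (1 + 1 / m) * b / u_bar    # relative increase in variance
        df = (m - 1) * (1 + 1 / r) ** 2  # classical large-sample degrees of freedom

    return q_bar, sqrt(total), df


# Hypothetical EA coefficients and squared standard errors from 10 imputed datasets
example_estimates = [0.12, 0.10, 0.15, 0.11, 0.13, 0.09, 0.14, 0.12, 0.10, 0.13]
example_variances = [0.0025] * 10
pooled_estimate, pooled_se, pooled_df = pool_rubin(example_estimates, example_variances)
print(pooled_estimate, pooled_se, pooled_df)
```

The pooled standard error exceeds the average within-imputation standard error whenever the estimates differ across imputed datasets, which is how the uncertainty due to missing values enters the final inference.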
Since two primary objectives were investigated, Bonferroni correction for multiple testing was applied, giving a significance level of 0.05/2 = 0.025 per primary objective. Accordingly, 97.5% confidence intervals for the effect of EA were reported for the primary objectives. Since p‑values for secondary objectives served only descriptive purposes, no multiple testing correction was applied and 95% confidence intervals were reported. As the duration and mode of delivery as well as an episiotomy may indicate cases with higher perinatal morbidity, additional multivariable regression models adjusting for these confounders were fitted for all outcome variables.
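The arithmetic behind the Bonferroni-adjusted significance level and the matching confidence level described above can be sketched as follows. The Python snippet is illustrative only: the function names and the example estimate are hypothetical, and the critical value uses a normal approximation, whereas the fitted models may rely on t-based quantiles.

```python
from statistics import NormalDist


def bonferroni_levels(alpha, n_tests):
    """Return the per-test significance level, the matching two-sided
    confidence level, and the normal critical value for that interval."""
    per_test_alpha = alpha / n_tests
    confidence_level = 1 - per_test_alpha
    z_crit = NormalDist().inv_cdf(1 - per_test_alpha / 2)
    return per_test_alpha, confidence_level, z_crit


def wald_interval(estimate, std_error, z_crit):
    """Two-sided Wald interval: estimate +/- z_crit * std_error."""
    half_width = z_crit * std_error
    return estimate - half_width, estimate + half_width


# Two primary objectives at an overall level of 0.05
per_test_alpha, confidence_level, z_crit = bonferroni_levels(0.05, 2)

# Hypothetical effect estimate and standard error, for illustration only
lower, upper = wald_interval(0.12, 0.05, z_crit)
print(per_test_alpha, confidence_level, z_crit, lower, upper)
```

With two tests, the critical value is larger than the usual 1.96, so the 97.5% intervals for the primary objectives are wider than the 95% intervals reported for the secondary objectives.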
Differences in the rates of higher-degree perineal lacerations, the duration of birth and instrumental delivery were reported descriptively. As a sensitivity analysis, the same analysis strategy (except for the imputation-related steps) was applied to the original data. This led to relevant differences in the estimated effect sizes, i.e. the results seem to depend heavily on the analysis strategy. All analyses were performed using R (version 3.5.1; R Foundation for Statistical Computing, Vienna, Austria).

Results