Demographic, physical activity, and gynaecological variables
Participants’ demographic, gynaecological, and physical activity
variables have been described in detail previously.11 Briefly, age was
calculated from the date of birth to the date of answering the
prequestionnaire. BMI was calculated as body mass (kg) divided by
height squared (m²). Level of education was
self-reported with a structured question and participants were
classified into two groups based on their answers: those with bachelor
level or higher education and those with education lower than bachelor
level. Work-related physical activity was assessed with a structured
question and participants were classified into the following groups:
mainly sedentary work, work that includes standing and walking, and
heavy work that also includes lifting.
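The age and BMI definitions above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the conversion of days to years with a 365.25-day year is an assumption: the text does not state how fractional years were handled.

```python
from datetime import date


def age_in_years(date_of_birth: date, date_answered: date) -> float:
    """Age from date of birth to the date the prequestionnaire was answered.

    Assumption: elapsed days are converted with a 365.25-day year.
    """
    return (date_answered - date_of_birth).days / 365.25


def body_mass_index(mass_kg: float, height_m: float) -> float:
    """BMI = body mass (kg) divided by height squared (m^2)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return mass_kg / height_m ** 2


if __name__ == "__main__":
    # Illustrative values only, not study data.
    print(round(age_in_years(date(1965, 1, 1), date(2015, 1, 1)), 1))
    print(round(body_mass_index(70.0, 1.75), 1))
```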
Physical activity at the age of 17 to 29 years was assessed with the
question: “What kind of regular physical activity have you done at
different stages of your life?”13 Participants were
asked to specify their participation by selecting one or more of the
following four options: no physical activity, regular independent
leisure-time physical activity, regular competitive sport and related
training, and regular other supervised physical activity in a sports
club, etc. Current physical activity was evaluated with a self-reported
questionnaire14 including four questions about the
frequency, intensity and duration of leisure-time physical activity
bouts as well as the average time spent in active commuting. Based on
the answers, metabolic equivalent hours per day (MET-h/d) for
current physical activity were calculated.
Participants were assigned to premenopausal, early and late
perimenopausal, and postmenopausal groups based on FSH
concentrations and self-reported menstrual bleeding diaries using
slightly modified Stages of Reproductive Aging Workshop (STRAW+10)
guidelines.15 Self-reported data on gestations,
parity, and whether a participant had undergone hysterectomy were
collected.
Missing data
In the analytical sample of 1 098 participants, 338 of 29 646 data
values (1.1%) were missing. The percentage of missing values varied
from 0 to 10% across variables (Table S1). Data were missing because of
invalid or missing measurements and unclear or incomplete questionnaire
responses. Thus, missing data were assumed to occur at random. Multiple
imputation by chained equations was used to create and analyze 50
imputed data sets, with 50 iterations for each.16 The model
parameters were estimated separately for each data set. Multiple
imputation and pooling of the model estimates were carried out in
R17 using the standard settings of the “mice”
package.16 For comparison, we also performed a complete-case
analysis, and there were no significant differences in the results.
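As a rough illustration of the arithmetic and the pooling step described above, the following Python sketch computes the overall missing-data fraction and combines per-data-set estimates with Rubin's rules, which is the pooling approach implemented by the “mice” package. It is not the analysis code used in the study, which was run in R. The function names are hypothetical, the count of 27 variables is inferred from 29 646 / 1 098, and the example estimates are invented for demonstration.

```python
from statistics import mean, variance


def missing_fraction(n_missing: int, n_participants: int, n_variables: int) -> float:
    """Share of missing cells in a participants-by-variables data matrix."""
    return n_missing / (n_participants * n_variables)


def pool_rubin(estimates: list[float], variances: list[float]) -> tuple[float, float]:
    """Pool one parameter across m imputed data sets using Rubin's rules.

    estimates: point estimate from each imputed data set
    variances: squared standard error from each imputed data set
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    pooled = mean(estimates)            # average of the m point estimates
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


if __name__ == "__main__":
    # 27 variables is an inference from 29 646 / 1 098, not stated in the text.
    print(round(100 * missing_fraction(338, 1098, 27), 1))
    # Invented estimates from three hypothetical imputed data sets.
    print(pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

With 50 imputed data sets, as used here, the factor (1 + 1/m) is 1.02, so the between-imputation variance is inflated only slightly in the total.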