Statistical Analysis
We developed a predictive model to identify, around the time of delivery,
women with recent GDM who are at high risk of developing impaired glucose
tolerance by 1 year postpartum. Model development proceeded in several
steps. First, to address missing data among potential predictors, we
used multiple imputation by chained equations, imputing continuous and
count variables with predictive mean matching (30 imputed datasets).23
Second, within each imputed dataset, we used logistic regression to
evaluate bivariable associations between each candidate predictor and
impaired glucose tolerance. Third, to develop a parsimonious
multivariable model, we
used Lasso regression with 10-fold cross-validation for covariate
selection and combined estimates across the imputed datasets using
Rubin’s rules.23 Lasso (least absolute shrinkage and selection operator)
regression penalizes the sum of the absolute values of the regression
coefficients, shrinking some coefficients exactly to zero; it therefore
performs covariate selection and is suited to settings with a large
number of collinear candidate covariates.22,24
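As an illustration only, the imputation and pooling steps above can be sketched in standard-library Python (the study's analyses were run in Stata and R). The sketch assumes a single fully observed predictor x and fits the matching regression once on complete cases without redrawing its parameters. That is a simplification of proper predictive mean matching. A full chained-equations procedure would cycle this step over every incomplete variable. The function names (fit_line, pmm_impute, rubins_rules) and the data are hypothetical. Rubin's rules pool m estimates as their mean. The total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
import random
import statistics
from math import sqrt

# Hypothetical helpers for illustration; not the study's Stata/R code.


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x (single predictor)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def pmm_impute(x, y, k=5, rng=None):
    """Fill None entries of y by predictive mean matching on predictor x.

    Simplification (assumption): the regression is fit once on complete
    cases and its parameters are not redrawn for each imputation.
    """
    if rng is None:
        rng = random.Random()
    obs = [i for i, v in enumerate(y) if v is not None]
    a, b = fit_line([x[i] for i in obs], [y[i] for i in obs])
    pred = [a + b * xi for xi in x]  # predicted mean for every case
    filled = list(y)
    for i, v in enumerate(y):
        if v is None:
            # k observed cases whose predicted means are closest to case i
            donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:k]
            # impute the *observed* value of one randomly chosen donor
            filled[i] = y[rng.choice(donors)]
    return filled


def rubins_rules(estimates, variances):
    """Pool m estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)    # pooled point estimate
    w = statistics.fmean(variances)        # mean within-imputation variance
    b = statistics.variance(estimates)     # between-imputation variance
    t = w + (1 + 1 / m) * b                # total variance
    return q_bar, sqrt(t)


# Example with invented data: two missing values, 30 imputed datasets.
rng = random.Random(2020)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, None, 8.2, None, 12.1]
imputations = [pmm_impute(x, y, k=2, rng=rng) for _ in range(30)]

# Pool a simple estimate (the mean of y) and its squared standard error.
means = [statistics.fmean(imp) for imp in imputations]
variances = [statistics.variance(imp) / len(imp) for imp in imputations]
pooled_mean, pooled_se = rubins_rules(means, variances)
```

Because predictive mean matching borrows observed values from donors, every imputed value is one that actually occurs in the data. This is why it is often preferred for skewed or count variables.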
A priori, we specified that any covariate selected in more than 60% of
the imputed datasets would be included in the final multivariable
model.29,30 The discriminatory ability of each model was assessed by the
area under the receiver operating characteristic curve (AUC). To evaluate
the robustness of the final
multivariable model, we performed sensitivity analyses restricted to
participants with complete data (complete-case analysis) and examined how
including or excluding predictors of type 2 diabetes that may not be
available in routine care (e.g., glucose values at 2 days postpartum) or
that may be collinear (e.g., weight and BMI) influenced model results
(Supplemental Table S2).
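A minimal sketch of the a priori selection rule and of the AUC follows, in standard-library Python with hypothetical function names and invented covariate labels. It assumes the Lasso has already been run in each imputed dataset and has returned the set of covariates with nonzero coefficients. The AUC is computed as a concordance probability (the Mann-Whitney form), with ties counted as one half. For a binary outcome this equals the area under the empirical ROC curve.

```python
from collections import Counter

# Hypothetical helpers and covariate names, for illustration only.


def stable_covariates(selected_per_imputation, threshold=0.60):
    """Covariates selected in more than `threshold` of the imputed datasets."""
    m = len(selected_per_imputation)
    counts = Counter(c for chosen in selected_per_imputation for c in set(chosen))
    return sorted(c for c, n in counts.items() if n / m > threshold)


def auc(scores, outcomes):
    """AUC as P(score of a random case > score of a random non-case), ties = 1/2."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                concordant += 1.0
            elif c == d:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Lasso selections from five imputed datasets (invented for illustration).
selections = [
    {"bmi", "glucose_2d"},
    {"bmi", "age"},
    {"bmi", "glucose_2d", "age"},
    {"bmi", "glucose_2d", "age"},
    {"glucose_2d"},
]
# "age" is selected in exactly 3 of 5 datasets (60%), so the strict
# "more than 60%" rule does not keep it.
final_covariates = stable_covariates(selections)
example_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The pairwise loop is quadratic in sample size, which is fine for a sketch. A rank-based formula gives the same value faster on large samples.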
Finally, we evaluated the calibration of the final multivariable model
and its ability to identify women at high risk of impaired glucose
tolerance at 1 year postpartum. To assess calibration, model-predicted
probabilities of impaired glucose tolerance were categorized into
quartiles, which were cross-tabulated and graphed against observed
outcomes.31 To evaluate model predictive
ability, we assessed the sensitivity, specificity, and positive and
negative predictive values (PPV, NPV) across a range of predicted
probability cut-points to identify women most likely to progress to
impaired glucose tolerance by 1 year postpartum. Statistical analyses
were performed using Stata 16.1 (StataCorp. 2019. Stata Statistical
Software: Release 16. College Station, TX) and R 4.0.2 (R Core Team.
2020. R: A language and environment for statistical computing. Vienna,
Austria: R Foundation for Statistical Computing).
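The calibration and cut-point evaluation described above can be sketched as follows. This sketch summarizes each quartile by its mean predicted probability and observed event rate. That is one common way to build the cross-tabulation and is an assumption here. The function names are hypothetical, and the probabilities, outcomes, and cut-points are invented for illustration.

```python
import statistics

# Hypothetical helpers; example data are invented.


def quartile_calibration(probs, outcomes):
    """Split cases into predicted-risk quartiles (needs at least 4 cases).

    Returns (mean predicted probability, observed event rate, n) per quartile.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    rows = []
    for q in range(4):
        idx = order[q * n // 4:(q + 1) * n // 4]
        rows.append((
            statistics.fmean(probs[i] for i in idx),
            statistics.fmean(outcomes[i] for i in idx),
            len(idx),
        ))
    return rows


def classification_metrics(probs, outcomes, cutpoint):
    """Sensitivity, specificity, PPV and NPV, calling p >= cutpoint 'high risk'."""
    pairs = list(zip(probs, outcomes))
    tp = sum(1 for p, y in pairs if p >= cutpoint and y == 1)
    fp = sum(1 for p, y in pairs if p >= cutpoint and y == 0)
    fn = sum(1 for p, y in pairs if p < cutpoint and y == 1)
    tn = sum(1 for p, y in pairs if p < cutpoint and y == 0)

    def ratio(num, other):
        # num / (num + other); NaN when the denominator is zero
        return num / (num + other) if num + other else float("nan")

    return {
        "sensitivity": ratio(tp, fn),
        "specificity": ratio(tn, fp),
        "ppv": ratio(tp, fp),
        "npv": ratio(tn, fn),
    }


probs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.55, 0.70]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
calibration = quartile_calibration(probs, outcomes)
metrics_by_cut = {c: classification_metrics(probs, outcomes, c) for c in (0.25, 0.50)}
```

Raising the cut-point can only keep or lower sensitivity and keep or raise specificity. Scanning several cut-points makes that trade-off explicit when choosing a threshold for identifying high-risk women.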