Statistical Analysis
We developed a predictive model to identify, around the time of delivery, women with recent GDM who are at high risk of developing impaired glucose tolerance by 1 year postpartum. Model development proceeded in three steps. First, to address missing data among potential predictors, we used multiple imputation with chained equations, in which continuous and count variables were imputed using predictive mean matching (n=30 imputations).23 Second, within each multiply imputed complete dataset, we used logistic regression to evaluate bivariable associations between each candidate predictor and impaired glucose tolerance. Third, to develop a parsimonious multivariable model, we used Lasso regression with k-fold cross-validation (k=10 folds) for covariate selection and combined estimates across multiply imputed datasets using Rubin’s rules.23 Lasso regression is a predictive modeling approach that allows for principled covariate selection when there is a large number of collinear covariates.22,24
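As an illustration of the pooling step, the following minimal Python sketch applies Rubin's rules to a single coefficient estimated in each imputed dataset. It is not the code used for the analyses reported here, which were run in Stata and R; the function name `pool_rubin` and the example values are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: per-imputation point estimates (e.g., log odds ratios)
    variances: per-imputation squared standard errors
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from >= 2 imputations")
    pooled = mean(estimates)            # pooled point estimate
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical values from three imputed datasets (illustration only).
pooled, total = pool_rubin([0.5, 0.7, 0.6], [0.04, 0.05, 0.03])
```

The pooled standard error is the square root of the total variance; the between-imputation term grows when the imputed datasets disagree, which is how uncertainty due to missing data enters the final estimate.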
A priori, we specified that any covariate selected in >60% of the imputed datasets would be included in the final multivariable model.29,30 The discriminatory ability of each model was assessed by the area under the receiver operating characteristic curve (AUC). To evaluate the robustness of the final multivariable model, we performed sensitivity analyses that (1) included only participants with complete data (complete-case analysis) and (2) examined how excluding or including predictors of type 2 diabetes that may not be available in routine care (e.g., glucose values at 2 days postpartum) or that may be collinear (e.g., weight and BMI) influenced model results (Supplemental Table S2).
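The >60% selection rule can be sketched in Python as follows. This is an illustrative sketch rather than the analysis code; the function name `stable_covariates` and the covariate names in the example are hypothetical, and the input is assumed to be the list of covariates with non-zero Lasso coefficients in each imputed dataset.

```python
from collections import Counter


def stable_covariates(selected_per_imputation, threshold=0.60):
    """Return covariates selected in strictly more than `threshold` of imputations.

    selected_per_imputation: one collection of covariate names per imputed
    dataset, listing the covariates the Lasso retained in that dataset.
    """
    m = len(selected_per_imputation)
    if m == 0:
        raise ValueError("no imputed datasets supplied")
    counts = Counter()
    for selected in selected_per_imputation:
        counts.update(set(selected))    # count each covariate once per dataset
    return sorted(name for name, c in counts.items() if c / m > threshold)


# Hypothetical selections from five imputed datasets (illustration only).
selections = [
    ["fasting_glucose", "bmi", "age"],
    ["fasting_glucose", "bmi"],
    ["fasting_glucose", "bmi", "age"],
    ["fasting_glucose", "bmi", "age"],
    ["fasting_glucose"],
]
final_covariates = stable_covariates(selections)
```

In this example `age` is selected in exactly 60% of the datasets and is therefore excluded, because the rule requires a selection frequency strictly greater than 60%.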
Finally, we evaluated the calibration and predictive ability of the final multivariable model to identify women at high risk for impaired glucose tolerance at 1 year postpartum. To assess model calibration, model-predicted probabilities of impaired glucose tolerance were categorized into quartiles, cross-tabulated, and graphed against observed outcomes.31 To evaluate model predictive ability, we assessed the sensitivity, specificity, and positive and negative predictive values (PPV, NPV) across a range of predicted probability cut-points to identify women most likely to progress to impaired glucose tolerance by 1 year postpartum. Statistical analyses were performed using Stata 16.1 (StataCorp. 2019. Stata Statistical Software: Release 16. College Station, TX) and R 4.0.2 (R Core Team. 2020. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing).
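The cut-point evaluation can be illustrated with a short Python sketch that computes sensitivity, specificity, PPV, and NPV at a given predicted-probability threshold. It is a sketch under stated assumptions, not the analysis code: the function name `classification_metrics` and the data are hypothetical, and classifying a woman as high risk when her predicted probability is at or above the cut-point is an assumed convention.

```python
def classification_metrics(probs, outcomes, cutpoint):
    """Sensitivity, specificity, PPV, and NPV at one probability cut-point.

    probs: model-predicted probabilities
    outcomes: observed outcomes (1 = impaired glucose tolerance, 0 = not)
    Assumption: predicted probability >= cutpoint is classified as high risk.
    """
    tp = fp = tn = fn = 0
    for p, y in zip(probs, outcomes):
        high_risk = p >= cutpoint
        if high_risk and y == 1:
            tp += 1
        elif high_risk:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else None   # undefined when the denominator is 0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical predictions and outcomes (illustration only).
probs = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
outcomes = [0, 0, 1, 0, 1, 1]
by_cutpoint = {c: classification_metrics(probs, outcomes, c) for c in (0.3, 0.5)}
```

Lowering the cut-point from 0.5 to 0.3 in this toy example raises sensitivity while leaving specificity unchanged, the kind of trade-off that examining a range of cut-points is meant to expose.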