Statistical Analysis
The Derivation Data Set was randomly divided into two equal halves. One
half was used for variable selection and parameter estimation of the
prediction model (train subsample), and the other half was used for
internal validation (test subsample). The Omicron Data Set was used for external
validation. Descriptive statistics included frequency tables for
categorical variables. Patient characteristics were compared between the
subsamples (train vs. test and train vs. Omicron) using the Chi-square
test.
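As an illustration of the split and the subsample comparison described above, the following is a minimal Python sketch rather than the R code used in the study. The seed, the function names (split_half, chi_square, chi2_sf) and the example counts are hypothetical; only the total of 241,071 records follows from the subsample sizes reported in this section. The p-value is obtained from the standard recurrence for the upper incomplete gamma function, and the sketch assumes every cell of the table has a nonzero expected count.

```python
import math
import random

def split_half(ids, seed=2022):
    """Shuffle ids reproducibly and split them into train and test halves."""
    rng = random.Random(seed)  # hypothetical seed, not taken from the source
    shuffled = list(ids)
    rng.shuffle(shuffled)
    mid = (len(shuffled) + 1) // 2  # train gets the extra record when n is odd
    return shuffled[:mid], shuffled[mid:]

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi2_sf(x, df):
    """Upper-tail chi-square probability.

    Uses Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1), starting from
    Q(1, z) = exp(-z) for even df or Q(1/2, z) = erfc(sqrt(z)) for odd df.
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0 - 1e-9:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

train, test = split_half(range(241071))
# Hypothetical counts of one categorical variable (rows: train, test).
stat, df = chi_square([[300, 700], [320, 680]])
p_value = chi2_sf(stat, df)
```

With an odd number of records, the sketch assigns the extra record to the train subsample, which matches the reported sizes of 120,536 and 120,535.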
Given the large sample size (n train = 120,536 and n test = 120,535), we
developed multivariable logistic regression models for three outcomes:
(1) hospital admission, (2) death and (3) adverse evolution. The models
were fitted in the train subsample using lasso logistic regression, which
uses penalized likelihood for parameter estimation and variable
selection. In the final models, only factors with p < 0.01 were retained.
Odds ratios (ORs) and 95% confidence intervals (CIs) were estimated. The
discriminative ability of each model was measured by the area under the
ROC curve (AUC).
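To make the modelling steps above concrete, the following Python sketch fits an L1-penalized (lasso) logistic regression by proximal gradient descent, converts a coefficient and its standard error into an OR with a 95% CI, and computes the AUC as the proportion of concordant event/non-event pairs. It is an illustration under stated assumptions, not the R implementation used in the study: the toy data, the penalty value lam, the fixed step size (valid only for predictors scaled to [0, 1]) and all function names are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_lasso_logistic(X, y, lam, n_iter=2000):
    """Minimize mean logistic loss + lam * sum(|beta_j|); intercept unpenalized."""
    n, d = len(X), len(X[0])
    step = 4.0 / (d + 1)  # 1 / Lipschitz bound, assuming predictors in [0, 1]
    b0, beta = 0.0, [0.0] * d
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            resid = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
            g0 += resid
            for j in range(d):
                g[j] += resid * xi[j]
        b0 -= step * g0 / n
        beta = [soft_threshold(b - step * gj / n, step * lam)
                for b, gj in zip(beta, g)]
    return b0, beta

def odds_ratio_ci(beta, se, z=1.96):
    """OR and 95% CI from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def auc(scores, labels):
    """Probability that an event outranks a non-event (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy data: the first predictor is informative, the second is not.
X = [[1, 0], [1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1]]
y = [1, 1, 1, 0, 0, 0, 0, 1]
b0, beta = fit_lasso_logistic(X, y, lam=0.05)
probs = [sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) for xi in X]
model_auc = auc(probs, y)
```

In this toy example the penalty keeps the coefficient of the uninformative predictor at exactly zero, which is the variable-selection behaviour the lasso is used for. Standard errors are taken as given in odds_ratio_ci because the sketch does not estimate them.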
To develop the predictive risk scores for each of the outcomes, we first
assigned a weight to each risk predictor variable in relation to the
estimated β parameters based on the lasso logistic regression model
derived in the train subsample. We then summed the risk weights of all
predictor variables present in each patient, with higher scores
indicating a greater likelihood of the event. The predictive accuracy of
each risk score was assessed using the AUC in the train, test and Omicron
samples. Each risk score was then categorized into four levels of risk.
The optimal thresholds for the continuous risk scores were determined
with the catpredi function of the R package CatPredi, using the addfor
algorithm, which maximizes the AUC of the categorized score.
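The score construction and categorization can be sketched in Python as follows. This is a simplified illustration, not the CatPredi implementation: the weighting rule (each β divided by the smallest nonzero β and rounded) is an assumption, since the text states only that weights were assigned in relation to the β parameters; the greedy forward search adds one cut point at a time to maximize the AUC of the category index, which only approximates the idea behind addfor; and the predictor names and coefficients are hypothetical.

```python
from bisect import bisect_left

def auc(scores, labels):
    """Probability that an event outranks a non-event (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def risk_weights(betas):
    """Assumed rule: scale by the smallest nonzero |beta| and round."""
    base = min(abs(b) for b in betas.values() if b != 0)
    return {name: int(round(b / base)) for name, b in betas.items()}

def risk_score(present, weights):
    """Sum the weights of the predictors present in one patient."""
    return sum(weights[name] for name in present)

def categorize(score, cuts):
    """Category index = number of cut points strictly below the score."""
    return bisect_left(sorted(cuts), score)

def forward_cutpoints(scores, labels, k=3):
    """Greedily add k cut points, each maximizing the categorized AUC."""
    candidates = sorted(set(scores))[:-1]  # a cut at the maximum is useless
    chosen = []
    for _ in range(k):
        remaining = [c for c in candidates if c not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda c: auc(
            [categorize(s, chosen + [c]) for s in scores], labels))
        chosen.append(best)
    return sorted(chosen)

# Hypothetical coefficients for three binary predictors.
weights = risk_weights({"factor_a": 1.8, "factor_b": 0.6, "factor_c": 1.2})
score = risk_score(["factor_a", "factor_c"], weights)

# Hypothetical scores and outcomes; three cut points give four risk levels.
scores = [0, 1, 2, 3, 4, 5, 6, 7]
events = [0, 0, 0, 1, 0, 1, 1, 1]
cuts = forward_cutpoints(scores, events, k=3)
levels = [categorize(s, cuts) for s in scores]
```

Three cut points yield the four risk levels described above. A patient whose score equals a cut point is placed in the lower category, which is a convention of this sketch.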
The performance of the risk classification was evaluated using the AUC
and the observed probability of the event in each risk category. In
addition, the true positive rate (TPR), true
negative rate (TNR) and the net benefit (NB), which considers the
relative benefits and harms, were computed for each of the risk cut-off
points. The model, score and categorized score were all validated in the
Omicron sample by means of the AUC. All effects were considered
significant at p < 0.01. All statistical analyses were performed
using R version 4.1.2.
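For reference, the cut-off metrics mentioned above can be computed as in the following Python sketch. Net benefit follows the usual decision-curve definition, NB = TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability. The text does not state how pt was chosen for each cut-off, so it is a free parameter here, and the function name, the "score >= cutoff" convention and the example data are hypothetical.

```python
def cutoff_metrics(scores, labels, cutoff, threshold_prob):
    """TPR, TNR and net benefit when patients with score >= cutoff are flagged."""
    tp = fp = tn = fn = 0
    for s, l in zip(scores, labels):
        flagged = s >= cutoff  # assumed convention: the cut-off itself is positive
        if flagged and l == 1:
            tp += 1
        elif flagged and l == 0:
            fp += 1
        elif l == 0:
            tn += 1
        else:
            fn += 1
    n = tp + fp + tn + fn
    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # specificity
    # False positives are weighted by the odds of the threshold probability.
    nb = tp / n - (fp / n) * threshold_prob / (1.0 - threshold_prob)
    return tpr, tnr, nb

# Hypothetical risk scores and outcomes for ten patients.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
tpr, tnr, nb = cutoff_metrics(scores, events, cutoff=5, threshold_prob=0.2)
```

The sketch assumes the sample contains at least one event and one non-event, and that the threshold probability is below 1.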