Ordinal Logistic Regression
To establish the probability that any given AKAP11 variant has a
statistically significant role in the diagnosis of BPD or SCZ, we run a
logistic regression on every variant entered in the SCHEMA \citep{j2020} and BipEx \citep{d2021} browsers for SCZ and
bipolar I, respectively. Based on the consequence terms of each variant
defined by Sequence Ontology \citep{Cunningham2015}, every entry is
assigned to one of four classes, indexed by \(j\), ranging from statistically
unlikely to cause diagnosis to statistically likely to be pathogenic.
Any variant without sufficient genome annotation (i.e., lacking CADD
scores or protein sequence identification) was removed from the analysis,
resulting in a variant population of \(n = 745\) for SCZ and \(n = 380\)
for BPD.
With the variants categorized, we perform ordinal logistic regression
\citep{e2018}, in which the probability of a variant being placed
into a given class is assessed via a set of independent
variables. This was performed with the XLSTAT Version 2023.1.2 software
package \citep{lumivero2023}. Five values were chosen as
independent variables to jointly estimate how likely each variant
is to be placed in each class: (i) the CADD functional annotation
\citep{Rentzsch2019}, the allele number for the (ii) control and (iii)
case groups, and the allele frequency for the (iv) control and (v) case
groups. Fifty iterations were performed to increase the precision of the
regression coefficients. Using the XLSTAT automatic weightings for each
independent variable, we find that variables (ii) through (v) have little
significant influence on the classification of each variant, and
thus rely primarily on the CADD functional annotation as a means of
determining the probability that any given variant is placed in a given
class.
The ordinal logistic regression model is a form of logistic regression
wherein the probability of an ordered categorical outcome occurring is
assessed via a given set of independent variables. Within the context of
this study, we take the independent variables listed above (with the
primary weighting given to the CADD annotation) to find the probability
that a given variant is classified under a given class \(j\).
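For reference, one common way to write such a model is the cumulative
logit (proportional odds) form. The following is a sketch under that
assumption rather than a statement of the exact parameterization used by
XLSTAT, whose sign convention may differ. The symbols \(Y\) (the class of
a variant), \(x_k\) (the five independent variables), \(\alpha_j\)
(class thresholds), and \(\beta_k\) (regression coefficients) are
introduced here for illustration only:
\[
P(Y \le j \mid x_1, \dots, x_5)
  = \frac{1}{1 + \exp\left[-\left(\alpha_j - \sum_{k=1}^{5} \beta_k x_k\right)\right]},
  \qquad j = 1, 2, 3 .
\]
With four classes there are three thresholds,
\(\alpha_1 \le \alpha_2 \le \alpha_3\), and the probability of an
individual class follows by differencing the cumulative probabilities:
\[
P(Y = j \mid x_1, \dots, x_5)
  = P(Y \le j \mid x_1, \dots, x_5) - P(Y \le j - 1 \mid x_1, \dots, x_5),
\]
with the conventions \(P(Y \le 0) = 0\) and \(P(Y \le 4) = 1\). Under
this form, a coefficient \(\beta_k\) close to zero means that \(x_k\)
barely shifts the class probabilities, which is consistent with relying
primarily on the CADD annotation when variables (ii) through (v)
contribute little.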