Normative Model Generation

Next, we sought to create a normative model representing the variability seen across the cortical sheet in healthy volunteers in our feature space (see overview in Figure 2). This normative model then served as the basis for our outlier detection approach to identify FCDs. The characteristic MR findings of FCDs are centered on the gray matter - white matter junction (GWJ), with alterations in signal intensity in the cortical gray matter and underlying white matter as well as blurring of the gray-white junction. Since our features are designed to represent the local neighborhood surrounding each voxel, we hypothesized that these characteristic FCD-related abnormalities could be identified by focusing on the voxels at and surrounding the GWJ. We therefore sampled our 39-dimensional feature vector onto the FreeSurfer-generated smooth white matter surface and standardized each feature within each subject. This was followed by dimensionality reduction and whitening using principal component analysis (PCA) of all cortical vertices across all HVs, implemented in scikit-learn in Python \cite{scikit-learn}, retaining 14 components that explained 90% of the variance.
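The within-subject standardization and pooled PCA step described above can be sketched as follows. The data here are random stand-ins (the vertex counts, subject count, and array names are illustrative assumptions, not the actual JEM pipeline), but the scikit-learn calls mirror the described procedure: z-score each feature within each subject, pool vertices across subjects, then reduce and whiten with PCA at the 90% variance threshold.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Illustrative stand-in data: one (vertices x 39 features) matrix per subject.
subjects = [rng.normal(size=(1000, 39)) for _ in range(5)]

# Standardize each feature within each subject (z-score across vertices).
standardized = [(s - s.mean(axis=0)) / s.std(axis=0) for s in subjects]

# Pool all cortical vertices across all subjects, then reduce and whiten.
# n_components as a float keeps the smallest number of components
# whose cumulative explained variance reaches 90%.
X = np.vstack(standardized)
pca = PCA(n_components=0.90, whiten=True)
Z = pca.fit_transform(X)
```

With whitening enabled, each retained component of `Z` has approximately unit variance, which simplifies the distance and density computations downstream.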
Next, to facilitate more straightforward multivariate outlier detection, we implemented a rotation-based iterative Gaussianization (RBIG) procedure to transform the data into a latent representation with a known probability density function (PDF), in this case a multivariate Gaussian distribution (Figure 2) \cite{Laparra2011}. This is a form of representation learning similar to an auto-encoder, in which the data undergo a nonlinear transformation to facilitate more straightforward statistical inferences \cite{Bengio2013}. Among the many available methods, we selected this representation specifically to allow for straightforward estimation of outlierness, as well as estimates of similarity of vertices across the cortical sheet. The RBIG procedure consists of 10 iterations of a pair of sequential transformations: 1) a non-linear univariate Gaussianization transformation applied to each of the data matrix columns (marginals) that converts percentile scores computed using the rank transformation to standard scores (scikit-learn's QuantileTransformer); and 2) a linear orthogonal transformation applied to the entire data matrix using PCA, retaining all components after each iteration \cite{Laparra2011}. Source code for generating the features and model is publicly available on GitHub as part of the JEM python package (http://github.com/InatiLab/jem).
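The two alternating RBIG transformations can be sketched with scikit-learn as below. The input here is synthetic non-Gaussian stand-in data (its shape and distribution are illustrative assumptions); each iteration Gaussianizes the marginals with a rank-based quantile transform and then applies an orthogonal PCA rotation, retaining all components.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)

# Illustrative non-Gaussian stand-in: vertices x latent components.
X = rng.exponential(size=(2000, 14))

# Rotation-based iterative Gaussianization: 10 iterations of
# (marginal Gaussianization, orthogonal rotation).
for _ in range(10):
    # 1) Rank-based quantile transform maps each column's percentile
    #    scores to standard normal scores.
    qt = QuantileTransformer(output_distribution="normal",
                             n_quantiles=min(1000, X.shape[0]))
    X = qt.fit_transform(X)
    # 2) PCA rotation of the full matrix, keeping all components.
    X = PCA().fit_transform(X)
```

After the final iteration the columns are centered, mutually decorrelated, and approximately marginally Gaussian, so outlierness can be scored directly against a multivariate normal density.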