Introduction
Magnetic resonance imaging (MRI) plays an important role in brain research and related clinical studies. MRI-based measurements of brain size, segmented regional volumes, and other structural details provide evidence for brain disease detection, aging research, and the evaluation of drug therapy. Automatic segmentation, which supports large-scale structural analysis of brain MR images, was therefore proposed as a substitute for purely manual segmentation. Its rapid development has depended largely on machine learning \cite{Gryska2019}. Brain structure volumes are known to differ between genders \cite{Yücel2001}. However, overall brain volume varies across individuals, so absolute tissue volumes are probably not predictive on their own. Relative volumes, computed as the ratio of each tissue volume to the whole brain volume, may be more suitable features for prediction. The Gaussian mixture model (GMM) is a commonly used method for segmenting brain tissues. Besides using relative volumes as features to predict gender, the pixels of grey matter maps that have been spatially normalised by registration to a common reference can also be treated as features. To be practical, however, such features require dimensionality reduction to extract the principal ones. Principal component analysis (PCA) is a common method for this.
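The relative-volume idea can be sketched as follows. This is a minimal illustration rather than the pipeline used in this work: it assumes scikit-learn's `GaussianMixture`, three tissue classes, and a synthetic image and mask standing in for a real scan. The function name `relative_tissue_volumes` and all numeric values are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def relative_tissue_volumes(image, mask, n_tissues=3, seed=0):
    """Fit a GMM to the intensities inside the brain mask and return each
    tissue's share of the masked volume, ordered by mean intensity."""
    intensities = image[mask > 0].reshape(-1, 1).astype(float)
    gmm = GaussianMixture(n_components=n_tissues, random_state=seed)
    labels = gmm.fit_predict(intensities)
    # Sort components by mean intensity so the output order is stable.
    order = np.argsort(gmm.means_.ravel())
    counts = np.bincount(labels, minlength=n_tissues)[order]
    return counts / counts.sum()


# Synthetic stand-in for a scan: three intensity classes inside a cubic mask.
rng = np.random.default_rng(0)
image = np.zeros((20, 20, 20))
mask = np.zeros(image.shape, dtype=bool)
mask[2:18, 2:18, 2:18] = True
true_labels = rng.choice(3, size=int(mask.sum()), p=[0.2, 0.45, 0.35])
class_means = np.array([30.0, 90.0, 150.0])  # hypothetical intensities
image[mask] = rng.normal(loc=class_means[true_labels], scale=5.0)

ratios = relative_tissue_volumes(image, mask)
print(ratios)  # three fractions that sum to 1
```

Because the ratios are divided by the masked voxel count, they do not depend on overall brain size, which is the property the features above rely on. Mapping the sorted components to named tissues depends on the imaging contrast and is left as an assumption here.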
In the model fitting process, features are used as training data and the corresponding prediction targets as training labels. Different classifiers were tested and verified by cross-validation, and their average performance was judged by F1 score, precision, recall, and area under the receiver operating characteristic (ROC) curve (AUC), reported together with the support, i.e. the number of samples in each class. Recall represents the ability of a model to find all the relevant cases within a dataset. Precision represents the ability of a model to identify only relevant data points. The F1 score is the harmonic mean of precision and recall and therefore balances the two; the larger the F1 score, the better the model performs. The ROC curve measures classification performance across threshold settings by plotting the true positive rate against the false positive rate, and the AUC summarises the degree of separability. An excellent model has an AUC near 1, which means it separates the classes well. An AUC of approximately 0.5 means the model has no capacity to discriminate between the positive and negative classes. An AUC near 0 means the model separates the classes but systematically assigns them the wrong way round.
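The evaluation procedure described above can be sketched with scikit-learn's `cross_validate`. The classifier, the number of folds, and the synthetic data from `make_classification` are assumptions made for illustration only; they are not the classifiers or features evaluated in this work.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic binary data standing in for the real features and gender labels.
X, y = make_classification(
    n_samples=652,
    n_features=10,
    n_informative=4,
    n_clusters_per_class=1,
    random_state=0,
)

clf = LogisticRegression(max_iter=1000)  # placeholder classifier
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
metrics = ["precision", "recall", "f1", "roc_auc"]

# One score per fold and metric; keys are named "test_<metric>".
scores = cross_validate(clf, X, y, cv=cv, scoring=metrics)
summary = {m: float(np.mean(scores[f"test_{m}"])) for m in metrics}

# Support is the number of samples in each class.
support = np.bincount(y)

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
print("support per class:", support)
```

Stratified folds keep the class proportions similar in every split, which makes the fold averages easier to compare across classifiers.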
Material and Methods
Data from 652 subjects were provided, including MR images, brain masks, and grey matter maps that had already been extracted from a set of MRI scans and aligned to a common reference space to obtain spatially normalised maps. Metadata containing the subjects' IDs, ages, and genders were provided as well.
Population Statistics Plot
The metadata were first loaded to provide an overview of the population of 652 subjects. The gender distribution (figure 1), the normal distribution of age (figure 2), and the scatter distribution of age (figure 3) were plotted.
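A population overview of this kind could be produced along the following lines. The sketch assumes pandas and matplotlib, and the column names `ID`, `age`, and `gender` are hypothetical. Because the real metadata file is not reproduced here, a randomly generated table of the same size is used in its place.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated stand-in for the metadata table (not the real data).
rng = np.random.default_rng(0)
meta = pd.DataFrame(
    {
        "ID": np.arange(652),
        "age": rng.normal(50, 15, 652).clip(18, 90).round(),
        "gender": rng.choice(["F", "M"], 652),
    }
)

gender_counts = meta["gender"].value_counts().sort_index()

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Gender distribution as a bar chart.
axes[0].bar(gender_counts.index, gender_counts.values)
axes[0].set_title("Gender distribution")

# Age distribution as a histogram.
axes[1].hist(meta["age"], bins=20)
axes[1].set_title("Age distribution")
axes[1].set_xlabel("Age")

# Age of each subject as a scatter plot.
axes[2].scatter(meta["ID"], meta["age"], s=8)
axes[2].set_title("Age per subject")
axes[2].set_xlabel("Subject ID")
axes[2].set_ylabel("Age")

fig.tight_layout()
```

With real data, the generated table would be replaced by a call such as `pd.read_csv` on the metadata file, and the figure could be written out with `fig.savefig`.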