The relative volumes of the three tissues were calculated and stored in the RV matrix. The relative volume of each tissue was computed as follows:
\(\text{Relative Volume}_{\text{Tissue}} = \frac{\text{Absolute Volume}_{\text{Tissue}}}{\text{Absolute Volume}_{\text{Overall brain}}}\)
For convenience in the subsequent model building, all relevant data, namely the subject IDs, gender codes, age, and the three relative volumes, were stored in the RV matrix.
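A minimal sketch of this step is shown below. The absolute tissue volumes and the subject metadata are placeholder values; in the actual pipeline they come from the segmentation step described earlier, and the overall brain volume is taken here as the sum of the three tissue volumes.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 652  # number of subjects in the dataset

# Placeholder absolute tissue volumes (mm^3); in the real pipeline these
# come from the tissue segmentation step.
abs_vol = pd.DataFrame({
    "CSF": rng.uniform(2.0e5, 4.0e5, n),
    "GM":  rng.uniform(5.0e5, 8.0e5, n),
    "WM":  rng.uniform(4.0e5, 6.0e5, n),
})

# Overall brain volume taken here as the sum of the three tissue volumes
overall = abs_vol.sum(axis=1)

# Relative volume = absolute tissue volume / overall brain volume
rel_vol = abs_vol.div(overall, axis=0).add_prefix("RV_")

# RV matrix: subject IDs, gender codes, age, and the three relative volumes
RV = pd.concat(
    [pd.DataFrame({"subject_id": np.arange(1, n + 1),
                   "gender": rng.integers(0, 2, n),  # placeholder gender coding
                   "age": rng.integers(18, 90, n)}),
     rel_vol],
    axis=1,
)
print(RV.head())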

Volume-based Gender Classification and Cross-validation

The dataset of 652 subjects was split into two equally sized sets, (X1, y1) and (X2, y2), which were used for training and testing in an alternating way, so that each set served as (Xtrain, ytrain) and as (Xtest, ytest) exactly once. In this report, a 2-fold RepeatedStratifiedKFold cross-validation scheme was used to produce this split. In this way, two classification models were fitted on their respective training sets and compared against each other for validation. Cross-validation was also used to optimize the hyperparameters when fitting the support vector classifier (SVC). The area under the ROC curve and a classification report (precision, recall, F1-score, and support) were computed to evaluate the average accuracy of the different methods. In total, four classifiers were trained and tested in the experiment: Linear SVC, SVC (kernel = linear/RBF/polynomial), Stochastic Gradient Descent (SGD), and Decision Tree.
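Continuing from the RV matrix above, a minimal scikit-learn sketch of this alternating 2-fold setup is given below; the classifier hyperparameters are illustrative defaults rather than the tuned values reported later, and only the RBF kernel is shown for the SVC.

from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.svm import LinearSVC, SVC
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score, classification_report

# Relative-volume features and gender labels from the RV matrix
X = RV[["RV_CSF", "RV_GM", "RV_WM"]].to_numpy()
y = RV["gender"].to_numpy()

classifiers = {
    "Linear SVC": LinearSVC(dual=False),
    "SVC (RBF)": SVC(kernel="rbf"),
    "SGD": SGDClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
}

# 2-fold split: each half is used once for training and once for testing
cv = RepeatedStratifiedKFold(n_splits=2, n_repeats=1, random_state=0)

for name, clf in classifiers.items():
    for train_idx, test_idx in cv.split(X, y):
        clf.fit(X[train_idx], y[train_idx])
        y_pred = clf.predict(X[test_idx])
        print(name, "AUC:", round(roc_auc_score(y[test_idx], y_pred), 3))
        print(classification_report(y[test_idx], y_pred))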

Image-based Classification Using Grey Matter Maps

Grey matter maps were smoothed with a Gaussian filter and then downsampled to reduce dimensionality before PCA. In this report, the Discrete Gaussian method was used to smooth the grey matter maps, and the smoothed maps were then downsampled by a factor of two. The preprocessed maps were stored in a large matrix with the features of each sample in one row. The dataset of 652 subjects was split into two equally sized sets using the same cross-validation scheme described above. PCA was then performed to reduce dimensionality and extract the principal feature components. Cross-validation was also used to optimize the hyperparameters when fitting the support vector classifier (SVC). The area under the ROC curve and a classification report (precision, recall, F1-score, and support) were computed to evaluate the average accuracy of the different methods. In total, four classifiers were trained and tested in the experiment: Linear SVC, SVC (kernel = linear/RBF/polynomial), Stochastic Gradient Descent (SGD), and Decision Tree.
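A minimal sketch of this preprocessing and dimensionality-reduction step is given below; the file names are hypothetical, scipy's gaussian_filter stands in for the Discrete Gaussian smoothing, and the number of principal components is a placeholder. The gender labels are taken from the RV matrix sketched above.

import numpy as np
import nibabel as nib
from scipy.ndimage import gaussian_filter
from sklearn.decomposition import PCA
from sklearn.model_selection import RepeatedStratifiedKFold

def preprocess_map(path, sigma=2.0):
    """Smooth a grey matter map and downsample it by a factor of two."""
    gm = nib.load(path).get_fdata()
    smoothed = gaussian_filter(gm, sigma=sigma)  # stand-in for the Discrete Gaussian smoothing
    return smoothed[::2, ::2, ::2].ravel()       # keep every second voxel, then flatten to one row

# Hypothetical file names, one grey matter map per subject
paths = [f"greymatter/subject_{i:03d}.nii.gz" for i in range(1, 653)]
X_img = np.vstack([preprocess_map(p) for p in paths])
y = RV["gender"].to_numpy()

# Same 2-fold split as in the volume-based experiment
cv = RepeatedStratifiedKFold(n_splits=2, n_repeats=1, random_state=0)
train_idx, test_idx = next(cv.split(X_img, y))

# Fit PCA on the training half only, then project both halves onto the components
pca = PCA(n_components=100)  # placeholder number of components
X_train_pca = pca.fit_transform(X_img[train_idx])
X_test_pca = pca.transform(X_img[test_idx])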

Results

Best Parameters for SVC

Grid Search Cross-Validation was performed to search for the best parameters of the SVC models trained on both the relative volume features and the principal grey matter map features. The best parameter set {'C': 1, 'gamma': 0.001, 'kernel': 'rbf'} was found when training the SVC on the CSF, GM, and WM relative volume features, and this choice was confirmed by cross-validation. The best parameter set {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'} was found when training the SVC on the principal grey matter map features, and this choice was likewise confirmed by cross-validation. Details can be found in the source code file attached to this report.
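A minimal sketch of the grid search is given below, applied here to the PCA-reduced grey matter features from the previous sketch; the same call applies to the relative volume features. The parameter grid and the number of inner folds are assumptions rather than the exact settings used, which are given in the attached source code.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Candidate grid covering the kernels and the C/gamma values discussed above
param_grid = [
    {"kernel": ["rbf"], "C": [1, 10, 100], "gamma": [0.001, 0.0001]},
    {"kernel": ["linear"], "C": [1, 10, 100]},
    {"kernel": ["poly"], "C": [1, 10, 100], "gamma": [0.001, 0.0001], "degree": [2, 3]},
]

search = GridSearchCV(SVC(), param_grid, scoring="roc_auc", cv=5)  # 5 inner folds is an assumption
search.fit(X_train_pca, y[train_idx])
print(search.best_params_)  # e.g. {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}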