Deep learning training and evaluation on the left-out test set: Part A shows the training and validation loss for 10 training runs, each with a different initialization seed. The training loss tends towards 0, whereas the validation loss plateaus at a mean squared error between 0.05 and 0.07 by the 10th epoch. Part B shows the ROC curve of the predictions on the test set against the binary-classified gold-standard slices, along with the ROC curves computed in the previous analyses (the average crowd rating and the XGBoosted ratings).
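
As an illustration of how a ROC curve such as the one in Part B can be traced from continuous predictions and binary gold-standard labels, the following is a minimal sketch in plain Python. It is not the code used to produce the figure: the function names `roc_points` and `auc` and the toy labels and scores are hypothetical, chosen only for this example.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists tracing a ROC curve, starting at (0, 0).

    labels: 0/1 gold-standard classes; scores: continuous predictions,
    where a higher score means "more likely class 1".
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # Sweep the decision threshold from the highest score to the lowest.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label:
            tp += 1
        else:
            fp += 1
        # Tied scores share one threshold, so emit a point only
        # once the whole group of ties has been counted.
        if i == len(pairs) - 1 or pairs[i + 1][0] != score:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2
        for i in range(len(fpr) - 1)
    )


# Toy data (hypothetical, not from the study).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_fpr, toy_tpr = roc_points(toy_labels, toy_scores)
toy_auc = auc(toy_fpr, toy_tpr)
```

Applying the same function to each set of ratings (for example, network predictions, average crowd ratings, and boosted ratings) against the same binary labels would yield curves that can be overlaid on one plot for comparison.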