This figure shows the training of the deep learning model and its evaluation on the left-out test set. The model was trained to predict the XGBoosted labels derived from the crowd ratings. Part A shows the training and validation loss for 10 training runs, each with a different initialization seed. The training loss tends towards 0, whereas the validation loss plateaus at a mean squared error between 0.05 and 0.07 at the 10th epoch. Part B shows the ROC curve of the model's predictions on the test set, evaluated against the binary-classified gold-standard slices, alongside the ROC curves from the previous analyses (the average crowd rating and the XGBoosted ratings).
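
To make the evaluation in Part B concrete, the following is a minimal sketch of how an ROC curve and its area can be computed from continuous predictions and binary gold-standard labels. It is an illustration only, not the code used in the study: the function names `roc_curve_points` and `auc` are hypothetical, the labels and scores are toy values, and it assumes the gold standard is coded as 1 (positive class) and 0 (negative class).

```python
def roc_curve_points(gold_labels, scores):
    """Return (fpr, tpr) points obtained by sweeping a threshold over scores."""
    n_pos = sum(1 for y in gold_labels if y == 1)
    n_neg = len(gold_labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    # Sort by descending score so each step lowers the decision threshold.
    ranked = sorted(zip(scores, gold_labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every item tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy values for illustration only; these are not data from the study.
gold = [1, 1, 0, 1, 0, 0]                      # assumed binary gold-standard labels
model_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]  # assumed continuous model outputs
curve = roc_curve_points(gold, model_scores)
print(auc(curve))
```

The same two functions could be applied to each set of scores compared in Part B (for example, the average crowd rating or the XGBoosted ratings) to draw the curves on shared axes.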