Training a Deep Learning Model to Automate Image Labeling
The deep learning model was trained to predict the XGBoosted labels that were based on crowd ratings. We loaded the pretrained VGG16 network \cite{simonyan2014very}, removed the top layer, and then trained a final dense layer followed by a single-node output layer. These final layers were trained for 50 epochs, and the model with the lowest validation loss was saved. We repeated this training 10 separate times, each with a different random initialization seed, in order to measure the variability of the ROC AUC on the test set. The training and validation losses became equal at around 10 epochs, after which the model began to overfit. We ran inference on the held-out test set with the lowest-validation-loss model from each of the 10 runs and calculated the AUC for each. The deep learning model had the highest AUC of 0.99 with a standard deviation of 0.12. We therefore conclude that a deep learning network trained on crowd-generated labels matched expert ratings more closely than the crowd-generated labels alone.
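The per-seed evaluation described above can be illustrated with a minimal sketch: compute the ROC AUC of each run's test-set scores against the reference labels, then summarize the AUCs by their mean and standard deviation. The function names `roc_auc` and `summarize_auc`, the toy labels and scores, and the use of the sample (rather than population) standard deviation are all assumptions made for illustration; they are not taken from the study's code or data.

```python
import statistics


def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outscores a negative.

    Equivalent to the Mann-Whitney U statistic; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize_auc(labels, scores_per_run):
    """Return (mean AUC, sample std of AUC, per-run AUCs) across runs."""
    aucs = [roc_auc(labels, scores) for scores in scores_per_run]
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return statistics.mean(aucs), statistics.stdev(aucs), aucs


# Toy data (hypothetical): 4 test images, 2 training runs with different seeds.
toy_labels = [0, 0, 1, 1]
toy_scores_per_run = [
    [0.10, 0.40, 0.35, 0.80],  # scores from the run with seed A
    [0.10, 0.20, 0.70, 0.90],  # scores from the run with seed B
]
mean_auc, std_auc, per_run = summarize_auc(toy_labels, toy_scores_per_run)
print(per_run, mean_auc, std_auc)
```

In the setting described here, `scores_per_run` would hold 10 lists of test-set predictions, one from the saved lowest-validation-loss model of each run. The pairwise loop is quadratic in the number of test images, so a rank-based implementation may be preferable for large test sets.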