Label prediction performance
Balanced accuracy scores for the 1-min UMAP dimensions were high
(> 0.85) for the location label (Table 1). Of the samples
labelled as ‘Burin’ and ‘Red Island’, 94% and 95% were correctly
identified using the UMAP dimensions, respectively. Scores for seismic
airgun presence were also high; however, model sensitivity was poor
(58.3%), meaning that true presences were missed (false negatives)
almost as often as they were detected (true positives). Repeating model
training using the 128 acoustic features improved performance and
reduced both false negatives and false positives. The ship presence classifier
trained on the two UMAP dimensions achieved a balanced accuracy score of
0.7, with only 33% of presences being correctly identified.
The classifier trained on the acoustic features achieved a higher balanced
accuracy score (0.86), and the proportion of correctly predicted presences,
although still low, increased to 58%.
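
For readers less familiar with the metrics reported above, the following minimal sketch shows how sensitivity, specificity and balanced accuracy relate to the counts in a binary confusion matrix. The counts are hypothetical, chosen only so that sensitivity equals 58.3%; they are not taken from this study, and the function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of true presences that were predicted as presences."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of true absences that were predicted as absences."""
    return tn / (tn + fp)


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity and specificity for a binary label."""
    return 0.5 * (sensitivity(tp, fn) + specificity(tn, fp))


# Hypothetical counts (NOT from the study): 7 detected and 5 missed presences.
tp, fn, tn, fp = 7, 5, 95, 5

print(f"sensitivity       = {sensitivity(tp, fn):.3f}")
print(f"specificity       = {specificity(tn, fp):.3f}")
print(f"balanced accuracy = {balanced_accuracy(tp, fn, tn, fp):.3f}")
```

Because balanced accuracy averages the two rates, a classifier can score well on it while still missing a large share of presences, provided absences are predicted accurately.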
The random forest classifiers for humpback whale presence trained on the
two UMAP dimensions showed the lowest F1 and balanced accuracy scores
(0.59 and 0.62, respectively), reflecting a large number of
mislabelled samples. Once again, repeating model fitting using the
acoustic features improved model performance. Training the classifier on
the 128 acoustic features increased the balanced accuracy score, mainly
owing to a dramatic increase in classifier sensitivity (93.9%)
compared with the classifier trained on the UMAP dimensions
(<0.001%).
Confusion matrices for the WMD and PBD cross-validation runs are
reported in Appendix S1.
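
As an illustration of how a confusion matrix for a binary label can be pooled over cross-validation folds, a minimal sketch is given below. It is not the procedure used in this study: the fold contents, the function name `confusion_counts` and the 0/1 label coding are all assumptions made for the example.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, FP and TN for one fold of binary predictions."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            counts["tp" if pred == positive else "fn"] += 1
        else:
            counts["fp" if pred == positive else "tn"] += 1
    return counts


# Hypothetical (true labels, predicted labels) pairs for two folds.
folds = [
    ([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]),
    ([1, 0, 0, 1, 0], [1, 0, 0, 1, 0]),
]

# Sum the per-fold counts into a single pooled confusion matrix.
pooled = Counter()
for y_true, y_pred in folds:
    pooled += confusion_counts(y_true, y_pred)

print("            pred. presence  pred. absence")
print(f"presence    {pooled['tp']:>14}  {pooled['fn']:>13}")
print(f"absence     {pooled['fp']:>14}  {pooled['tn']:>13}")
```

Pooling counts before computing any rate weights each sample equally, whereas averaging per-fold rates weights each fold equally; the two can differ when folds are unbalanced.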