4. DISCUSSION
The machine learning pipeline we used in this study appears to be an effective tool for collecting occurrence data across a range of habitat types at our target site. The lack of statistically significant differences in relative detection frequencies between the audio and camera trap data conflicted slightly with our expectation that acoustic sampling would yield more accurate occurrence metrics than camera trap sampling. However, our audio dataset was much larger than the dataset collected by the camera trap network over a roughly similar period of time (n = 122 before filtering for season), which suggests that acoustic monitoring is capable of yielding much higher data densities per unit surveying time, at least for vocal species. Similarly, increasing the sample size of the camera trap dataset and collecting audio samples from the wet season may yet allow us to identify true underlying differences in detection probabilities for tinamous when surveyed acoustically versus visually.
The significant differences in detection frequency we observed between our data and the eBird data are likely a result of non-random spatial sampling. An example of this spatial non-randomness with a clear causative explanation is the relatively higher eBird detection frequency for C. undulatus, a species that is widely present in floodplain and transitional forest but is also extremely common in edge habitat near the station dwellings, where ecotourists and birders visiting the station spend time when not hiking on trails (eBird, 2017; personal obs.). We chose not to include C. strigulosus in frequency analyses because it is represented in our audio dataset mainly by detections at sites east of the Río Los Amigos, which birders and ecotourists visiting the station are rarely if ever able to access, heavily limiting its sampling density in the eBird dataset (personal obs.). Even in the absence of a quantitative assessment, we believe this is another clear case of spatially non-random eBird sampling relative to the more structured audio and camera trap data. We therefore advise caution when using eBird data to generate site-level relative occurrence frequencies for tropical forest birds, as doing so properly requires a substantially better-informed set of sample bias corrections than we chose to use for this illustratively naïve approach. eBird’s own Status and Trends methods are a classic example of how this can be done analytically, though the relatively low eBird data density across the Neotropics has meant that analyses using these methods have mainly been focused on the temperate zone (Sullivan et al., 2009; Sullivan et al., 2014; Fink et al., 2018).
Employing study designs that use eBird data as an adjunct to more structured surveying techniques is another possible strategy (Reich et al., 2018). This approach reduces the share of overall bias that eBird data contribute to ecological modeling efforts in this region while retaining the benefits of using multiple independent datasets to address the same question.
A common question posed by research scientists in the pursuit of an efficient but effective machine learning platform is “how much training data is enough?” Our two-pass classification strategy demonstrated clear improvements in classification accuracy over a single pass, though the degree to which our ensemble modeling strategy improved classification performance varied substantially between classes. We suspect that most of the performance improvements that could be gained beyond what we saw in our analysis would come from gathering additional survey data, iterating the data collection and training processes to increase sample sizes, and further improving the model architecture and hyperparameters. The main limiting factor for our use of machine learning classification has been the amount of computational power available to us, which required us to decrease the complexity of our neural networks and the resolution of our spectrograms relative to those mentioned in the literature (Knight et al., 2017; Kahl et al., 2019). While doing so allowed us to produce classification results within acceptable time constraints, this speed benefit potentially came at the cost of reduced classification accuracy. An important future goal for our analyses is to secure sufficient computational power to run the classification at full resolution and quantify any improvements in accuracy. We strongly believe that understanding the minimum resolution necessary to achieve a given level of accuracy is a crucial logistical consideration for researchers seeking to build hardware systems to support similar data processing pipelines.
Acoustic monitoring represents a promising method for studying bird biology and life history. We are particularly excited by the prospect of using these SWIFT survey data in future analyses to identify the life history and microhabitat characteristics that result in niche partitioning in the tinamou community of lowland Madre de Dios. We anticipate that additional data collection, particularly during the wet season, and further refinement of this machine learning pipeline will allow us to build occupancy models for these species, using the elevation maps and vegetation structure datasets that were collected for the associated camera trap grid as environmental covariates (Royle & Nichols, 2003).