Introduction
Bioacoustics (the study of sound production, dispersion and reception
in animals) has been practised for millennia. Aristotle described
communication between animals, even in underwater systems, in great
anatomical and behavioural detail (Aristotle, 1910; Linke et al., 2020).
Bioacoustics can be used to study animal ecology (for example,
reproductive behaviour and success; Teixeira et al., 2019), to monitor
population dynamics of native or invasive species (Brodie et al.,
2020b), and to detect rare and endangered soniferous animals (Dema et
al., 2020; Dutilleux and Curé, 2020; Znidersic et al., 2020). Its sister
discipline, ecoacoustics, is a newer field that is not restricted to
sounds produced by organisms. Standing to bioacoustics as ecology stands
to biology, it investigates biodiversity, its relation to habitats, and
populations and ecological communities (Sueur and Farina, 2015).
Ecoacoustics has been used to quantify ecological responses to
environmental restoration or improvement in condition, providing a rapid
and continuous monitoring framework that can detect both degradation and
restoration success (Greenhalgh et al., 2021; Linke and Deretic, 2020;
Znidersic and Watson, 2022). Such assessments often use acoustic
indices. These indices are analogous to measures of diversity or
richness in classical ecology: they summarise the acoustic properties
of an overall soundscape, for example its spatial, temporal or combined
complexity, its overall volume, or the relation between natural and
human-influenced frequency bands (Sueur et al., 2014). However, because
soundscapes vary inherently between places, ecoacoustic indices must be
calibrated for each ecosystem. While some authors have described clear
variation along landscape gradients (Ng et al., 2018), others have
found little relationship between acoustic indices and human disturbance
(Mitchell et al., 2020). Still other studies have found that acoustic
indices can be dominated by a single sound source, for example river
flow (Linke and Deretic, 2020) or a single species that dominates the
soundscape, such as snapping shrimp (Bohnenstiehl et al., 2018).
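To make the idea of an acoustic index concrete, the sketch below computes a simplified form of the Acoustic Complexity Index (ACI), one widely used index of temporal complexity. The studies cited above are not assumed to have used this exact formulation. The sketch assumes a spectrogram has already been computed as a non-negative intensity matrix (rows are frequency bins, columns are time frames) and treats the whole clip as a single temporal step; the function name is hypothetical.

```python
import numpy as np


def acoustic_complexity_index(spec) -> float:
    """Simplified ACI computed over a single temporal step (illustrative sketch).

    spec: 2-D array of non-negative spectral intensities,
          rows = frequency bins, columns = time frames.
    """
    spec = np.asarray(spec, dtype=float)
    if spec.ndim != 2 or spec.shape[1] < 2:
        raise ValueError("need a 2-D spectrogram with at least two time frames")
    # Summed absolute intensity change between adjacent frames, per frequency bin
    diffs = np.abs(np.diff(spec, axis=1)).sum(axis=1)
    # Total intensity per frequency bin; silent bins are skipped to avoid 0/0
    totals = spec.sum(axis=1)
    nonzero = totals > 0
    # Per-bin ratios are summed across frequency bins to give one value
    return float((diffs[nonzero] / totals[nonzero]).sum())


if __name__ == "__main__":
    steady = np.ones((4, 10))              # constant intensity in every bin
    pulsed = np.array([[1.0, 3.0, 1.0]])   # intensity rises then falls
    print(acoustic_complexity_index(steady), acoustic_complexity_index(pulsed))
```

Under this formulation a perfectly steady sound gives an ACI of zero however loud it is, whereas intensity that changes from frame to frame raises the index.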
To examine restoration of wetlands in Australia’s most highly regulated
river system, the Murray-Darling, Linke and Deretic (2020) used both
manual annotation and ecoacoustic indices to track recovery of amphibian
and waterbird populations. The Murray-Darling system currently carries
only ~40% of its natural flow, with most of the
extracted water used for irrigated agriculture (Grafton et al., 2014).
Under a federal government initiative, the “Murray-Darling Basin
Plan”, water is being returned to rivers and wetlands through water
buybacks from irrigators; however, quantifying ecological recovery over
the long term is difficult (King et al., 2015; Souchon et al., 2008).
Linke and Deretic (2020) pioneered the use of ecoacoustic analysis as a
tool to continuously monitor populations after restorative water returns
to wetlands. By manually listening to recordings of frog and bird
calls, they found highly significant responses in the richness of
water-dependent biota to environmental watering. The response of
acoustic indices, however, was much weaker and in some cases
non-significant; it was partly obscured by ambient noise and subject to
high diurnal variation. This led the authors to conclude that a logical next step was
to trial multi-species call recognisers that would combine the advantage
of species specificity with the automated processing of acoustic indices
(Linke and Deretic, 2020).
Call recognisers are usually built to detect a single species, because
bioacoustics is often used to detect cryptic or rare animals. However,
as the application of acoustics to environmental monitoring increases,
multi-species recognisers are likely to become more important.
Multi-species recognisers detect sympatric species simultaneously
(Wright et al., 2020; Zhong et al., 2020), and outputs can be analysed
for species separately or combined. This is useful where groups of
species (e.g. mixed-species frog choruses) indicate environmental
change or represent other ecological values. Like single-species recognisers,
multi-species recognisers can use acoustic indices to detect soundscapes
in which target species are likely to occur (Brodie et al., 2020a), or
they can implement several single-species algorithms to detect discrete
calls (Ruff et al., 2020). There are many challenges to creating
reliable multi-species recognisers; however, methods for reducing their
increased risk of false detections are beginning to be examined (Campos
et al., 2019; Wright et al., 2020).
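Combining the outputs of several single-species recognisers can be as simple as applying a separate score cut-off to each species and then summarising per recording. The sketch below assumes detections arrive as (recording, species, score) tuples; the species codes, recording IDs and function name are hypothetical.

```python
from collections import Counter, defaultdict


def summarise_detections(detections, cutoffs):
    """Per-recording detection counts per species, plus species richness.

    detections: iterable of (recording_id, species, score) tuples.
    cutoffs: dict mapping species -> score cut-off (hypothetical values).
    """
    per_recording = defaultdict(Counter)
    for recording, species, score in detections:
        # Each species keeps its own cut-off; species without one are ignored
        cutoff = cutoffs.get(species)
        if cutoff is not None and score >= cutoff:
            per_recording[recording][species] += 1
    # Richness = number of species with at least one accepted detection
    richness = {rec: len(counts) for rec, counts in per_recording.items()}
    return dict(per_recording), richness


if __name__ == "__main__":
    dets = [("r1", "frog_A", 0.9), ("r1", "frog_A", 0.4),
            ("r1", "frog_B", 0.7), ("r2", "frog_B", 0.3)]
    counts, richness = summarise_detections(dets, {"frog_A": 0.5, "frog_B": 0.6})
    print(counts, richness)
```

Recordings with no detection above any cut-off do not appear in the output, so absences have to be recovered from the full list of recordings analysed.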
The metrics used to evaluate and report call recogniser
performance are highly variable in the literature (Knight et al., 2017).
Terminology is inconsistent, and studies may report only a small subset
of the possible performance metrics. This makes comparisons and
repeatability difficult. Perhaps more importantly, there are major
inconsistencies in the type and amount of training data used and in the
test datasets upon which recognisers are evaluated. While strictly
standardised methods are unlikely to be feasible (e.g. for rare species,
datasets can be extremely difficult to acquire), studies should, as a
minimum, report the representativeness of the training data, how these
were chosen or tested, and any limitations or assumptions. Decisions
relating to, for example, geographical coverage may have important
consequences for recogniser performance and transferability among
regions. Moreover, the extent to which training and test data include
real-world ambient noise should be explained, because factors such as
wind, other background noise and the calls of non-target species can
substantially affect false-detection rates
(Brandes, 2008; Cragg et al., 2015; Crump and Houlahan, 2017; Kahl et
al., 2021; Knight et al., 2017; Priyadarshani et al., 2018; Salamon et
al., 2016; Towsey et al., 2012). To standardise the reporting of
performance metrics, Knight et al. (2017) recommended that all studies
report precision, recall, F-score and area under the precision-recall
curve (AUC) or, for comparison with the broader classifier literature,
the receiver operating characteristic (ROC) curve.
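The metrics recommended by Knight et al. (2017) can be computed from a set of verified detections. The sketch below assumes that each detection has a score and has been manually labelled as true or false, that each annotated call is matched by at most one detection, and that the total number of annotated calls in the test data is known, so that calls the recogniser never scored still count as missed. The area under the precision-recall curve is approximated step-wise (precision multiplied by each gain in recall as the cut-off is lowered), which is one of several common ways to compute it. Function names are hypothetical.

```python
from typing import Sequence, Tuple


def precision_recall_f(scores: Sequence[float], is_true: Sequence[bool],
                       n_calls: int, cutoff: float) -> Tuple[float, float, float]:
    """Precision, recall and F-score for detections scoring >= cutoff."""
    tp = sum(1 for s, t in zip(scores, is_true) if s >= cutoff and t)
    fp = sum(1 for s, t in zip(scores, is_true) if s >= cutoff and not t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall is relative to all annotated calls, including never-detected ones
    recall = tp / n_calls if n_calls else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def pr_auc(scores: Sequence[float], is_true: Sequence[bool], n_calls: int) -> float:
    """Step-wise area under the precision-recall curve."""
    area, prev_recall = 0.0, 0.0
    # Lower the cut-off through every observed score, highest first
    for cutoff in sorted(set(scores), reverse=True):
        p, r, _ = precision_recall_f(scores, is_true, n_calls, cutoff)
        area += p * (r - prev_recall)
        prev_recall = r
    return area


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5]
    is_true = [True, True, False, True, False]
    print(precision_recall_f(scores, is_true, n_calls=4, cutoff=0.7))
    print(pr_auc(scores, is_true, n_calls=4))
```

Because recall is divided by all annotated calls, a recogniser that misses some calls entirely cannot reach a recall of 1 at any cut-off (0.75 in the example above), and the area is capped accordingly.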
Using a template-matching algorithm (binary point matching; Towsey et
al., 2012) in the R package monitoR (Katz et al., 2016b), we aimed to
establish a free and open-source protocol to optimise multi-species call
recogniser construction and evaluation using three levers: template
selection, amplitude cut-off and score cut-off.
- First, we tested the performance of geographically representative
candidate call templates (training data) and, from this, selected a
small number of high-performing templates from which to construct call
recognisers.
- Second, we examined call templates at a range of amplitude cut-offs,
which alters their detection sensitivity.
- Third, we tested templates across a wide range of score cut-offs,
which defines the threshold of similarity between templates and sound
data at which a detection is returned.
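The two cut-offs can be illustrated with a minimal sketch of binary point matching in Python with NumPy. This is an independent illustration, not the monitoR API, and it rests on stated assumptions: spectrograms are in decibels and share frequency bins; template cells at or above the amplitude cut-off are treated as "on" points and the rest as "off" points; the score at each time offset is taken as the mean amplitude of the on points minus that of the off points; and a detection is a local score peak at or above the score cut-off. All function names are hypothetical.

```python
import numpy as np


def make_binary_template(clip_db, amp_cutoff):
    """Boolean mask of 'on' points: template cells at or above amp_cutoff."""
    on = np.asarray(clip_db, dtype=float) >= amp_cutoff
    if on.all() or not on.any():
        raise ValueError("amp_cutoff must leave both on and off points")
    return on


def bin_match_scores(survey_db, on_mask):
    """Slide the template along time; score = mean(on) - mean(off)."""
    survey_db = np.asarray(survey_db, dtype=float)
    n_freq, width = on_mask.shape
    if survey_db.shape[0] != n_freq:
        raise ValueError("template and survey must share frequency bins")
    off_mask = ~on_mask
    n_windows = survey_db.shape[1] - width + 1
    scores = np.empty(max(n_windows, 0))
    for start in range(n_windows):
        window = survey_db[:, start:start + width]
        scores[start] = window[on_mask].mean() - window[off_mask].mean()
    return scores


def detect(scores, score_cutoff):
    """Frame offsets of local score peaks at or above score_cutoff."""
    hits = []
    for i, s in enumerate(scores):
        left = scores[i - 1] if i > 0 else -np.inf
        right = scores[i + 1] if i < len(scores) - 1 else -np.inf
        if s >= score_cutoff and s >= left and s > right:
            hits.append(i)
    return hits


if __name__ == "__main__":
    clip = np.array([[10, 40, 10], [10, 10, 10]])   # toy call template (dB)
    on = make_binary_template(clip, amp_cutoff=30)  # amplitude cut-off
    survey = np.array([[10, 10, 10, 10, 40, 10, 10], [10] * 7])
    scores = bin_match_scores(survey, on)
    print(scores, detect(scores, score_cutoff=5))   # score cut-off
```

Raising the amplitude cut-off leaves fewer on points, so the template describes only the loudest parts of the call; raising the score cut-off discards weaker matches. Template selection would then amount to running several candidate templates against annotated test recordings and keeping those whose detections perform best on metrics such as the F-score.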
As a case study, we tested this protocol on the calls of eight sympatric
frog species from the Koondrook-Pericoota wetland complex in the
Murray-Darling Basin.