4. Discussion
Most major classes of machine learning methods have been used across all
fields of species threat and conservation analyses. Furthermore, we
found usage of these methods to be consistent with previous reviews of
the use of machine learning in conservation (Thessen, 2016; Liu et
al., 2021). From species distribution models, which are the basis for
further analyses, to advising on the best options for preserving genetic
diversity, machine learning is increasingly used, as methods become
simpler to use and implement for multiple purposes. Yet, many methods
have been more commonly used for specific objectives, as they are either
widely studied regarding specific problems, or because they have been
found to be particularly effective. Here, we detail the use,
advantages and disadvantages of each method, providing examples that
cover a broad spectrum of uses. Methods are presented in order of
their frequency of use in the reviewed publications.
4.1 MaxEnt
MaxEnt models have been extensively used to model feature (e.g. species)
distributions in space, as they can cope with a large number of
covariates and with small sample sizes; in the latter case through their
regularised versions (Mohri et al., 2018). As MaxEnt has been
extensively researched with background data, its popularity is mostly
due to its usage in species distribution models (SDM); estimating
species distributions being the first step in many conservation
exercises. Before MaxEnt’s popularity, other ML methods, such as the
genetic algorithm for rule-set production (GARP), were used for SDMs,
but their use has decreased since then (see section 4.7). MaxEnt’s
easy-to-use GUI (graphical user interface) and good performance when
compared to similarly purposed models (Elith et al., 2006)
contributed to its popularity. Therefore, it is natural that MaxEnt was
found to be the most commonly used ML method in this review.
Predicting the habitat suitability (often interpreted as potential
distribution) of a species of conservation concern is by far the most
common use of the method. Studies
employing MaxEnt this way often use a selection of both environmental
variables and stressors. Integrative studies potentially include
distribution data on invasive species and other stressors, such as
powerlines, roads and agricultural development (Bradley, 2010).
Modelling the distribution of invasive species and pathogens is also
common practice. In one such example, Azzurro et al. (2013) created
a species distribution model for the invasive bluespotted cornetfish
(Fistularia commersonii) using human population density (among
other variables) as a predictor. The response variable can also be
something unrelated to species distributions. Focusing on the indirect
impacts of agriculture, Mateo-Tomas et al. (2012) used MaxEnt to
model the number of poisoning events driven by livestock presence due to
predation risk. The bottom line is that almost any response variable
that could reasonably be expected to follow the maximum entropy
principle can be modelled using MaxEnt.
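The presence/background workflow described above can be sketched compactly. The example below is a minimal, hedged approximation: instead of the Maxent software itself, it fits an L1-regularised logistic regression (closely related to MaxEnt's regularised formulation) on synthetic presence and background points; the species, covariates and numbers are invented for illustration only.

```python
# Sketch of a MaxEnt-style SDM: presence points vs. random background
# points, classified from environmental covariates. Real studies would
# use the Maxent software or a dedicated package; a regularised logistic
# regression on presence/background data is a closely related stand-in.
# All data below are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic covariates: temperature and precipitation at 200 presence
# points (warm, wet) and 1000 background points (drawn uniformly).
presence = rng.normal(loc=[22.0, 1200.0], scale=[2.0, 150.0], size=(200, 2))
background = np.column_stack([
    rng.uniform(0, 35, 1000),        # temperature (deg C)
    rng.uniform(100, 2500, 1000),    # annual precipitation (mm)
])

X = np.vstack([presence, background])
y = np.concatenate([np.ones(200), np.zeros(1000)])  # 1 = presence

# The L1 penalty mimics MaxEnt's lasso-style feature regularisation.
model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
model.fit(X, y)

# Relative habitat suitability for two candidate grid cells.
cells = np.array([[21.0, 1150.0], [5.0, 300.0]])
suitability = model.predict_proba(cells)[:, 1]
print(suitability)  # the warm, wet cell should score higher
```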
4.2 Bayesian methods
Bayesian methods are capable of handling problems with small datasets,
provided there is prior information on the problem. They can also
provide information on credible intervals of unobserved parameters
(Murphy, 2012). Bayesian methods were found to be the second most
popular ML option, likely due to their capability of incorporating prior
models. Of all Bayesian methods, Bayesian belief networks (BBN),
Bayesian regression models (GLM, GAM etc.), Bayesian clustering
analyses, and MCMC are the most common methods found in the literature.
They are most common in management and experimental settings, in
archetypal Before-After-Control-Impact (BACI) studies where a model is
usually created and then parameterised through MCMC.
Bayesian methods can be used to directly model species occurrences based
on stressors. Shokri et al. (2021) performed a comprehensive
analysis of the spatial patterns of vulnerability and conservation of
the Caspian red deer (Cervus elaphus maral), including attempts
to model (Bayesian GLM, MCMC) red deer occurrences with the number of
local ranger stations as an explanatory variable, hypothesising this to
act as a proxy for dissuasive measures on illegal hunting. Similarly,
Christie et al. (2015) used Bayesian mixed models and MCMC to
study the effects of oil development (oil wells and road density) in the
density of pronghorns (Antilocapra americana) in the United
States. However, the response variable does not necessarily need to be
an occurrence metric. Wang et al. (2015) used Bayesian vector
autoregression to model the relationship between economic development
and an environmental quality index. Additionally, Bayesian methodologies
are used in geographic information system (GIS) applications, with
empirical Bayesian kriging being used to interpolate overall groundwater
quality, based on sample water quality (Hossain & Patra, 2020).
Modelling the relationship between phenotypic traits and ecosystem
dynamics and stressors is most commonly done with Bayesian models – an
aspect that should increase in popularity with the growing availability
of trait data. Primack et al. (2009) developed a Bayesian
hierarchical model parameterised with Gibbs sampling to study the
interactions of temperature, site effects and latitude on phenological
attributes, such as the first flowering date of an apricot tree species
and the arrival of a migratory swallow. Likewise, in a controlled
environment, Tucker et al. (2012) used hierarchical Bayesian
modelling to compare the short-term and seasonal responses in soil
respiration under changing thermal conditions and a variable substrate
(with or without a simple sugar, dextrose). These changes to soil
respiration could in turn, at a sufficient scale, deeply affect the
global carbon cycle and climate at large.
Bayesian belief networks are the natural choice for many conservation
studies incorporating expert opinion, because they take advantage of
Bayesian methods’ small-dataset capabilities and condense expert opinion
and bibliographic knowledge into a simple statistical model. They do so
by transparently relating, through conditional probability tables,
non-measurable or otherwise abstract input variables (e.g. political
support for conservation) with proxy variables (e.g. corruption index,
lawmaker survey data). It is then up to the
authors to make an educated guess as to the priors of these variables
and the relationships between the measurable proxies and the input
variables (Landuyt et al., 2013). In an archetypal example,
Namkhan et al. (2020) sought to model the vulnerability of
lowland forest habitats in mainland Asia and incorporated several input
variables pertaining to biological resource use. These include local
hunting (whose proxy variables are border distance and road presence,
among others) and protection level (whose proxy variables are protected
area type and presence of charismatic species, among others).
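The input/proxy structure of such BBNs reduces to conditional probability tables, which can be sketched in a few lines. The toy network below has one latent input variable and one proxy; the variable names and probabilities are purely illustrative, not taken from any of the cited studies.

```python
# A toy Bayesian belief network in the spirit described above: a latent
# input variable ('hunting pressure') with a prior, linked to an
# observable proxy ('road present') by a conditional probability table.
# Inference by direct enumeration; all numbers are illustrative only.
prior = {"high": 0.3, "low": 0.7}   # P(hunting pressure)
cpt_road = {                         # P(road present | hunting pressure)
    "high": 0.8,   # roads make access easier, so hunting more likely
    "low": 0.3,
}

def posterior_hunting(road_present: bool) -> dict:
    """P(hunting pressure | road observation) via Bayes' rule."""
    unnorm = {}
    for h, p in prior.items():
        p_obs = cpt_road[h] if road_present else 1.0 - cpt_road[h]
        unnorm[h] = p * p_obs
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

post = posterior_hunting(road_present=True)
print(post)  # observing a road raises P(hunting = 'high') above its prior
```

Real BBNs chain many such tables; dedicated tools (e.g. Netica, pgmpy) handle the bookkeeping, but the arithmetic is exactly this.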
Smith et al. (2007) modelled habitat suitability for the Near
Threatened Julia Creek dunnart (Sminthopsis douglasi) with a
variety of GIS-derived proxy variables, such as distance to water and
density of an invasive species of tree, Acacia nilotica (L.).
These then informed a set of key input variables, such as grazing
pressure and density of soil cracks. In another study, given the lack of
information on landscape-scale risks apart from those from the invasive
Mimosa pigra (L.), for which experimental data already existed,
Bayliss et al. (2012) opted to perform an ERA (ecological risk
assessment) for a floodplain downstream of an Australian uranium mine as
a BBN. The overall risk is composed of a mine site ERA and a landscape
ERA, both with their sets of input variables (e.g. uranium’s ecological
risk and unmanaged fire risk, respectively) and proxy variables (e.g.
uranium concentration and fire exposure, respectively).
It should be noted that some authors rely on BBNs less for their
practical integration of input and proxy variables, and more for their
simplicity and transparency. As an example, Van der Biest et al.
(2014) noted the lack of models for ecosystem services (ES) ‘in terms of
their supporting systems, namely the biophysical potential for the
delivery of services’. As a response, they advanced the EBI (Ecosystem
Service Bundle Index), an index coupled with a BBN model that estimates
provision of ecosystem services with ES input variables (e.g. food
production, wood production) informed by proxy variables (e.g. land use
class, soil texture class) sourced with spatially explicit data. Lastly,
Bondé et al. (2020) used a BBN on an arguably entirely
theoretical level. They broadly identified three input variables
(climate change, overexploitation, and land use change) influenced in
turn by six input variables based on policy changes (e.g. protected area
expansion, promotion of agroforestry). The model was then used in
conjunction with several policy scenarios (e.g. ‘business-as-usual’,
‘agroforestry and fair trade scenario’) to predict the trend of shea
tree, Vitellaria paradoxa (C. F. Gaertn) abundance.
Finally, in the specific context of conservation analysis, Bayesian
methods are also very frequently used to analyse genetic data. Such
approaches can be used, for example, to determine the origin of specific
stressors or to study genetic diversity loss. Manel et al. (2002)
exemplify this possibility by accurately determining the origin of
trafficked individuals by running a Bayesian clustering analysis with
MCMC on sequence data of particularly polymorphic DNA markers, shedding
light on potential trafficking pathways in a way that is reminiscent of
product authenticity problems (Montowska et al., 2010).
Similarly, in Oliveira et al. (2008), the same methodology was
used to determine the hybridisation between wild and domestic cats in
Portugal. The same can also be done in management scenarios. As an
example, Barilani et al. (2007) assessed potential progressive
introgression in a partridge species caused by reared individuals
supplementing a population in decline due to overharvesting. Likewise,
Hone et al. (2010) examined the maximum proportion of a
population that can be removed to stop population growth in Australian
and New Zealand mammal species, and its implications for management (and
harvest).
4.3 Ensembles
Ensembles are powerful and versatile models with several advantages.
They are often capable of handling non-linear responses and large
datasets with many explanatory variables. They also have a reduced risk of
overfitting if adequate measures are taken (Cutler et al., 2007).
Finally, they have the advantage of increasing the accuracy of the model
compared to that of each of its component models. The fact that the most
popular form is random forests stems from the simplicity of the
component models, decision trees. However, their interpretability is
mostly limited to the heuristics that can be applied to black-box models
(Murphy, 2012). Also, for SDM, they require species presence–absence
data, which might be limiting.
Boosted regression trees (BRT) and random forests (RF) are by far the
most popular types of ensembles in conservation biology. They are good
‘all-purpose’ choices in classification and regression problems.
Johnstone et al. (2010) used BRTs to determine how variations in
pre-fire vegetation, fire effects and spatial and environmental
variables affected post-fire regeneration in boreal forests by
separately modelling three different response variables: seedling
densities of the black spruce, Picea mariana (Mill.) B.S.P,
seedling density (including resprouts) of two deciduous tree species,
the trembling aspen, Populus tremuloides (Michx.) and Alaskan
paper birch, Betula neoalaskana (Sarg.), and the proportion of
post-fire seedlings that were spruce. Young et al. (2017) used
BRTs to model fire occurrence in boreal regions with 60 years of
landscape variables (habitat distribution, topography etc.) and climatic
variables (potential evapotranspiration etc.). Coutts et al.
(2011) used a BRT to examine the factors conditioning the spread of
invasive species while considering different management scenarios. As
for RFs, Barros & Elkin (2021) attempted to solve a lasting challenge
of managing old-growth forest (i.e. forest at an advanced development
stage) by developing a predictive old-growth index. Old-growth forests are
functionally and ecologically distinct, and establishing such an index
based on structural attributes (i.e. basal area of large trees, vertical
variability) allows for old-growth mapping at fine ecological and
spatial scales. Peciña et al. (2021) also used RFs to predict
above-ground biomass in coastal meadows at a fine spatial resolution,
and then statistically assessed the effects of management on sward
structure using historical management data.
Ensembles were also used to estimate the distribution of species (i.e.
SDM) or other features. Sabatini et al. (2017) modelled potential
undetected primary forests in Europe by training a BRT model with a
presence–absence map of primary forests and several explanatory
variables of biogeographic and socio-economic nature, among which was
travel time to the nearest city. Similarly, Catford et al. (2011)
used a BRT to model the distribution of exotic species dependent on
multiple environmental and anthropogenic variables.
Lastly, RFs’ capacity to handle many-featured datasets has been widely
used in studies dealing with pollution. These make frequent use of
arrays with multiple contaminant concentrations as explanatory
variables. Oliver et al. (2017) compared models of long-term
change in lakes through lake-level or region-level drivers based on
total nitrogen (TN), total phosphorus (TP), stoichiometry (TN:TP) and
chlorophyll. Li et al. (2020) linked pollution in lakes to
hydrological features of tributaries. H. Zhang et al. (2021) linked
individual heavy metal contaminants to their potential sources by
modelling their concentration with different soil features. Zhang &
Vincent (2019) predicted the conservation status of endangered species by
linking it with pollutants and other stressors. Molnár et al.
(2020) modelled the air pollution tolerance index (APTI), a measure of a
plant’s ability to counter the effects of pollution, as a function of
particulate matter concentration, city population and land use classes.
In all cases, by using methods of estimating variable importance,
authors explore the contribution of each explanatory variable towards an
end prediction.
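That variable-importance workflow can be sketched as follows, assuming scikit-learn; the 'contaminant' predictors and the response are synthetic, constructed so that only the first variable actually drives the response.

```python
# Sketch of the variable-importance workflow described above: a random
# forest fit to synthetic data in which only the first 'contaminant'
# concentration drives the response, then impurity-based importances
# inspected to recover that fact.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Five hypothetical contaminant concentrations; only the first matters.
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=300)

rf = RandomForestRegressor(n_estimators=200, random_state=1)
rf.fit(X, y)

importances = rf.feature_importances_
print(importances)  # the first variable should dominate
```

Permutation importance (`sklearn.inspection.permutation_importance`) is a common, less biased alternative to the impurity-based scores shown here.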
4.4 Artificial neural networks
Artificial neural networks (ANN) are a very powerful family of models
applied to a variety of problems. Deep ANN models have shown a
capability to obtain very good results in complex problems, at the cost
of needing a large number of samples due to the large number of
parameters to learn. However, they are black-box models, and therefore
the interpretability of results is limited to the use of a few
heuristics (Goodfellow et al., 2016). This has perhaps limited
their application to conservation science.
ANNs’ black-box nature is not an impediment when dealing with
threats and concepts whose theoretical understanding is secondary to
their practical prediction (e.g. pollution, weather and geological
events). Coste et al. (2009) used presence–absence data of
macrophyte species to predict water quality classes defined by
hydrological features. Additionally, although a variety of models have
been employed to predict flooding events, ANNs are by far the most
popular (see Mosavi et al., 2018, for a review on this topic).
Likewise, a deep ANN achieves the best accuracy in landslide modelling
when compared to other methods (i.e. SVM, MLP, RF) (Bui et al.,
2020).
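A minimal feed-forward ANN of the kind used in the water-quality example above can be sketched with scikit-learn's MLPClassifier; the indicator taxa and quality classes below are synthetic stand-ins, not real community data.

```python
# A minimal feed-forward ANN in the spirit of the water-quality example:
# presence-absence of indicator taxa predicting a quality class.
# Entirely synthetic; real studies use far larger networks and datasets.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)

# 400 sites x 10 taxa; class 1 ('good') sites tend to host taxa 0-4,
# class 0 ('poor') sites tend to host taxa 5-9.
labels = rng.integers(0, 2, size=400)
probs = np.where(labels[:, None] == 1,
                 np.array([[0.8] * 5 + [0.2] * 5]),
                 np.array([[0.2] * 5 + [0.8] * 5]))
X = (rng.uniform(size=(400, 10)) < probs).astype(float)

# One small hidden layer; train on 300 sites, hold out 100.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=7)
clf.fit(X[:300], labels[:300])
accuracy = clf.score(X[300:], labels[300:])
print(accuracy)  # well above the 0.5 chance level on held-out sites
```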
Neural networks have been extensively used for image recognition and
automated identification of species from pictures or videos. These data
types are extremely complex in their nature, and ANNs have the
capability to detect common patterns in images after extensive training,
even if the reasoning behind each classification is often difficult to
determine. One initiative to prevent roadkill, ROOD (kangaROO roaD),
uses the object recognition software YOLOv3 (You Only Look Once) (Redmon
et al., 2018), which employs a convolutional neural network to
recognise carcasses captured on a car-mounted camera device (Yi & Khot,
2020). Spyromitros-Xioufis et al. (2018) review efforts to
estimate air quality based on images of the sky obtained from ground
level. Khalighifar et al. (2021), taking advantage of the image
processing abilities of a CNN, applied it to sonograms of frog calls to
identify the corresponding species. With the advent of big data in
ecology and conservation, current efforts in remote sensing (i.e.
detecting and monitoring physical characteristics by measuring reflected
and emitted radiation at a distance) of species and automatic taxonomic
classification are also based on increasingly powerful and complex
neural networks (Weinstein, 2017; Fairbrass et al., 2018; Valan
et al., 2019).
In general, complex time-series data is often plagued with noise derived
from natural population or community fluctuations that confound the
trends one is trying to model. ANNs are flexible enough to allow
identifying patterns of interest beyond such ‘noise’ and predicting
future trends, in particular when using recurrent connections (RNN).
RNNs can be used to process sequential and time-series data, an ability
used to determine the drivers of past trends and to forecast future
trends. See Christin et al. (2019) for a review on this topic and
Capinha et al. (2021) for a set of case studies.
4.5 Decision trees
Decision trees are among the simplest and most explainable machine
learning models. The user can easily extract rules that the model
obtains from training data. They are, however, not robust to updates in
the dataset, since the structure of the decision tree may change with
the addition (or removal) of a few samples (Mohri et al., 2018), and
thus have never gained much popularity. Regardless, DTs are mainly
used in decision and management studies as an extremely
simple and intuitive model that can still be potentially more
advantageous in implementation and outreach than more complex models.
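The rule extraction that makes DTs attractive for implementation and outreach can be demonstrated directly. The sketch below fits a small tree to invented 'invasiveness' trait data and prints it as human-readable rules; trait names and thresholds are hypothetical.

```python
# Sketch of DT rule extraction: a small tree fit to synthetic
# 'invasive risk' data, then printed as if-then rules.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)

# Two illustrative traits; 'risky' species are tall AND weedy elsewhere.
height = rng.uniform(0, 3, 500)           # plant height (m)
weed_elsewhere = rng.integers(0, 2, 500)  # 1 = known weed elsewhere
risk = ((height > 1.5) & (weed_elsewhere == 1)).astype(int)

X = np.column_stack([height, weed_elsewhere])
tree = DecisionTreeClassifier(max_depth=3, random_state=3)
tree.fit(X, risk)

# Human-readable if-then rules recovered directly from the data.
rules = export_text(tree, feature_names=["height_m", "weed_elsewhere"])
print(rules)
```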
Pyšek et al. (2012) used a classification tree constructed with
the CART algorithm (Breiman et al., 1984) to make a variety of
conservation predictions. From existing trait data on invasive plant
species (e.g. height, pollination, toxicity), the DTs generated
predictions of significant impacts on several elements: species and
communities of resident animals and plants, soil characteristics and
fire regimes. Similarly, Křivánek et al. (2006) tested invasive
species risk assessment schemes based on DTs using binary trait nodes
(e.g. ‘Is it an agricultural weed elsewhere?’, ‘Is it unpalatable to
grazers?’) to produce a decision to accept, suggest or reject a given
species as non-invasive. Canessa et al. (2020) created a DT using
expert judgement to assess management alternatives in reintroduced
populations of the critically endangered regent honeyeater, Anthochaera phrygia.
Finally, DTs have, like many other ML methods, been used to model species
distributions. As an example, Debeljak et al. (2015) used DTs
with environmental data inputs (e.g. slope, landscape category) to model
the habitat of the black poplar [Populus nigra (L.)], a
riparian species that is threatened by habitat loss.
4.6 Support vector machines
Support vector machines can be used in classification and regression
problems, with good results on moderate-sized datasets. They do not
scale well in multi-class problems with a large number of features and
examples. Also, they cannot provide probabilistic evaluation of the
classification produced since they are geometrical models (Murphy,
2012). These characteristics, along with the requirement of
presence–absence data and competition with other techniques occupying
the same niche (such as ensemble methods), justify their scarce use as a
sole method for conservation problems.
The notable exception to the above is in problems that make use of
spatial data. Liu et al. (2018) used SVMs to model urban expansion
as a function of 11 spectral variables (e.g. built-up index, soil
brightness, wetness etc.), which they then couple with a least squares
regression analysis between the generated map and geospatial data on
socio-economic factors. In a less conventional example that stretches
the limits of the model, Gallardo, Errea & Aldridge (2013)
assessed the habitat suitability of a problematic species based on
environmental factors (e.g. precipitation, maximum temperature of
warmest month etc.). In this scenario, pseudo-absence points were
generated to run the model with presence-only data, as opposed to the
usual presence–absence data. Furthermore, in order to output habitat
suitability, regression was accomplished by implementing Platt’s
a posteriori probabilities. This is a method similar to fitting a sigmoid
to the normal SVM decision values, here used to produce class
probabilities in a markedly non-probabilistic ML method. Conservation
analysis might benefit from these workarounds, applicable to most
classification methods when fully explored.
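The Platt-scaling workaround described above is available off the shelf: in scikit-learn, `SVC(probability=True)` fits a sigmoid to the decision values to produce class probabilities. Below is a hedged sketch on synthetic presence/pseudo-absence points; the climate-space coordinates are invented.

```python
# Sketch of Platt scaling on an SVM: a geometric classifier made to
# output class probabilities usable as a habitat-suitability score.
# Synthetic presence / pseudo-absence data, as in the example above.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)

# Presence points cluster in 'suitable' climate space; pseudo-absences
# are scattered uniformly across the whole domain.
presence = rng.normal(loc=[1.0, 1.0], scale=0.5, size=(150, 2))
pseudo_absence = rng.uniform(-3, 3, size=(300, 2))
X = np.vstack([presence, pseudo_absence])
y = np.concatenate([np.ones(150), np.zeros(300)])

# probability=True triggers Platt's sigmoid fit on the decision values.
svm = SVC(kernel="rbf", probability=True, random_state=5)
svm.fit(X, y)

# Continuous suitability rather than a hard class label.
suitability = svm.predict_proba([[1.0, 1.0], [-2.5, -2.5]])[:, 1]
print(suitability)  # the point in the presence cluster should score higher
```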
SVMs have been used in multiple cases of environmental monitoring. Wind
turbines can have a sizable effect on certain bat and bird species
(Stevens et al., 2013; Davy et al., 2021), and a
preventative stoppage of the blades could significantly decrease the
number of individuals dying in mid-air collisions with them. Hu &
Albertani (2019) created an automatic collision detection system for
wind turbine blades, which uses automated detection algorithms based on
SVMs. In another study, Neill et al. (2018) used SVMs to predict
the environmental status of an agroecosystem from environmental factors
(seasonality) as well as biological and chemical indicators related to
beehives (population mean, count of pesticides).
4.7 Evolutionary algorithms
Evolutionary algorithms are mostly useful for optimisation problems
where they are recognised for their robustness, providing high-quality
results in many different types of problems (Floreano & Mattiussi,
2008). This means they can be used to optimise the parameters of a
defined model, fitting it to observed data, and, in the form of genetic
programming, to evolve the model itself, a task known as symbolic
regression. The latter is still underdeveloped, although it is of high
potential interest, for ecological modelling in particular
(Cardoso et al., 2020).
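Parameter optimisation with an EA can be sketched in a few lines. The toy genetic algorithm below fits the growth rate of a logistic population model to noisy synthetic observations; the selection and mutation schemes are deliberately simplistic, not a production implementation.

```python
# Minimal genetic algorithm fitting the growth rate r of a logistic
# population model to noisy observations: EA-based parameter
# optimisation of a defined model. Everything here is synthetic.
import numpy as np

rng = np.random.default_rng(11)

K, N0, true_r = 100.0, 5.0, 0.4    # carrying capacity, initial size, rate
t = np.arange(0, 20)

def logistic(r):
    return K / (1 + ((K - N0) / N0) * np.exp(-r * t))

observed = logistic(true_r) + rng.normal(scale=2.0, size=t.size)

def fitness(r):
    return -np.sum((logistic(r) - observed) ** 2)  # higher is better

# Evolve a population of candidate r values.
pop = rng.uniform(0.01, 1.0, size=50)
for _ in range(60):
    scores = np.array([fitness(r) for r in pop])
    parents = pop[np.argsort(scores)][-20:]       # truncation selection
    children = rng.choice(parents, size=30)       # clone from parents
    children += rng.normal(scale=0.02, size=30)   # mutation
    pop = np.concatenate([parents, np.clip(children, 1e-3, 2.0)])

best_r = pop[np.argmax([fitness(r) for r in pop])]
print(best_r)  # should be close to the true rate of 0.4
```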
EAs’ approach makes them particularly well suited for both complex
scenarios with an unknown number of optimal solutions and
computationally intensive ones (i.e. involving many distinct independent
agents). For example, Zheng et al. (2020) used an ant-miner
algorithm, based on ant colony optimisation and inspired by the
food-searching behaviour of ants, consisting of three steps (rule
construction, rule pruning and pheromone updating), to generate a set
of decision rules that can
predict the risk of forest fires. Similarly, Jia et al. (2019)
used PA-DDS (Pareto archived dynamically dimensioned search), a method
for computationally intensive multi-objective problems, to optimise
management of reservoir operations towards economic and ecological goals
(i.e. to minimise the squared deviation from ecological flows while
maximising power generation) using data on water runoff from reservoirs.
Other population-based models are also good solutions for conservation
management issues. Łopucki & Kiersztyn (2020) provide one such example,
using camera-trap data to create a particle swarm optimisation (PSO)
model of the daily activity of the striped field mouse (Apodemus
agrarius). In a PSO, each particle in the swarm represents an
individual that moves in a turn-based way depending on its current ideal
move and that of its neighbours. These models can then be used to inform
all kinds of species management measures.
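The particle dynamics just described translate almost directly into code. The sketch below is a bare-bones PSO minimising a simple quadratic function as a stand-in for a model-fitting objective; the inertia and attraction coefficients are conventional defaults, not tuned values from any cited study.

```python
# Bare-bones particle swarm optimisation: each particle is pulled
# towards its own best position and the swarm's best. Here it minimises
# a simple 2-D quadratic (stand-in for a real objective); illustrative only.
import numpy as np

rng = np.random.default_rng(21)

def objective(p):  # minimum at (3, -2)
    return (p[:, 0] - 3.0) ** 2 + (p[:, 1] + 2.0) ** 2

n = 30
pos = rng.uniform(-10, 10, size=(n, 2))
vel = np.zeros((n, 2))
pbest = pos.copy()                 # each particle's personal best
pbest_val = objective(pos)
gbest = pbest[pbest_val.argmin()].copy()  # swarm's global best

for _ in range(100):
    r1, r2 = rng.uniform(size=(n, 1)), rng.uniform(size=(n, 1))
    # Inertia + pull towards personal best + pull towards global best.
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    val = objective(pos)
    improved = val < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], val[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print(gbest)  # should converge near (3, -2)
```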
Despite these uses, the most common use of evolutionary algorithms is
the genetic algorithm for rule-set production (GARP). GARP used to be
applied as an SDM before the popularisation of MaxEnt (see section 4.1) in
a similar way: a collection of spatial points with associated
environmental variables (water vapour pressure, elevation etc.) is used
to iteratively produce and test a set of rules that is then applied back
to the geospatial information to predict the distribution of a species
(Arriaga et al., 2004; Wang & Wang, 2006).