4. Discussion

Most major classes of machine learning methods have been used across all fields of species threat and conservation analyses. Furthermore, we found usage of these methods to be consistent with previous reviews of the use of machine learning in conservation (Thessen, 2016; Liu et al., 2021). From species distribution models, which are the basis for further analyses, to advising on the best options for preserving genetic diversity, machine learning is increasingly used as methods become simpler to use and implement for multiple purposes. Yet, many methods have been more commonly used for specific objectives, either because they have been widely studied in the context of specific problems or because they have been found to be particularly effective. Here, we detail the use, advantages and disadvantages of each method while providing examples that cover a large spectrum of uses. Methods are presented in decreasing order of their frequency of use in the reviewed publications.

4.1 MaxEnt

MaxEnt models have been extensively used for modelling feature (e.g. species) distributions in space, as they can cope with a large number of covariates and, through their regularised versions, with small sample sizes (Mohri et al., 2018). As MaxEnt has been extensively researched with background data, its popularity is mostly due to its usage in species distribution models (SDM), estimating a species' distribution being the first step in many conservation exercises. Before MaxEnt's rise to popularity, other ML methods, such as the genetic algorithm for rule-set production (GARP), were used for SDMs, but their use has since decreased (see section 4.7). MaxEnt's easy-to-use GUI (graphical user interface) and good performance when compared to similarly purposed models (Elith et al., 2006) contributed to its popularity. It is therefore natural that MaxEnt was found to be the most commonly used ML method in this review.
Using MaxEnt to predict the habitat suitability (often interpreted as potential distribution) of a species of conservation concern is by far the most common use of the method. Studies employing MaxEnt this way often use a selection of both environmental variables and stressors. Integrative studies potentially include distribution data on invasive species and other stressors, such as powerlines, roads and agricultural development (Bradley, 2010).
Modelling the distribution of invasive species and pathogens is also common practice. In one such example, Azzurro et al. (2013) created a species distribution model for the invasive bluespotted cornetfish (Fistularia commersonii) using human population density (among other abiotic variables) as a predictor. The response variable can also be something unrelated to species distributions. Focusing on the indirect impacts of agriculture, Mateo-Tomas et al. (2012) used MaxEnt to model the number of poisoning events driven by livestock presence due to predation risk. The bottom line is that almost any response variable that could reasonably be expected to follow the maximum entropy principle can be modelled using MaxEnt.
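As a minimal sketch of this presence-background logic, the example below approximates a MaxEnt-style SDM with a presence-versus-background logistic regression on quadratic features (a commonly used approximation of MaxEnt's exponential model); the species, covariate and all sample values are synthetic assumptions, not data from any of the studies above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Hypothetical setup: one environmental covariate (say, annual mean
# temperature); presences cluster around an optimum of ~18, while
# background points are drawn uniformly across the study region.
n_presence, n_background = 200, 1000
presence = rng.normal(loc=18.0, scale=2.0, size=(n_presence, 1))
background = rng.uniform(low=0.0, high=35.0, size=(n_background, 1))

# Quadratic feature expansion mimics MaxEnt's 'quadratic' feature class.
def features(x):
    return np.hstack([x, x ** 2])

X = features(np.vstack([presence, background]))
y = np.concatenate([np.ones(n_presence), np.zeros(n_background)])

# Presence-vs-background logistic regression; C controls regularisation
# strength (cf. the regularised MaxEnt versions mentioned above).
model = LogisticRegression(C=1.0, max_iter=1000).fit(X, y)

# Relative suitability along the environmental gradient.
grid = np.linspace(0, 35, 71).reshape(-1, 1)
suitability = model.predict_proba(features(grid))[:, 1]
print("Most suitable temperature:", grid[suitability.argmax(), 0])
```

With the quadratic term, the fitted suitability curve peaks near the simulated optimum, which is the behaviour a MaxEnt SDM would aim to recover from real occurrence data.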

4.2 Bayesian methods

Bayesian methods are capable of handling problems with small datasets, provided there is prior information on the problem. They can also provide credible intervals for unobserved parameters (Murphy, 2012). Bayesian methods were found to be the second most popular ML option, likely due to their capability of incorporating prior models. Of all Bayesian methods, Bayesian belief networks (BBN), Bayesian regression models (GLM, GAM etc.), Bayesian clustering analyses, and Markov chain Monte Carlo (MCMC) sampling are the most common in the literature. They are most frequent in management and experimental settings, in archetypal Before-After-Control-Impact (BACI) studies, where a model is usually created and then parameterised through MCMC.
Bayesian methods can be used to directly model species occurrences based on stressors. Shokri et al. (2021) performed a comprehensive analysis of the spatial patterns of vulnerability and conservation of the Caspian red deer (Cervus elaphus maral), including attempts to model (Bayesian GLM, MCMC) red deer occurrences with the number of local ranger stations as an explanatory variable, hypothesising this to act as a proxy for dissuasive measures against illegal hunting. Similarly, Christie et al. (2015) used Bayesian mixed models and MCMC to study the effects of oil development (oil wells and road density) on the density of pronghorns (Antilocapra americana) in the United States. However, the response variable does not necessarily need to be an occurrence metric. Wang et al. (2015) used Bayesian vector autoregression to model the relationship between economic development and an environmental quality index. Additionally, Bayesian methodologies are used in geographic information system (GIS) applications, with empirical Bayesian kriging being used to interpolate overall groundwater quality based on sample water quality (Hossain & Patra, 2020).
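To make the GLM-plus-MCMC workflow concrete, the sketch below fits a Bayesian Poisson GLM with a hand-written random-walk Metropolis sampler; the counts, covariate, priors and tuning constants are all illustrative assumptions, and real analyses would normally rely on dedicated sampling software.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: species counts rising with a covariate (e.g. a
# habitat quality score), generated from a Poisson GLM with log link.
x = rng.uniform(0, 3, size=100)
true_a, true_b = 0.5, 0.8
y = rng.poisson(np.exp(true_a + true_b * x))

def log_posterior(a, b):
    # Poisson log-likelihood (dropping the constant log(y!) term) ...
    lam = np.exp(a + b * x)
    loglik = np.sum(y * np.log(lam) - lam)
    # ... plus weakly informative Normal(0, 10) priors on both coefficients.
    logprior = -(a ** 2 + b ** 2) / (2 * 10 ** 2)
    return loglik + logprior

# Random-walk Metropolis: propose a move, accept with prob min(1, ratio).
samples, (a, b) = [], (0.0, 0.0)
lp = log_posterior(a, b)
for _ in range(5000):
    a_new, b_new = a + rng.normal(0, 0.05), b + rng.normal(0, 0.05)
    lp_new = log_posterior(a_new, b_new)
    if np.log(rng.uniform()) < lp_new - lp:
        a, b, lp = a_new, b_new, lp_new
    samples.append((a, b))

posterior = np.array(samples[1000:])          # discard burn-in
b_low, b_high = np.quantile(posterior[:, 1], [0.025, 0.975])
print(f"95% credible interval for the slope: [{b_low:.2f}, {b_high:.2f}]")
```

The credible interval on the slope is exactly the kind of output highlighted above as an advantage of Bayesian methods over point-estimate approaches.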
Modelling the relationship between phenotypic traits and ecosystem dynamics and stressors is most commonly done with Bayesian models – an aspect that should increase in popularity with the growing availability of trait data. Primack et al. (2009) developed a Bayesian hierarchical model parameterised with Gibbs sampling to study the interactions of temperature, site effects and latitude on phenological attributes, such as the first flowering date of an apricot tree species and the arrival of a migratory swallow. Likewise, in a controlled environment, Tucker et al. (2012) used hierarchical Bayesian modelling to compare the short-term and seasonal responses in soil respiration under changing thermal conditions and a variable substrate (with or without a simple sugar, dextrose). These changes to soil respiration could in turn, at a sufficient scale, deeply affect the global carbon cycle and climate at large.
Bayesian belief networks are the natural choice for many conservation studies incorporating expert opinion, because they take advantage of Bayesian methods' small-dataset capabilities and condense expert opinion and bibliographic knowledge into a simple statistical model. They do so by transparently relating, through conditional probability tables, non-measurable or otherwise abstract input variables (e.g. political support for conservation) with measurable proxy variables (e.g. corruption index, lawmaker survey data). It is then up to the authors to make an educated guess as to the priors of these variables and the relationships between the measurable proxies and the input variables (Landuyt et al., 2013). In an archetypal example, Namkhan et al. (2020) sought to model the vulnerability of lowland forest habitats in mainland Asia and incorporated several input variables pertaining to biological resource use. These include local hunting (whose proxy variables are border distance and road presence, among others) and protection level (whose proxy variables are protected area type and presence of charismatic species, among others).
Smith et al. (2007) modelled habitat suitability for the Near Threatened Julia Creek dunnart (Sminthopsis douglasi) with a variety of GIS-derived proxy variables, such as distance to water and density of an invasive tree species, Acacia nilotica (L.). These then informed a set of key input variables, such as grazing pressure and density of soil cracks. In another study, given the lack of information on landscape-scale risks apart from those posed by the invasive Mimosa pigra (L.), for which experimental data already existed, Bayliss et al. (2012) opted to perform an ecological risk assessment (ERA) for a floodplain downstream of an Australian uranium mine as a BBN. The overall risk is composed of a mine-site ERA and a landscape ERA, each with its own set of input variables (e.g. uranium's ecological risk and unmanaged fire risk, respectively) and proxy variables (e.g. uranium concentration and fire exposure, respectively).
It should be noted that some authors rely on BBNs less for their practical integration of input and proxy variables, and more for their simplicity and transparency. As an example, Van der Biest et al. (2014) noted the lack of models for ecosystem services (ES) ‘in terms of their supporting systems, namely the biophysical potential for the delivery of services’. In response, they advanced the EBI (Ecosystem Service Bundle Index), an index coupled with a BBN model that estimates the provision of ecosystem services with ES input variables (e.g. food production, wood production) informed by proxy variables (e.g. land use class, soil texture class) sourced from spatially explicit data. Lastly, Bondé et al. (2020) used a BBN on an arguably entirely theoretical level. They broadly identified three input variables (climate change, overexploitation, and land use change), influenced in turn by six policy-based variables (e.g. protected area expansion, promotion of agroforestry). The model was then used in conjunction with several policy scenarios (e.g. ‘business-as-usual’, ‘agroforestry and fair trade’) to predict the trend of shea tree, Vitellaria paradoxa (C. F. Gaertn.), abundance.
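The conditional-probability-table mechanics described above can be sketched in a few lines of plain Python. The network below has a single abstract input variable ('local hunting pressure') informed by two observable proxies; every probability is an illustrative placeholder, not an estimate from any of the cited studies.

```python
# Minimal Bayesian belief network sketch: one abstract input variable,
# 'high local hunting pressure', informed by two observable proxies.
# All probabilities below are illustrative placeholders.

# Priors over the proxy variables.
p_border_near = 0.30          # P(site is near an international border)
p_road_present = 0.50         # P(a road is present at the site)

# Conditional probability table: P(high hunting | border, road).
cpt_hunting = {
    (True, True): 0.80,
    (True, False): 0.55,
    (False, True): 0.40,
    (False, False): 0.10,
}

def marginal_hunting():
    """Marginalise the proxies out to get the prior P(high hunting)."""
    total = 0.0
    for border in (True, False):
        for road in (True, False):
            p_b = p_border_near if border else 1 - p_border_near
            p_r = p_road_present if road else 1 - p_road_present
            total += cpt_hunting[(border, road)] * p_b * p_r
    return total

def belief_given_evidence(border, road):
    """With both proxies observed, the CPT gives the belief directly."""
    return cpt_hunting[(border, road)]

print("P(high hunting), no evidence:", round(marginal_hunting(), 4))
print("P(high hunting | near border, road):", belief_given_evidence(True, True))
```

Expert elicitation in BBN studies amounts to agreeing on the entries of tables like `cpt_hunting`, which is what makes the approach so transparent to non-modellers.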
Finally, in the specific context of conservation analysis, Bayesian methods are also very frequently used to analyse genetic data. Such approaches can be used, for example, to determine the origin of specific stressors or to study genetic diversity loss. Manel et al. (2002) exemplify this possibility by accurately determining the origin of trafficked individuals through a Bayesian clustering analysis with MCMC on sequence data of particularly polymorphic DNA markers, shedding light on potential trafficking pathways in a way that is reminiscent of product authenticity problems (Montowska et al., 2010). Similarly, Oliveira et al. (2008) used the same methodology to determine the degree of hybridisation between wild and domestic cats in Portugal. The same can also be done in management scenarios. As an example, Barilani et al. (2007) assessed potential progressive introgression in a partridge species caused by reared individuals supplementing a population in decline due to overharvesting. Likewise, Hone et al. (2010) examined the maximum proportion of a population that can be removed to stop population growth in Australian and New Zealand mammal species, and its implications for management (and harvest).

4.3 Ensembles

Ensembles are powerful and versatile models with several advantages. They are often capable of handling non-linear responses and large datasets with many explanatory variables. They also have a reduced risk of overfitting if adequate measures are taken (Cutler et al., 2007). Finally, they have the advantage of increasing the accuracy of the model compared to that of each of its component models. The popularity of random forests, the most widely used form, stems from the simplicity of their component models, decision trees. However, their interpretability is mostly limited to the heuristics that can be applied to black-box models (Murphy, 2012). Also, for SDM, they require species presence–absence data, which might be limiting.
Boosted regression trees (BRT) and random forests (RF) are by far the most popular types of ensembles in conservation biology. They are good ‘all-purpose’ choices in classification and regression problems. Johnstone et al. (2010) used BRTs to determine how variations in pre-fire vegetation, fire effects and spatial and environmental variables affected post-fire regeneration in boreal forests by separately modelling three response variables: seedling density of the black spruce, Picea mariana (Mill.) B.S.P.; seedling density (including resprouts) of two deciduous tree species, the trembling aspen, Populus tremuloides (Michx.), and the Alaskan paper birch, Betula neoalaskana (Sarg.); and the proportion of post-fire seedlings that were spruce. Young et al. (2017) used BRTs to model fire occurrence in boreal regions with 60 years of landscape variables (habitat distribution, topography etc.) and climatic variables (potential evapotranspiration etc.). Coutts et al. (2011) used a BRT to examine the factors conditioning the spread of invasive species while considering different management scenarios. As for RFs, Barros & Elkin (2021) addressed the lasting challenge of managing old-growth forest (i.e. forest at an advanced development stage) by predicting and developing an old-growth index. Old-growth forests are functionally and ecologically distinct, and establishing such an index based on structural attributes (e.g. basal area of large trees, vertical variability) allows for old-growth mapping at fine ecological and spatial scales. Peciña et al. (2021) also used RFs to predict above-ground biomass in coastal meadows at a fine spatial resolution, and then statistically assessed the effects of management on sward structure using historical management data.
Ensembles were also used to estimate the distribution of species (i.e. SDM) or other features. Sabatini et al. (2017) modelled potential undetected primary forests in Europe by training a BRT model with a presence–absence map of primary forests and several explanatory variables of biogeographic and socio-economic nature, among which was travel time to the nearest city. Similarly, Catford et al. (2011) used a BRT to model the distribution of exotic species dependent on multiple environmental and anthropogenic variables.
Lastly, RFs’ capacity to handle many-featured datasets has been widely used in studies dealing with pollution, which make frequent use of arrays of multiple contaminant concentrations as explanatory variables. Oliver et al. (2017) compared models of long-term change in lakes through lake-level or region-level drivers based on total nitrogen (TN), total phosphorus (TP), stoichiometry (TN:TP) and chlorophyll. Li et al. (2020) linked pollution in lakes to hydrological features in tributaries. H. Zhang et al. (2021) linked individual heavy metal contaminants to their potential sources by modelling their concentrations with different soil features. Zhang & Vincent (2019) predicted the conservation status of endangered species by linking it to pollutants and other stressors. Molnár et al. (2020) modelled the air pollution tolerance index (APTI), a measure of a plant’s ability to counter the effects of pollution, as a function of particulate matter concentration, city population and land use classes. In all cases, by using methods of estimating variable importance, the authors explored the contribution of each explanatory variable to the final prediction.
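The variable-importance workflow common to these pollution studies can be sketched as follows; the response, drivers and effect sizes are synthetic assumptions chosen only to show how an RF ranks predictors.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)

# Hypothetical lake data: chlorophyll driven strongly by total phosphorus
# (TP), weakly by total nitrogen (TN), and not at all by a noise column.
n = 500
TN = rng.normal(size=n)
TP = rng.normal(size=n)
noise = rng.normal(size=n)
chlorophyll = 0.2 * TN + 1.5 * TP + rng.normal(scale=0.3, size=n)

X = np.column_stack([TN, TP, noise])
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, chlorophyll)

# Impurity-based variable importance, as commonly reported in such studies.
for name, imp in zip(["TN", "TP", "noise"], rf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```

The forest assigns most of the importance to the strong driver and very little to the irrelevant column, which is the diagnostic authors use to attribute pollution effects to specific contaminants.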

4.4 Artificial neural networks

Artificial neural networks (ANN) are a very powerful family of models applied to a variety of problems. Deep ANN models have shown a capability to obtain very good results in complex problems, at the cost of needing a large number of samples due to the large number of parameters to learn. However, they are black-box models, and the interpretability of their results is therefore limited to the use of a few heuristics (Goodfellow et al., 2016). This has perhaps limited their application in conservation science.
ANNs’ black-box nature is not an impediment when dealing with threats and concepts whose theoretical understanding is secondary to their practical prediction (e.g. pollution, weather and geological events). Coste et al. (2009) used presence–absence data of macrophyte species to predict water quality classes defined by hydrological features. Additionally, although a variety of models have been employed to predict flooding events, ANNs are by far the most popular (see Mosavi et al., 2018, for a review on this topic). Likewise, a deep ANN achieved the best accuracy in landslide modelling when compared to other methods (i.e. SVM, MLP, RF) (Bui et al., 2020).
Neural networks have been extensively used for image recognition and automated identification of species from pictures or videos. These data types are extremely complex in nature, and ANNs have the capability to detect common patterns in images after extensive training, even if the reasoning behind each classification is often difficult to determine. One initiative to prevent roadkill, ROOD (kangaROO roaD), uses the object recognition software YOLOv3 (You Only Look Once) (Redmon et al., 2018), which employs a convolutional neural network (CNN) to recognise carcasses captured on a car-mounted camera device (Yi & Khot, 2020). Spyromitros-Xioufis et al. (2018) review efforts to estimate air quality based on images of the sky obtained from ground level. Khalighifar et al. (2021), taking advantage of the image-processing abilities of CNNs, applied one to sonograms of frog calls to identify the corresponding species. With the advent of big data in ecology and conservation, current efforts in remote sensing (i.e. detecting and monitoring physical characteristics by measuring reflected and emitted radiation at a distance) of species and automatic taxonomic classification are also based on increasingly powerful and complex neural networks (Weinstein, 2017; Fairbrass et al., 2018; Valan et al., 2019).
In general, complex time-series data are often plagued with noise derived from natural population or community fluctuations that confounds the trends one is trying to model. ANNs are flexible enough to identify patterns of interest beyond such ‘noise’ and to predict future trends, in particular when using recurrent connections, as in recurrent neural networks (RNN). RNNs can process sequential and time-series data, an ability used to determine the drivers of past trends and to forecast future ones. See Christin et al. (2019) for a review on this topic and Capinha et al. (2021) for a set of case studies.
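A minimal sketch of this idea follows, using a feed-forward network on lagged windows rather than a true RNN for brevity; the seasonal series, noise level and network settings are all arbitrary assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# Hypothetical abundance time series: a 12-step seasonal cycle plus
# observation noise, centred so the network trains on anomalies.
t = np.arange(400)
series = 50 + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=2.0, size=t.size)
anomalies = series - series.mean()

# Frame forecasting as supervised learning on lagged windows (window -> next).
window = 12
X = np.array([anomalies[i:i + window] for i in range(len(anomalies) - window)])
y = anomalies[window:]
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000,
                   random_state=0).fit(X_train, y_train)

# The network should track the seasonal signal through the noise.
print(f"R^2 on held-out data: {net.score(X_test, y_test):.2f}")
```

A genuine RNN would replace the fixed lag window with learned recurrent state, but the supervised framing of noisy ecological series is the same.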

4.5 Decision trees

Decision trees (DT) are among the simplest and most explainable machine learning models. The user can easily extract the rules that the model derives from training data. They are, however, not robust to updates in the dataset, since the structure of the tree may change with the addition (or removal) of a few samples (Mohri et al., 2018), and have consequently never gained great popularity. Regardless, DTs are mostly used in decision and management studies as an extremely simple and intuitive model that can still be more advantageous for implementation and outreach than more complex models.
Pyšek et al. (2012) used a classification tree constructed with the CART algorithm (Breiman et al., 1984) to make a variety of conservation predictions. From existing trait data on invasive plant species (e.g. height, pollination, toxicity), the DTs generated predictions of significant impacts on a variety of elements: species and communities of resident animals and plants, soil characteristics and fire regimes. Similarly, Křivánek et al. (2006) tested invasive species risk assessment schemes based on DTs using binary trait nodes (e.g. ‘Is it an agricultural weed elsewhere?’, ‘Is it unpalatable to grazers?’) to produce a decision to accept, suggest or reject a given species as non-invasive. Canessa et al. (2020) created a DT using expert judgement to assess management alternatives for reintroduced populations of the critically endangered regent honeyeater, Anthochaera phrygia.
Finally, DTs have, like many other ML methods, been used to model species distributions. As an example, Debeljak et al. (2015) used DTs with environmental data inputs (e.g. slope, landscape category) to model the habitat of the black poplar (Populus nigra L.), a riparian species threatened by habitat loss.
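The rule-extraction property that makes DTs attractive for management can be shown directly; the habitat rule, covariates and thresholds below are synthetic assumptions, not values from Debeljak et al. (2015).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)

# Hypothetical habitat data: a riparian species present only on gentle
# slopes close to water (labels generated deterministically from that rule).
n = 300
slope = rng.uniform(0, 45, n)            # degrees
dist_water = rng.uniform(0, 2000, n)     # metres
present = ((slope < 10) & (dist_water < 500)).astype(int)

X = np.column_stack([slope, dist_water])
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, present)

# The fitted tree is directly readable as if-then rules.
print(export_text(tree, feature_names=["slope", "dist_water"]))
```

The printed output is a plain-text set of nested thresholds that a manager can apply in the field without any software, which is precisely the outreach advantage discussed above.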

4.6 Support vector machines

Support vector machines can be used in classification and regression problems, with good results in moderately sized problems. They do not scale well in multi-class problems with a large number of features and examples. Also, being geometrical models, they cannot provide a probabilistic evaluation of the classification produced (Murphy, 2012). These characteristics, along with the requirement of presence–absence data and competition from other techniques occupying the same niche (such as ensemble methods), justify their scarce use as a sole method for conservation problems.
The notable exception to the above is in problems that make use of spatial data. Liu et al. (2018) used SVMs to model urban expansion as a function of 11 spectral variables (e.g. built-up index, soil brightness, wetness), which they then coupled with a least squares regression analysis between the generated map and geospatial data on socio-economic factors. In a less conventional example that stretches the limitations of the model, Gallardo, Errea & Aldridge (2013) assessed the habitat suitability of a problematic species based on environmental factors (e.g. precipitation, maximum temperature of the warmest month). In this scenario, pseudo-absence points were generated to run the model with presence-only data, as opposed to the usual presence–absence data. Furthermore, in order to output habitat suitability, regression was accomplished by implementing Platt’s a posteriori probabilities, a method that fits a sigmoid to the raw SVM decision values, here used to produce class probabilities from a markedly non-probabilistic ML method. Conservation analysis might benefit from such workarounds, which are applicable to most classification methods when fully explored.
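Both workarounds (pseudo-absences and Platt scaling) are easy to sketch; the species, covariates and all values below are synthetic assumptions, and `probability=True` in scikit-learn is one widely available implementation of the sigmoid fit described above.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)

# Hypothetical presence-only records clustered around a climatic optimum,
# plus uniformly generated pseudo-absence points over the study region.
n = 200
presence = rng.normal(loc=[20.0, 12.0], scale=[2.0, 1.5], size=(n, 2))
pseudo_abs = rng.uniform(low=[0.0, 0.0], high=[35.0, 25.0], size=(n, 2))

X = np.vstack([presence, pseudo_abs])
y = np.concatenate([np.ones(n), np.zeros(n)])

# probability=True enables Platt scaling: a sigmoid fitted to the SVM's
# decision values, turning a geometric classifier into class probabilities.
svm = SVC(kernel="rbf", gamma="scale", probability=True,
          random_state=0).fit(X, y)

suitable = svm.predict_proba([[20.0, 12.0]])[0, 1]
unsuitable = svm.predict_proba([[5.0, 2.0]])[0, 1]
print(f"Suitability near optimum: {suitable:.2f}; far from it: {unsuitable:.2f}")
```

The calibrated probabilities can then be mapped as a continuous habitat suitability surface, mirroring the Gallardo, Errea & Aldridge (2013) workflow.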
SVMs have been used in multiple cases of environmental monitoring. Wind turbines can have a sizable effect on certain bat and bird species (Stevens et al., 2013; Davy et al., 2021), and a preventative stoppage of the blades could significantly decrease the number of individuals dying in mid-air collisions. Hu & Albertani (2019) created an automatic collision detection system for wind turbine blades that uses detection algorithms based on SVMs. In another study, Neill et al. (2018) used SVMs to predict the environmental status of an agroecosystem from environmental factors (seasonality) as well as biological and chemical indicators related to beehives (population mean, count of pesticides).

4.7 Evolutionary algorithms

Evolutionary algorithms (EA) are mostly useful for optimisation problems, where they are recognised for their robustness, providing high-quality results in many different types of problems (Floreano & Mattiussi, 2008). This means they can be used to optimise the parameters of a defined model, fitting it to observed data, and, in the form of genetic programming, they can also evolve the model itself (an approach known as symbolic regression). The latter is still underdeveloped, although it holds high potential interest, for ecological modelling in particular (Cardoso et al., 2020).
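The first use (parameter optimisation) can be sketched with a minimal genetic algorithm; the fitness function, operators and tuning constants are all illustrative assumptions, with a known optimum standing in for a fit to observed data.

```python
import random

random.seed(42)

# Minimal genetic algorithm: evolve two model parameters (a, b) to
# maximise fitness. Here fitness is the negative squared distance to a
# known optimum (a=0.5, b=0.8), a stand-in for goodness of fit to data.
def fitness(ind):
    a, b = ind
    return -((a - 0.5) ** 2 + (b - 0.8) ** 2)

def mutate(ind, sigma=0.1):
    # Gaussian mutation of every gene.
    return tuple(g + random.gauss(0, sigma) for g in ind)

def crossover(p1, p2):
    # Uniform crossover: each gene taken from either parent with equal chance.
    return tuple(random.choice(pair) for pair in zip(p1, p2))

# Random initial population, then selection / crossover / mutation cycles.
pop = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(50)]
for generation in range(100):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                      # truncation selection (elitism)
    pop = parents + [mutate(crossover(random.choice(parents),
                                      random.choice(parents)))
                     for _ in range(40)]

best = max(pop, key=fitness)
print(f"Best individual: a={best[0]:.2f}, b={best[1]:.2f}")
```

Because the top individuals survive unchanged each generation, the best fitness never decreases, one of the properties behind the robustness noted above.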
EAs’ population-based approach makes them particularly well suited both to complex scenarios with an unknown number of optimal solutions and to computationally intensive ones (i.e. those with many distinct independent agents). For example, Zheng et al. (2020) used an ant-miner algorithm, inspired by the food-searching behaviour of ants and consisting of three steps, (1) rule construction, (2) rule pruning and (3) pheromone updating, to generate a set of decision rules that can predict the risk of forest fires. Similarly, Jia et al. (2019) used PA-DDS (Pareto archived dynamically dimensioned search), a method for computationally intensive multi-objective problems, to optimise the management of reservoir operations towards economic and ecological goals (i.e. to minimise the squared deviation from ecological flows while maximising power generation) using data on water runoff from reservoirs.
Other population-based models are also good solutions for conservation management issues. Łopucki & Kiersztyn (2020) provide such an example, using camera-trap data to create a particle swarm optimisation (PSO) model of the daily activity of the striped field mouse (Apodemus agrarius). In a PSO, each particle in the swarm represents an individual that moves iteratively depending on its own best-known position and that of its neighbours. Such models can then be used to inform all kinds of species management measures.
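The PSO update rule just described can be sketched in a few lines; the activity curve, its peak and the swarm constants are all hypothetical choices made only to show the mechanics.

```python
import random

random.seed(0)

# Minimal particle swarm optimisation: particles search for the peak of a
# hypothetical daily-activity curve (maximum placed arbitrarily at hour 6).
def activity(hour):
    return -(hour - 6.0) ** 2

n, w, c1, c2 = 20, 0.7, 1.5, 1.5          # swarm size, inertia, pull weights
pos = [random.uniform(0, 24) for _ in range(n)]
vel = [0.0] * n
pbest = pos[:]                             # each particle's best position
gbest = max(pos, key=activity)             # swarm-wide best position

for _ in range(100):
    for i in range(n):
        # Velocity update: inertia + pull towards personal and global bests.
        vel[i] = (w * vel[i]
                  + c1 * random.random() * (pbest[i] - pos[i])
                  + c2 * random.random() * (gbest - pos[i]))
        pos[i] += vel[i]
        if activity(pos[i]) > activity(pbest[i]):
            pbest[i] = pos[i]
    gbest = max(pbest, key=activity)

print(f"Estimated activity peak: hour {gbest:.1f}")
```

In an applied setting the objective function would be built from field observations (e.g. camera-trap detection frequencies) rather than a closed-form curve.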
Despite these uses, the most common evolutionary algorithm is the genetic algorithm for rule-set production (GARP). GARP used to be applied for SDMs before the popularisation of MaxEnt (see section 4.1), in a similar way: a collection of spatial points with associated environmental variables (water vapour pressure, elevation etc.) is used to iteratively produce and test a set of rules that is then applied back to the geospatial information to predict the distribution of a species (Arriaga et al., 2004; Wang & Wang, 2006).