1. Introduction
In October 2010, during the 10th meeting of the Conference of the
Parties to the Convention on Biological Diversity (CBD), a crucial
strategic plan for biodiversity was approved. The widely known Aichi
Biodiversity Targets are a set of time-bound but unenforced goals for
the conservation of species and ecosystems, most notably target 12:
preventing the extinction of known threatened species and improving or
sustaining the conservation status of these species, particularly of
those most in decline, by 2020 (CBD, 2021). These targets remain
relevant and continue to be used in the post-2020 CBD framework, with
several opinions holding that they should remain the basis for
developing any new targets, with minimal changes (CBD, 2021).
The complexity of the goal is such that even tracking changes in the
conservation status of species has proven difficult, if not impossible,
for most taxa. New approaches are required for these goals to be
achieved in the near future. Machine learning is one such approach:
Vinuesa et al. (2020) argue that its use can help achieve 134 targets
of the UN 2030 Agenda for Sustainable Development, a considerable
contribution even though it could hinder 59 other targets should the
technology go unregulated.
Machine learning (ML), often used synonymously with artificial
intelligence (AI), could be defined as the study of computational
systems which improve their behaviour and responses in light of
obtained results, in a feedback loop. Such systems often achieve
humanlike or superhuman performance in a variety of problems. While ML
and AI are closely linked, the two concepts do not entirely overlap.
Rather, ML is an interdisciplinary field that borrows methodologies and
concepts from other areas, making a comprehensive definition
challenging. ‘When fundraising, it’s AI; when recruiting, it’s machine
learning; when implementing, it’s linear regression’, as is often
joked. Regardless, even if all definitions are imperfect, some are
useful, much like models. The origin of the term
‘machine learning’ is usually attributed to Arthur Samuel, an American
computer scientist at IBM (International Business Machines
Corporation), who introduced it in 1959. For a general-purpose
definition, one may well use that by the computer scientist Tom
Mitchell, who defined ‘a well-posed learning problem’, or a ‘machine
learning’ problem, as when ‘a computer program is said to learn from
experience E with respect to some task T and some performance measure
P, if its performance on T, as measured by P, improves with experience
E’ (Mitchell, 1997). Considering this,
several fields then emerge within machine learning, with different goals
and using different data types. Some common examples are natural
language processing, image processing and data mining, which are
particular flavours of machine learning focused on the automatic or
semi-automatic process of discovering patterns in data (Witten et al.,
2016). Extraction of context-dependent relational data, facial and
character recognition, and data clustering analyses are only a few of
the possible applications. The possibilities are vast, and ML has, in
its different approaches, been applied to a wide variety of research
problems. Additionally, it is now more accessible than ever
before. While ML approaches often use methodologies developed on a
theoretical level decades ago, sufficient technological progress has
allowed the scientific community at large to access user-friendly
software and vast amounts of processing power, a necessity for many ML
tasks.
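Mitchell's definition can be made concrete with a toy example. The sketch below is purely illustrative and not drawn from any of the reviewed studies: the task T is deciding whether a number lies above a hidden threshold, the experience E is a growing set of labelled examples, and the performance measure P is accuracy on a fixed test grid. The threshold value, function names and sample sizes are all assumptions made for the illustration.

```python
import random

TRUE_THRESHOLD = 0.5  # hidden rule the learner must discover (assumed for illustration)


def label(x):
    """Task T: decide whether x lies at or above the hidden threshold."""
    return x >= TRUE_THRESHOLD


def learn_threshold(examples):
    """Estimate the threshold from labelled examples (the experience E)."""
    # Largest value seen with a negative label, smallest with a positive label;
    # fall back to the ends of [0, 1] when one class has not been seen yet.
    lo = max((x for x, y in examples if not y), default=0.0)
    hi = min((x for x, y in examples if y), default=1.0)
    return (lo + hi) / 2


def accuracy(threshold, test_points):
    """Performance measure P: fraction of test points classified correctly."""
    correct = sum((x >= threshold) == label(x) for x in test_points)
    return correct / len(test_points)


def learning_curve(sizes, seed=0):
    """Return (n, accuracy) pairs for learners trained on the first n examples."""
    rng = random.Random(seed)
    pool = [rng.random() for _ in range(max(sizes))]
    examples = [(x, label(x)) for x in pool]
    test_points = [i / 1000 for i in range(1001)]
    return [(n, accuracy(learn_threshold(examples[:n]), test_points)) for n in sizes]


if __name__ == "__main__":
    for n, acc in learning_curve([1, 10, 100, 1000]):
        print(f"E = {n:4d} examples -> P = {acc:.3f}")
```

Because each larger training set contains the smaller ones, the interval bracketing the true threshold can only shrink as E grows, so the worst-case error of the learner decreases with experience, although accuracy at any single step need not improve strictly.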
The advent of big and open data relevant for species conservation has
made many research areas both easily and globally accessible
(Ball-Damerow, 2019; Feng et al., 2022). Unstructured data of both
species occurrences (e.g. GBIF, 2022; iNaturalist, 2022) and their
traits (e.g. TryDatabase, 2022; World Spider Trait Database, 2022) is
growing exponentially. Data on human infrastructure, such as
powerlines, roads, service corridors, urban areas and more, is readily
available on platforms such as GHSL (https://ghsl.jrc.ec.europa.eu/)
and could inform estimates of individual mortality, fragmentation etc.
(Bradley, 2010; Andrew et al., 2012). Climate data is also more easily
incorporated in ML models than ever before, with products like
MERRAclim, WorldClim or CHELSA (Vega, Pertierra & Olalla-Tárraga, 2017;
Fick & Hijmans, 2017; Karger et al., 2017) making vast quantities of
spatial climate data available as practical, high-resolution raster
layers. Advancements in climate science and modelling have also made it
possible to more accurately forecast long-term range shifts in both
endemic and invasive species (Lu et al., 2021; Naudiyal et al., 2021;
J. Zhang et al., 2021). All these big data examples are but the tip of
the iceberg, as there are many more variables with readily available
data (Carneiro Freire et al., 2016; Meijer et al., 2018; Corbane et
al., 2019).
The power of ML, combined with vast amounts of data, makes it easy to
understand its growing popularity in species conservation. From decision
trees to artificial neural networks, multiple ML methods have been
applied with variable success, a trend that is likely to continue as we
accumulate more data and try to answer increasingly complex questions.
The use of ML methods has grown so fast that it is hard for a
conservation researcher to keep up with the possibilities, advantages
and limitations of each method for answering a particular question. To
this end, multiple researchers have authored reviews on ML use in
conservation and related fields. However,
these have focused so far on specific aspects of conservation analysis,
such as red-listing (Cazalis et al., 2022); preliminary tasks common
to many conservation analyses, like species distribution modelling
(SDM) (Elith et al., 2006); specific groups of methodologies like
Bayesian methods or neural networks (McCarthy et al., 2005; Stupariu
et al., 2022); or have otherwise constituted opinion papers or
non-systematic reviews (Thessen et al., 2016; Liu et al., 2018;
Christin et al., 2019; Pichler & Hartig, 2022). A systematic review of
the uses, advantages
and disadvantages of each method and application in species conservation
is still lacking. In this review, we systematically summarise the
various uses and methodologies of machine learning in the context of
species threat and conservation studies, highlighting current trends
(2011–2021), and report how and when different methods are used in
different key problems.