1. Introduction

In October 2010, during the 10th meeting of the Conference of the Parties to the Convention on Biological Diversity (CBD), a crucial strategic plan for biodiversity was approved. The widely known Aichi Biodiversity Targets are a set of time-bound but unenforced goals for the conservation of species and ecosystems, most notably Target 12: preventing the extinction of known threatened species and improving or sustaining their conservation status, particularly of those most in decline, by 2020 (CBD, 2021). These targets remain relevant and continue to be used in the post-2020 CBD framework, with several opinions noting that they should be kept, with minimal changes, as the basis for developing any new targets (CBD, 2021). The goal is so complex that even tracking changes in conservation status has proven difficult, if not impossible, for most taxa. New approaches are required if these goals are to be achieved in the near future. Machine learning is one such approach: Vinuesa et al. (2020) argue that it can help achieve 134 targets of the Sustainable Development Goals in the UN 2030 Agenda, a considerable contribution despite potential hindrance in 59 other targets should the technology go unregulated.
Machine learning (ML), often used synonymously with artificial intelligence (AI), can be defined as computational systems that improve their behaviour and responses in light of obtained results, in a feedback loop. Such systems often achieve humanlike or superhuman performance on a variety of problems. While closely linked, ML and AI do not entirely overlap. Rather, ML is an interdisciplinary field that borrows methodologies and concepts from other areas, making a comprehensive definition challenging. As is often joked, ‘When fundraising, it’s AI; when recruiting, it’s machine learning; when implementing, it’s linear regression’. Regardless, as with models, even if all definitions are imperfect, some are useful. The origin of the term ‘machine learning’ is usually attributed to Arthur Samuel, an American data scientist for IBM (International Business Machines Corporation) in the 1960s. For a general-purpose definition, one may well use that of the data scientist Tom Mitchell, who defined ‘a well-posed learning problem’, or a ‘machine learning’ problem, as when ‘a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E’ (Mitchell, 1997). From this definition, several fields emerge within machine learning, each with different goals and data types. Some common examples are natural language processing, image processing and data mining, which are particular flavours of machine learning focused on the automatic or semi-automatic process of discovering patterns in data (Witten et al., 2016). Extraction of context-dependent relational data, facial and character recognition, and data clustering analyses are only a few of the possible applications. The possibilities are vast, and ML has, with its different approaches, been applied to a wide variety of research problems. Additionally, it is now more accessible than ever before.
While ML approaches often use methodologies developed on a theoretical level decades ago, technological progress has since given the scientific community at large access to user-friendly software and vast amounts of processing power, the latter a necessity for many ML tasks.
The advent of big and open data relevant for species conservation has made many research areas both easily and globally accessible (Ball-Damerow, 2019; Feng et al., 2022). Unstructured data on both species occurrences (e.g. GBIF, 2022; iNaturalist, 2022) and their traits (e.g. TryDatabase, 2022; World Spider Trait Database, 2022) are growing exponentially. Data on human infrastructure, such as powerlines, roads, service corridors, urban areas and more, are readily available on platforms such as GHSL (https://ghsl.jrc.ec.europa.eu/) and could inform estimates of individual mortality, fragmentation, etc. (Bradley, 2010; Andrew et al., 2012). Climate data are also more easily incorporated in ML models than ever before, with products like MERRAclim, WorldClim or CHELSA (Vega, Pertierra & Olalla-Tárraga, 2017; Fick & Hijmans, 2017; Karger et al., 2017) making vast quantities of spatial climate data available as practical, high-resolution raster layers. Advancements in climate science and modelling have also made it possible to forecast long-term range shifts more accurately in both endemic and invasive species (Lu et al., 2021; Naudiyal et al., 2021; J. Zhang et al., 2021). All these big data examples are but the tip of the iceberg, as there are many more variables with readily available data (Carneiro Freire et al., 2016; Meijer et al., 2018; Corbane et al., 2019).
The power of ML, combined with vast amounts of data, makes it easy to understand its growing popularity in species conservation. From decision trees to artificial neural networks, multiple ML methods have been applied with variable success, a trend that can only grow in the future as we accumulate more data and try to answer increasingly complex questions. The use of ML methods has grown so fast that it is hard for a conservation researcher to keep up with all the possibilities, advantages and limitations of each method when answering a particular question. To this end, multiple researchers have authored reviews on ML use in conservation and related fields. However, these have so far focused on specific aspects of conservation analysis, such as red-listing (Cazalis et al., 2022); on preliminary tasks common to many conservation analyses, like species distribution modelling (SDM) (Elith et al., 2006); or on specific groups of methodologies, like Bayesian methods or neural networks (McCarthy et al., 2005; Stupariu et al., 2022). Others have constituted opinion papers or non-systematic reviews (Thessen et al., 2016; Liu et al., 2018; Christin et al., 2019; Pichler & Hartig, 2022). A systematic review of the uses, advantages and disadvantages of each method and application in species conservation is still lacking. In this review, we systematically summarise the various uses and methodologies of machine learning in the context of species threat and conservation studies, highlighting current trends (2011–2021), and report how and when different methods are used for different key problems.