Roderic Page edited untitled.md  over 8 years ago

Commit id: 2652b1147fe2cc3199b6e5698cd13544b15facca


## Taxonomy

Among the many challenges faced by taxonomy is the difficulty of estimating the size of the task it faces. Estimates of the number of species on Earth are uncertain and inconsistent, and show no signs of converging \cite{Caley_2014}. Some estimates based on models of taxonomic effort suggest that two-thirds of all species have already been described \cite{Costello_2011}, a conclusion that might strike DNA barcode researchers as somewhat optimistic [http://doi.org/10.1038/ncomms1095]. Analyses that use the number of authors per species description as a proxy for effort \cite{Joppa_2012} ignore the global trend towards more authors per paper \cite{Aboukhalil_2014}, and assume that the effort required per description has remained constant over time. An alternative interpretation of the growing number of authors is that the quality of taxonomic descriptions is increasing over time \cite{Sangster_2014}, of which \cite{Stoev_2013} is an extreme example.

Currently we lack a comprehensive, global index of species descriptions. For zoology the nearest we have is the Index of Organism Names (ION), which is based on Zoological Record. Data from ION are being cleaned and augmented in BioNames \cite{Page_2013}. Fig x shows the number of new taxa covered by the ICZN (animals plus some protozoan groups) described each year, based on data from ION. These data show an increase in numbers, with dips around the two World Wars, followed by an essentially constant number each year since the mid-twentieth century. The pattern in individual groups may vary considerably. For most of the taxa analysed by \cite{Joppa_2011}, the number of new species described per year is increasing, but other taxonomic groups are essentially static or in decline.
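
A per-year tally like the one plotted in Fig x comes down to counting names by their year of description. Splitting the count by higher taxon lets you compare trends between groups. The sketch below is illustrative only. It assumes a hypothetical minimal record, `NameRecord` with `name`, `group` and `year` fields, which is not the ION or BioNames data model. A real analysis would also have to deal with duplicate and misdated names.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class NameRecord(NamedTuple):
    """Hypothetical minimal record for one newly described taxon name.

    This is NOT the ION or BioNames data model; it holds only the
    fields needed for the tally.
    """
    name: str
    group: str  # higher taxon the name belongs to
    year: int   # year the original description was published


def descriptions_per_year(records: Iterable[NameRecord]) -> dict[int, int]:
    """Number of new names per year of description, ordered by year."""
    counts = Counter(r.year for r in records)
    return dict(sorted(counts.items()))


def descriptions_per_group_year(
    records: Iterable[NameRecord],
) -> dict[str, dict[int, int]]:
    """The same tally split by higher taxon, so group trends can be compared."""
    by_group: dict[str, Counter] = {}
    for r in records:
        # One Counter of years per group, created on first sight of the group.
        by_group.setdefault(r.group, Counter())[r.year] += 1
    return {g: dict(sorted(c.items())) for g, c in sorted(by_group.items())}
```

Years with no descriptions are absent from the result, so a plot of the output would need to fill those gaps with zeros.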
Rather than trying to estimate an unknown (the number of species remaining to be described), we can instead focus on what we know, that is, the taxonomic literature representing the output of generations of taxonomists. The rate of progress in biodiversity research is controlled by two factors: the speed with which we can discover and describe biodiversity, and the speed with which we can communicate that information \cite{Pentcheff_2010}.

Unlike in most biological disciplines, in taxonomy the entire corpus of literature since the mid-18th century remains a vital resource for current-day research. In this way taxonomy is similar to the digital humanities, where we have not just "big data" but "long data" [978-1594632907]. The old literature stays relevant not only because the rules of nomenclature dictate (with some exceptions) that the name to use for a species is the oldest one published, but also because of the "long tail" effect: for a few species we know a great deal, but for most species the entire sum of our knowledge may reside in the primary taxonomic literature.

Digitisation is one step towards making that information available. Many commercial publishers have, on the face of it, done the taxonomic community a great service by digitising whole back catalogues of relatively obscure journals. However, digitisation is not the same as access, and many commercial publishers keep this scanned literature behind a paywall. In some fields, legal issues around access have been side-stepped by constructing a "shadow" dataset. For example, by extracting n-grams (sequences of n consecutive words) from Google Books it is possible to create a dataset that still contains valuable information without exposing the full text \cite{Michel_2010}. For taxonomic work, however, there does not seem to be an obvious way to extract such a shadow.
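
A small n-gram counter makes the idea of a "shadow" dataset concrete. This is a minimal sketch, not the pipeline used by \cite{Michel_2010}. It tokenises on word characters and counts every run of n consecutive words. It then drops n-grams seen fewer than `min_count` times; this threshold is an assumption of the sketch. What would be shared is a frequency table rather than the text itself.

```python
import re
from collections import Counter


def ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    """Return every run of n consecutive word tokens in text, lower-cased."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # \w+ matches runs of letters, digits and underscores (Unicode-aware),
    # so accented characters in names and localities are kept.
    tokens = re.findall(r"\w+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text: str, n: int, min_count: int = 1) -> Counter:
    """Count n-grams and keep only those occurring at least min_count times.

    Dropping rare n-grams (the threshold is an assumption of this sketch)
    makes it harder to reconstruct unique passages of the source text.
    """
    counts = Counter(ngrams(text, n))
    return Counter({gram: c for gram, c in counts.items() if c >= min_count})
```

For example, `ngram_counts(text, 2, min_count=2)` keeps only the bigrams that occur at least twice in `text`.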
Agosti and colleagues [10.1186/1756-0500-2-53, 10.3897/zookeys.414.7717] have explored ways to extract core facts from the literature and repurpose them without violating copyright, though it is unclear how far their conclusions can be generalised across different national and international legal systems.