Roderic Page edited Taxonomy.md  over 8 years ago

Commit id: 360afb446d841b5996f98bf90f246c8110a4d15d

Digitisation is one step towards making that biodiversity information available. Many commercial publishers have, on the face of it, done the taxonomic community a great service by digitising whole back catalogues of relatively obscure journals. However, digitisation is not the same as access, and many commercial publishers keep this scanned literature behind a paywall. In some fields, legal issues around access have been side-stepped by constructing a "shadow" dataset that summarises key features of the data while still restricting access to the data itself. For example, by extracting _n_-grams (phrases comprising _n_ words) from Google Books it is possible to create a dataset that still contains valuable information without exposing the full text \cite{Michel_2010}. For taxonomic work, however, there does not seem to be an obvious way to extract a shadow. Agosti and colleagues \cite{Agosti_2009, Patterson_2014} have explored ways to extract core facts from the literature and re-purpose these without violating copyright, though it is unclear how far their conclusions can be generalised across different national and international legal systems.

##Open access to literature

Apart from commercial digitisation of the scientific literature, two other developments are accelerating access to taxonomic information. The first is the rise of open access publishing, notably journals such as _ZooKeys_ that support sophisticated markup of concepts in the text \cite{Penev_2010}. This is increasing the number of recently described species that are published in machine-readable form and can be subject to further processing \cite{Miller_2015}. At the same time, the [Biodiversity Heritage Library (BHL)](http://www.biodiversitylibrary.org) \cite{Gwinn_2009} has embarked on large-scale digitisation of legacy taxonomic literature.
Although initially focussing on out-of-copyright literature (i.e., pre-1923 in the United States), BHL is increasingly getting permission from copyright holders to scan more recent literature as well. Coupled with tools to locate and extract articles within the scanned volumes \cite{Page_2011}, BHL is fast becoming the largest open access archive of biodiversity literature.

To quantify the extent to which the taxonomic literature has been digitised, Fig zzz shows