Summary

Both taxonomy and barcoding are actively digitising the living world. The description of new animal taxa is proceeding at an essentially constant rate, generating a steadily growing legacy of taxonomic literature into which digitisation has made only modest inroads. In contrast, sequence databases as a whole are growing exponentially, although barcode growth is more modest. Nucleotide sequences are "born digital" and readily computable; for example, they can be clustered into BINs of similar sequences, or assembled into phylogenies of the type shown in Fig. 6. Given the obvious overlap between the goals of classical taxonomy and barcoding, the lack of digital overlap between these two endeavours is disconcerting. Many barcodes lack taxonomic names ("dark taxa"), and much of the primary taxonomic literature has not been digitised ("dark texts"). Integrating barcodes and taxonomy at scale is going to be a significant challenge, as indeed will be integrating barcodes into mainstream sequence databases. Mapping between databases using taxonomic names seems the obvious approach, but the abundance of dark taxa shows this has not been entirely successful. Alternatives such as integration via specimens show promise, but are hampered by the lack of stable identifiers. If we are to make progress, the stubborn problem of the lack of unique, persistent identifiers, and of cross links between those identifiers, needs to be tackled in earnest \cite{Page_2008}.
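To make the limits of name-based mapping concrete, the following is a minimal sketch rather than the method used for the analyses in this paper. It assumes a toy set of barcode records and a toy taxonomic name index; all record identifiers, names, and the helper functions `is_dark` and `link_by_name` are hypothetical. It also assumes that a "dark" record can be recognised simply because its name is not a plain Latin binomial, which is a cruder test than real databases would need.

```python
import re

# Hypothetical barcode records; identifiers and names are invented for illustration.
barcodes = [
    {"id": "BC1", "name": "Aus bus"},
    {"id": "BC2", "name": "Aus sp. BOLD:XXX0001"},  # informal name, a "dark taxon"
    {"id": "BC3", "name": ""},                      # no name at all
    {"id": "BC4", "name": "Cus dus"},
]

# Hypothetical taxonomic database: published name -> identifier.
taxonomy = {"Aus bus": "TAX:1", "Eus fus": "TAX:2"}

# Assumption: a usable name is a plain binomial (capitalised genus + lowercase epithet).
BINOMIAL = re.compile(r"^[A-Z][a-z]+ [a-z]+$")


def is_dark(name):
    """Treat a record as 'dark' if its name is not a plain Latin binomial."""
    return BINOMIAL.match(name) is None


def link_by_name(barcodes, taxonomy):
    """Link barcode records to taxonomic identifiers by exact name match."""
    linked, dark, unmatched = {}, [], []
    for rec in barcodes:
        name = rec["name"].strip()
        if is_dark(name):
            dark.append(rec["id"])            # nothing to match on
        elif name in taxonomy:
            linked[rec["id"]] = taxonomy[name]
        else:
            unmatched.append(rec["id"])       # binomial absent from the name index
    return linked, dark, unmatched


linked, dark, unmatched = link_by_name(barcodes, taxonomy)
print("linked:", linked)
print("dark:", dark)
print("unmatched:", unmatched)
```

In this toy example only one of the four records can be linked by name. The two dark records have no name to match on, which is why an identifier that does not depend on names, such as a stable specimen identifier, would be needed to connect them.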

As a postscript, in writing this opinion piece I had to write custom scripts to query various databases in an ad hoc manner, extracting and assembling information that gives insight into the current state of biodiversity digitisation. For these analyses and visualisations to have broader utility, it would be desirable to have some way of running them consistently and automatically, in effect creating a dashboard of digitisation that would enable us not only to see where we are as a field, but also to suggest directions in which we could be heading.