Roderic Page edited Dark_taxa_As_desirable_as__.md  over 8 years ago

Commit id: ad659b5d7cd5a9956158241d3a2779ef1035ceae

## Dark taxa

As desirable as data sharing is, it is not without complications. In 2011 I coined the phrase "dark taxa" \cite[http://iphylo.blogspot.co.uk/2011/04/dark-taxa-genbank-in-post-taxonomic.html, see also]{Parr_2012} to refer to species in GenBank that lacked formal scientific names. Typically they will have a name that comprises a genus name and some combination of letters and numbers to make the name unique within GenBank (e.g., a specimen code or the first letters of the last names of the researchers that deposited the sequence). For this paper I've updated the analysis to include sequences published up to the time of writing (Fig. 5). The pattern shown in Fig. 5 likely reflects a combination of processes. If most of the taxa being added to GenBank represent species that have already been described, then the rate at which taxa can be identified (either by taxonomists or by researchers using their outputs, such as keys) is being outstripped by the pace of sequencing. Alternatively, dark taxa may represent unknown species, but we lack taxonomists capable of recognising the taxa as new (and formally describing them). If taxonomic capacity is a limiting factor then we would expect a gradual decline in the percentage of named taxa, which is the background pattern. The growth of dark taxa might also reflect changing practices of molecular workers, for example in DNA barcoding, where large numbers of specimens are sequenced and deposited into GenBank labelled with specimen codes rather than taxonomic names. Indeed, the dramatic increase in the numbers of dark taxa in 2010 is mostly due to sequences from the Barcode of Life Data Systems (BOLD) project being added (recognised by the prefix "BOLD").
Even if we allow for the import of unidentified BOLD sequences as a one-off event, at present fewer than half of the new invertebrate taxa being added to GenBank have been identified to species level.
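
To make the distinction between named and dark taxa concrete, the following is a minimal Python sketch of how names could be sorted into the two groups and tallied by year. It is an illustration only, not the analysis reported here. The regular expression, the list of open-nomenclature qualifiers, the function names `is_dark` and `percent_named_by_year`, and the `(year, name)` record layout are all assumptions, and the taxon names in the usage example are made up.

```python
import re
from collections import defaultdict

# Assumption: a formally named taxon looks like "Genus epithet", optionally
# followed by a subspecies epithet, with epithets made of lowercase letters
# (hyphens allowed). Anything else is treated as a dark taxon.
FORMAL_NAME = re.compile(r"^[A-Z][a-z]+ [a-z][a-z-]+( [a-z][a-z-]+)?$")

# Assumption: these qualifiers in the epithet position mark an unidentified taxon.
QUALIFIERS = {"sp", "cf", "aff"}


def is_dark(name):
    """Return True if the name does not look like a formal scientific name."""
    name = name.strip()
    if not FORMAL_NAME.match(name):
        return True
    return name.split()[1] in QUALIFIERS


def percent_named_by_year(records):
    """Given (year, name) pairs, return {year: percentage of formally named taxa}."""
    totals = defaultdict(int)
    named = defaultdict(int)
    for year, name in records:
        totals[year] += 1
        if not is_dark(name):
            named[year] += 1
    return {year: 100.0 * named[year] / totals[year] for year in sorted(totals)}


# Usage with made-up records
records = [
    (2009, "Drosophila melanogaster"),
    (2010, "Drosophila melanogaster"),
    (2010, "Agrilus sp. ABC-2010"),
]
print(percent_named_by_year(records))
```

A pattern-based rule like this is deliberately crude: it will misclassify names with unusual formatting, so any real count would need checking against the taxonomic rank and name class recorded in the source database.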