Roderic Page edited Linking_by_name_will_struggle__.md  over 8 years ago

Commit id: 814ee475eb8575cd835bf9d5add49d684dce0e61

Typically, integration across biodiversity databases is achieved using taxonomic names \cite{Patterson_2010}, but the rise of dark taxa makes this problematic for an increasing fraction of sequence-based data. Even where names exist, they need not always mean the same thing \cite{Kennedy_2003}. As an example, Fig. x shows the distribution of the lizard _Morethia obscura_ from GBIF. For comparison, Fig. y shows a geophylogeny \cite{Page_2015} of DNA barcodes for _Morethia obscura_ from BOLD, which reveals considerable phylogeographic structure. These barcodes comprise x BINs, implying that "_Morethia obscura_" may comprise multiple species. It is therefore not at all clear that _Morethia obscura_ in these two databases represents the same thing. What GBIF and BOLD do share in this example are specimens. The DNA barcodes WAMMS011-10 and WAMMS012-10 come from the specimens WAMR127632 and WAMR127637, which are also in GBIF (http://gbif.org/occurrence/691832244 and http://gbif.org/occurrence/691832260). So, one way to integrate these two databases is at the level of specimens rather than taxa. Of course, this need not always be straightforward. The biodiversity informatics community has yet to standardise identifiers for specimens, despite numerous efforts \cite{Guralnick_2015}; consequently, there may be little apparent overlap between specimen identifiers in different databases \cite{Guralnick_2014}.
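
Linking at the level of specimens can be illustrated with a minimal sketch. It assumes that a barcode record carries a voucher code and that an occurrence record splits the same code into an institution code and a catalogue number. The field names (`process_id`, `voucher`, `institution`, `catalog_number`, `occurrence_id`), the record layout, and the pairing of WAMMS011-10 with occurrence 691832244 are assumptions made for illustration; they are not the actual BOLD or GBIF schemas.

```python
import re


def normalise_specimen_code(code):
    """Uppercase the code and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", code.upper())


def link_by_specimen(barcodes, occurrences):
    """Return (barcode id, occurrence id) pairs that share a specimen code."""
    # Index occurrences by their normalised institution + catalogue number.
    index = {}
    for occ in occurrences:
        key = normalise_specimen_code(occ["institution"] + occ["catalog_number"])
        index.setdefault(key, []).append(occ["occurrence_id"])

    # Look up each barcode's voucher code in that index.
    links = []
    for bc in barcodes:
        key = normalise_specimen_code(bc["voucher"])
        for occ_id in index.get(key, []):
            links.append((bc["process_id"], occ_id))
    return links


# Hypothetical records built from the identifiers mentioned in the text.
barcodes = [{"process_id": "WAMMS011-10", "voucher": "WAMR127632"}]
occurrences = [
    {"occurrence_id": "691832244", "institution": "WAM", "catalog_number": "R127632"}
]

links = link_by_specimen(barcodes, occurrences)
```

Normalisation of this kind only catches differences in case, spacing, and punctuation. Where databases use genuinely different identifiers for the same specimen, an exact join will still miss the match, which is the lack of overlap described above.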