Roderic Page edited Linking_by_name_will_struggle__.md  over 8 years ago

Commit id: 9d2dbe860ab558890fbf5007dc34a53f523d6778


Linking by name will struggle with dark taxa, and taxa need not be the same across databases (lizards as example).

Link via specimens, challenge is integrating, cite PLoS paper on triplets.

Linking across biodiversity datasets using specimens comes with its own challenges.

Major challenge is to integrate across literature, taxa, genomes and species.

Typically, integration across biodiversity databases is achieved using taxonomic names \cite{Patterson_2010}, but the rise of dark taxa makes this problematic for an increasing fraction of sequence-based data. Even when we have names, they need not mean the same thing in different databases \cite{Kennedy_2003}. As an example, Fig. x shows the distribution of the lizard _Morethia obscura_ from GBIF. For comparison, Fig. y shows a geophylogeny \cite{Page_2015} of DNA barcodes for _Morethia obscura_ from BOLD. The barcodes reveal considerable phylogenetic structure within "Morethia obscura": specimens of this species are assigned to several distinct BINs, implying that "Morethia obscura" comprises more than one species.

Although GBIF and BOLD present rather different views of the "same" species, Figs x and y are to some extent based on the same specimens. For example, DNA barcode WAMMS012-10 was obtained from specimen WAMR127637, which also occurs in GBIF (as occurrence http://gbif.org/occurrence/691832260). Because the taxonomic concepts in GBIF and BOLD are explicitly defined with respect to sets of specimens, we can compare them directly rather than rely on the possibly erroneous assumption that a given taxonomic name means the same thing in the two databases.

Integrating databases using specimens is attractive, but not without its own set of issues. The biodiversity informatics community has yet to standardise identifiers for specimens, despite numerous efforts \cite{Guralnick_2015}. Consequently, there may be little apparent overlap between specimen identifiers in different databases \cite{Guralnick_2014}.
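
To make the idea of comparing taxonomic concepts as sets of specimens concrete, the following is a minimal sketch, not a description of how GBIF or BOLD work internally. It assumes each concept is available as a set of specimen codes and that differences in spacing, case and punctuation are the only obstacle to matching them. The function names, the variant spelling `WAM R127637` and the `EXAMPLE-` codes are hypothetical placeholders; only WAMR127637 comes from the text above.

```python
def normalise_specimen_code(code: str) -> str:
    """Reduce a specimen code to uppercase letters and digits only.

    Assumption: spacing, case and punctuation are the only differences
    between databases. Real codes may need institution-specific rules.
    """
    return "".join(ch for ch in code.upper() if ch.isalnum())


def concept_overlap(concept_a, concept_b):
    """Return the shared specimens and the Jaccard index of two concepts."""
    a_norm = {normalise_specimen_code(c) for c in concept_a}
    b_norm = {normalise_specimen_code(c) for c in concept_b}
    shared = a_norm & b_norm
    union = a_norm | b_norm
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard


# WAMR127637 is the specimen named in the text; the spelling variant and
# the EXAMPLE- codes are made-up placeholders for illustration.
gbif_concept = {"WAM R127637", "EXAMPLE-0001"}
bold_bin = {"WAMR127637", "EXAMPLE-0002"}

shared, jaccard = concept_overlap(gbif_concept, bold_bin)
print(sorted(shared), round(jaccard, 2))
```

A low overlap under this measure could mean either that the two concepts really differ or simply that the identifiers failed to match, which is why the lack of standard specimen identifiers matters.
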
The lack of stable specimen identifiers can have