Roderic Page edited Integrating_databases_using_specimens_is__.md  over 8 years ago

Commit id: 97891d057273a9a01d207303b6e52ea194415246

## Specimens

Integrating databases using specimens is attractive, but not without its own set of problems. The biodiversity informatics community has yet to standardise identifiers for specimens, despite numerous efforts \cite{Guralnick_2015}; consequently, there may be little apparent overlap between specimen identifiers in different databases \cite{Guralnick_2014}. For example, despite the limited sharing of data between BOLD and GBIF, GBIF already contains barcoded specimens. Consider the DNA barcode GWORH520-09 from sample "BC ZSM Lep 10234". GBIF doesn't have this record from BOLD, but it does have the specimen BC ZSM Lep 10234 \citep[provided by the host institution]{ebd55b32-cd68-46b4-85e2-105de99fecc8}. The DNA barcode from this specimen is also in GenBank, and because that record is georeferenced it has been ingested by GBIF as part of the Geographically tagged INSDC sequences dataset \cite{3c8d6ba3-d69e-468a-81e2-61689418f59e}. Hence, GBIF has duplicate records for a barcoded moth, neither provided directly by BOLD (Fig. 7). Merging and de-duplicating specimen-based records is going to be a significant challenge for global aggregators such as GBIF.
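To make the de-duplication problem more concrete, here is a minimal Python sketch. It assumes that candidate duplicates can be found by comparing normalised specimen identifiers. The record layout (`source`, `record_id`, `specimen_id`), the helper names `normalise_specimen_id` and `group_duplicates`, and the identifier spellings are hypothetical illustrations. They are not the actual BOLD, GBIF or GenBank records or APIs. The normalisation rule (uppercase, then drop everything except letters and digits) is likewise an assumption, not a community standard.

```python
import re
from collections import defaultdict


def normalise_specimen_id(raw: str) -> str:
    """Collapse case, whitespace and punctuation differences in an identifier.

    Hypothetical heuristic: uppercase the string, then remove every
    character that is not an ASCII letter or digit.
    """
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def group_duplicates(records):
    """Group records whose specimen identifiers normalise to the same key."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalise_specimen_id(rec["specimen_id"])].append(rec)
    # Only keys shared by more than one record are candidate duplicates.
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


# Hypothetical records: field names, record_ids and identifier spellings
# are made up for illustration and do not reproduce the real data.
records = [
    {"source": "BOLD", "record_id": "GWORH520-09", "specimen_id": "BC ZSM Lep 10234"},
    {"source": "GBIF (host institution)", "record_id": "occ-1", "specimen_id": "BC ZSM Lep 10234"},
    {"source": "GBIF (INSDC sequences)", "record_id": "occ-2", "specimen_id": "BC_ZSM_Lep_10234"},
    {"source": "GBIF (other)", "record_id": "occ-3", "specimen_id": "BC ZSM Lep 10235"},
]

dups = group_duplicates(records)
for key, recs in dups.items():
    # Print each candidate cluster with the sources that contributed to it.
    print(key, [r["source"] for r in recs])
```

Running this prints a single candidate cluster, `BCZSMLEP10234`, containing the three hypothetical records for specimen BC ZSM Lep 10234. Aggressive normalisation like this trades precision for recall. It can merge genuinely distinct identifiers (for example, "A1 23" and "A12 3" both become "A123"). A bare catalogue number also need not be unique across institutions. A real pipeline would therefore probably need to combine it with institution and collection codes and review the proposed matches.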