Roderic Page edited Specimens_Integrating_databases_using_specimens__.md  over 8 years ago

Commit id: 264e733d3dd1553f34f9fa4991f3902d9499a816


## Specimens

Integrating databases using specimens is attractive, but not without its own set of problems. The biodiversity informatics community has yet to standardise identifiers for specimens, despite numerous efforts \cite{Guralnick_2015}. Consequently, there may be little apparent overlap between specimen identifiers in different databases \cite{Guralnick_2014}. Nevertheless, despite the limited sharing of data between BOLD and GBIF, there are already barcoded specimens in GBIF. To illustrate, consider the DNA barcode GWORH520-09 from sample "BC ZSM Lep 10234". GBIF doesn't have this record from BOLD, but it does have the specimen BC ZSM Lep 10234 \citep[provided by the host institution]{ebd55b32-cd68-46b4-85e2-105de99fecc8}. The DNA barcode from this specimen is also in GenBank, and because that record is georeferenced it has been ingested by GBIF as part of the Geographically tagged INSDC sequences dataset \cite{3c8d6ba3-d69e-468a-81e2-61689418f59e}. Hence, GBIF has two records for the same barcoded moth, neither provided directly by BOLD (Fig. 7). Merging and de-duplicating specimen-based records is going to be a significant challenge for global aggregators such as GBIF.
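
One simple way to surface candidate duplicates of this kind is to normalise specimen codes before comparing them. The following is a minimal sketch only, not a description of how GBIF or BOLD handle records: the field names `source` and `specimen_code`, the hyphenated variant of the code, and the third record are all hypothetical and used purely for illustration.

```python
import re
from collections import defaultdict


def normalise_specimen_code(code: str) -> str:
    """Upper-case the code and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", code.upper())


def group_by_specimen(records):
    """Map each normalised specimen code to the sources that report it."""
    groups = defaultdict(list)
    for record in records:
        key = normalise_specimen_code(record["specimen_code"])
        groups[key].append(record["source"])
    return dict(groups)


# Illustrative records only; the field names, the hyphenated spelling and
# the third specimen code are assumptions, not values taken from GBIF.
records = [
    {"source": "host institution dataset", "specimen_code": "BC ZSM Lep 10234"},
    {"source": "INSDC sequences dataset", "specimen_code": "BC-ZSM-Lep-10234"},
    {"source": "host institution dataset", "specimen_code": "BC ZSM Lep 99999"},
]

# Keep only codes reported by more than one record: candidate duplicates.
duplicates = {
    code: sources
    for code, sources in group_by_specimen(records).items()
    if len(sources) > 1
}
print(duplicates)
```

String normalisation of this sort can only flag candidates. Without standardised specimen identifiers, two records with matching codes may still refer to different specimens, and records for the same specimen may carry codes that no simple rule can reconcile.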