Roderic Page edited Integrating_databases_using_specimens_is__.md  over 8 years ago

Commit id: 7d4c5cc991423069facb1d4f30eaa4729b20b50e


Integrating databases using specimens is attractive, but not without its own set of problems. Despite numerous efforts \cite{Guralnick_2015}, the biodiversity informatics community has yet to standardise identifiers for specimens; consequently, there may be little apparent overlap between specimen identifiers in different databases \cite{Guralnick_2014}. Moreover, despite the limited sharing of data between BOLD and GBIF, there are already barcoded specimens in GBIF. To illustrate, consider the DNA barcode GWORH520-09 from sample "BC ZSM Lep 10234". GBIF does not have this record from BOLD, but it does have the specimen BC ZSM Lep 10234 \cite[provided by the host institution]{9915051b-04a1-4a45-8c40-6bed0885c5bd,ebd55b32-cd68-46b4-85e2-105de99fecc8}. Furthermore, the DNA barcode from this specimen is also in GenBank, and because that record is georeferenced it has been ingested by GBIF as part of the Geographically tagged INSDC sequences dataset \cite{3c8d6ba3-d69e-468a-81e2-61689418f59e}. Hence, GBIF has two records for the same barcoded moth, neither of them provided directly by BOLD (Fig. 7). Merging and de-duplicating specimen-based records will be a significant challenge for global aggregators such as GBIF.
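
One way to picture the de-duplication problem is as grouping records by a normalised form of the specimen identifier. The following is a minimal sketch, not a description of how GBIF or BOLD actually match records. The function names, the record fields (`source`, `specimen_id`), the hyphenated spelling of the identifier, and the third record are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict


def normalise_specimen_id(raw):
    """Lower-case the identifier and drop everything except letters and digits,
    so that spelling variants of the same identifier compare equal."""
    return re.sub(r"[^a-z0-9]+", "", raw.lower())


def find_duplicates(records):
    """Group records by normalised specimen identifier and return only
    the groups that contain more than one record."""
    groups = defaultdict(list)
    for record in records:
        groups[normalise_specimen_id(record["specimen_id"])].append(record)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


# Hypothetical records: the field names and the variant spellings are
# assumptions made for this example, not data taken from GBIF.
records = [
    {"source": "host institution dataset", "specimen_id": "BC ZSM Lep 10234"},
    {"source": "INSDC sequences dataset", "specimen_id": "BC-ZSM-Lep-10234"},
    {"source": "another dataset", "specimen_id": "BC ZSM Lep 99999"},
]

duplicates = find_duplicates(records)
for key, recs in duplicates.items():
    print(key, [r["source"] for r in recs])
```

In practice, identifiers for the same specimen can differ in ways that simple normalisation cannot reconcile (for example, a museum catalogue number in one database and a sequence accession in another), so a sketch like this would catch only the easiest cases.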