Toward open science and community data standards for bee trait data
Meta-analyses have the potential to clarify patterns in bee functional
ecology across biological scales (Bartomeus et al., 2018; Coutinho et
al., 2018; Garibaldi et al., 2015; Poulsen and Rasmussen, 2020; Woodcock
et al., 2019). However, while bee trait data are abundant in the
literature, we currently lack community data standards for sharing trait
data that would enable such meta-analyses. Trait databases are
increasingly emerging as tools for functional exploration within a
taxonomic group, with valuable examples from Lepidopteran (Shirey et
al., 2022), spider (Pekár et al., 2021), amphibian (Oliveira et al.,
2017), plant (Kattge et al., 2011), and bird databases (Tobias et al.,
2022), with many other examples registered in the Open Traits Network
(Gallagher et al., 2020). Progress toward aggregated bee trait data will
depend on researchers adhering to principles of FAIR (Findable,
Accessible, Interoperable, Reusable; Wilkinson et al., 2016) data. In
our review, 64.9% of studies made their trait data available online.
Equally important for making trait data usable in future analyses is
clearly describing trait measurement methods, defining trait terms, and
providing comprehensive metadata. Where appropriate, researchers should
consider adhering to prevailing measurement protocols (Moretti et al.,
2017). For example, measuring body size as ITD can help ensure
compatibility of data with past and future studies, due to the ubiquity
of this measurement method. Importantly, even when using standardized
methodologies, methods should still be defined and/or cited to enable
future use of data. Our analysis also revealed the diversity of adopted
trait terminology for categorical traits such as nesting biology,
sociality, and diet breadth. These terms were rarely defined, presenting
obstacles to data harmonization. In the absence of a controlled
vocabulary for bee trait classifiers, terminology should be defined,
whether by written definitions or citations of existing definitions,
including links to ontologies (e.g., the Hymenoptera Anatomy Ontology;
Yoder et al., 2010). Importantly, trait data should be shared as raw
data to facilitate use in future analysis. Many datasets in our analysis
aggregated trait data at the species level, such that information on
within-species variation was lost. Associated geographic and taxonomic
data should likewise adhere to community data standards (e.g., Darwin
Core). In our review, we found that taxonomic information was at times
incomplete, inaccurate, or ambiguous, and that geographic data were
poorly linked to specimen-level trait data and/or formatted according to
outdated standards (degrees, minutes, seconds format). To resolve
ambiguity and promote machine-readability across datasets, taxonomic
information should be linked to taxonomic identifiers (e.g., GBIF
Backbone Taxonomy; 2023) and sampling coordinates should be reported in
decimal-degree format.
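As an illustration of the coordinate recommendation above, the following minimal Python sketch converts a degrees, minutes, seconds string to decimal degrees. The function name dms_to_decimal and the accepted input pattern (e.g., 34°03'08"N) are assumptions made for this example; coordinates in real datasets are formatted in many other ways and would need additional handling.

```python
import re

# Assumed input pattern for this sketch: whole degrees and minutes, seconds
# with an optional decimal part, and a trailing hemisphere letter,
# e.g. 34°03'08"N. Other layouts are deliberately rejected.
_DMS = re.compile(
    r"""^\s*(\d+)\s*°\s*(\d+)\s*['′]\s*(\d+(?:\.\d+)?)\s*["″]\s*([NSEW])\s*$"""
)


def dms_to_decimal(dms: str) -> float:
    """Convert one degrees-minutes-seconds coordinate to decimal degrees."""
    match = _DMS.match(dms)
    if match is None:
        raise ValueError(f"Unrecognized DMS coordinate: {dms!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    if int(minutes) >= 60 or float(seconds) >= 60:
        raise ValueError(f"Minutes and seconds must be below 60: {dms!r}")
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # Southern and western hemispheres are negative in decimal degrees.
    if hemisphere in "SW":
        value = -value
    return round(value, 6)
```

Under these assumptions, dms_to_decimal("34°03'08\"N") evaluates to 34.052222, and a western longitude such as 118°14'37"W becomes negative.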
Where data were available and interpretable, we have compiled and
harmonized the primary morphological data presented in the studies we
reviewed. The compiled data are available at
https://zenodo.org/doi/10.5281/zenodo.10139286 as Supplementary Table 4
and are registered in the Open Traits Network. This
dataset presents body size, tongue size, and pilosity data for 1209 bee
species along with geographic and other metadata. Behavioral trait data
(e.g., nesting biology, sociality, diet breadth) in the studies we
analyzed were generally extracted from the literature (secondary data),
and so do not feature in this primary dataset. Data classes are defined
in the metadata and, where possible, mapped to Darwin Core terms
(Wieczorek et al., 2012). We have mapped taxon names to taxonomic
identifiers and have introduced new trait definitions to the Hymenoptera
Anatomy Ontology (Yoder et al., 2010) to link functional trait data to
unique, persistent identifiers (e.g., tongue length:
http://purl.obolibrary.org/obo/HAO_0002606). Importantly, this dataset
provides a template (Supplementary Table 5) for organization of bee
trait data that ensures compatibility among datasets by facilitating
semantic interoperability and resolving ambiguity in terminology.
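To make these recommendations concrete, the sketch below shows one way a specimen-level trait record could be checked before sharing. It is not the structure of Supplementary Table 5. The column names other than traitIRI are existing Darwin Core terms, whereas traitIRI, check_record, the taxon identifier placeholder, and the measurement value are hypothetical and chosen only for illustration.

```python
# Darwin Core terms assumed as required columns for this sketch.
REQUIRED_COLUMNS = (
    "scientificName",
    "taxonID",
    "decimalLatitude",
    "decimalLongitude",
    "measurementType",
    "measurementValue",
    "measurementUnit",
    "measurementMethod",
)


def _text(record: dict, column: str) -> str:
    """Return a field as stripped text, treating None as empty."""
    value = record.get(column)
    return "" if value is None else str(value).strip()


def check_record(record: dict) -> list[str]:
    """List the problems found in one specimen-level trait record."""
    problems = [f"missing {c}" for c in REQUIRED_COLUMNS if not _text(record, c)]
    for column, limit in (("decimalLatitude", 90.0), ("decimalLongitude", 180.0)):
        raw = _text(record, column)
        if not raw:
            continue  # already reported as missing
        try:
            value = float(raw)
        except ValueError:
            problems.append(f"{column} is not in decimal degrees")
            continue
        if abs(value) > limit:
            problems.append(f"{column} out of range")
    return problems


# Illustrative record: one row per specimen measurement (raw data).
example = {
    "scientificName": "Apis mellifera",
    "taxonID": "EXAMPLE-TAXON-ID",  # placeholder, not a real identifier
    "decimalLatitude": 34.052222,
    "decimalLongitude": -118.243611,
    "measurementType": "tongue length",
    "traitIRI": "http://purl.obolibrary.org/obo/HAO_0002606",  # hypothetical column
    "measurementValue": 6.1,  # made-up value
    "measurementUnit": "mm",
    "measurementMethod": "placeholder: cite the protocol used",
}
```

Keeping one row per specimen measurement, as in the example record, preserves within-species variation; species-level summaries can then be derived downstream rather than stored in place of the raw data.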