Toward open science and community data standards for bee trait data
Meta-analyses have the potential to clarify patterns in bee functional ecology across biological scales (Bartomeus et al., 2018; Coutinho et al., 2018; Garibaldi et al., 2015; Poulsen and Rasmussen, 2020; Woodcock et al., 2019). However, while bee trait data are abundant in the literature, we currently lack the community data standards for sharing trait data that would enable such meta-analyses. Trait databases are increasingly emerging as tools for functional exploration within a taxonomic group, with valuable examples for Lepidoptera (Shirey et al., 2022), spiders (Pekár et al., 2021), amphibians (Oliveira et al., 2017), plants (Kattge et al., 2011), and birds (Tobias et al., 2022), and many others registered in the Open Traits Network (Gallagher et al., 2020). Progress toward aggregated bee trait data will depend on researchers adhering to the FAIR (Findable, Accessible, Interoperable, Reusable; Wilkinson et al., 2016) data principles. In our review, 64.9% of studies made their trait data available online.
Equally important for making trait data usable in future analyses is clearly describing trait measurement methods, defining trait terms, and providing comprehensive metadata. Where appropriate, researchers should consider adhering to prevailing measurement protocols (Moretti et al., 2017). For example, measuring body size as ITD can help ensure compatibility with past and future studies, because this measurement method is so widely used. Even when standardized methodologies are used, methods should still be defined and/or cited to enable future use of the data. Our analysis also revealed a diversity of trait terminology for categorical traits such as nesting biology, sociality, and diet breadth. These terms were rarely defined, presenting obstacles to data harmonization. In the absence of a controlled vocabulary for bee trait classifiers, terminology should be defined, whether by written definitions or by citations of existing definitions, including links to ontologies (e.g., the Hymenoptera Anatomy Ontology; Yoder et al., 2010). Trait data should also be shared as raw data to facilitate use in future analyses. Many datasets in our analysis aggregated trait data at the species level, such that information on within-species variation was lost. Associated geographic and taxonomic data should likewise adhere to community data standards (e.g., Darwin Core). In our review, we found that taxonomic information was at times incomplete, inaccurate, or ambiguous, and that geographic data were poorly linked to specimen-level trait data and/or formatted according to outdated standards (degrees, minutes, seconds format). To resolve ambiguity and promote machine-readability across datasets, taxonomic information should be linked to taxonomic identifiers (e.g., GBIF Backbone Taxonomy; 2023) and sampling coordinates should be reported in decimal-degree format.
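As an illustration of the coordinate recommendation, the following minimal Python sketch converts a degrees, minutes, seconds string to decimal degrees. The function name `dms_to_decimal`, the accepted input pattern, and the rounding to six decimal places are our assumptions for illustration only; legacy coordinates in other notations would need their own parsing rules.

```python
import re

# Hypothetical input pattern: strings such as 33°25'30"N or 111°56′24″W.
# Minutes and seconds are optional; a hemisphere letter (N, S, E, W) is required.
_DMS = re.compile(
    r"""^\s*
        (?P<deg>\d+(?:\.\d+)?)\s*°\s*
        (?:(?P<min>\d+(?:\.\d+)?)\s*['′]\s*)?
        (?:(?P<sec>\d+(?:\.\d+)?)\s*["″]\s*)?
        (?P<hemi>[NSEW])\s*$""",
    re.VERBOSE | re.IGNORECASE,
)


def dms_to_decimal(dms: str) -> float:
    """Convert one degrees-minutes-seconds coordinate to decimal degrees."""
    match = _DMS.match(dms)
    if match is None:
        raise ValueError(f"unrecognized coordinate: {dms!r}")

    degrees = float(match["deg"])
    minutes = float(match["min"] or 0)
    seconds = float(match["sec"] or 0)
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"minutes and seconds must be below 60: {dms!r}")

    value = degrees + minutes / 60 + seconds / 3600

    # Latitudes are bounded by 90 degrees, longitudes by 180 degrees.
    hemisphere = match["hemi"].upper()
    limit = 90.0 if hemisphere in "NS" else 180.0
    if value > limit:
        raise ValueError(f"coordinate out of range: {dms!r}")

    # Southern and western hemispheres are negative in decimal-degree notation.
    if hemisphere in "SW":
        value = -value
    return round(value, 6)
```

Under these assumptions, `dms_to_decimal("33°25'30\"N")` evaluates to approximately 33.425, and a western longitude is returned as a negative number. The original string should be retained alongside the converted value so that the conversion remains traceable.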
We have compiled and harmonized the primary morphological data presented in the studies we reviewed, where data were available and interpretable. The resulting dataset is available at https://zenodo.org/doi/10.5281/zenodo.10139286 as Supplementary Table 4 and is registered in the Open Traits Network. It presents body size, tongue size, and pilosity data for 1209 bee species, along with geographic and other metadata. Behavioral trait data (e.g., nesting biology, sociality, diet breadth) in the studies we analyzed were generally extracted from the literature (secondary data), and so do not feature in this primary dataset. Data classes are defined in the metadata and, where possible, are mapped to Darwin Core (Wieczorek et al., 2012). We have mapped taxon names to taxonomic identifiers and have introduced new trait definitions to the Hymenoptera Anatomy Ontology (Yoder et al., 2010) to link functional trait data to unique, persistent identifiers (e.g., tongue length: http://purl.obolibrary.org/obo/HAO_0002606). This dataset also provides a template (Supplementary Table 5) for organizing bee trait data that ensures compatibility among datasets by facilitating semantic interoperability and resolving ambiguity in terminology.
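To make the idea of a specimen-level record with standardized terms concrete, the sketch below checks a single trait measurement against a small set of Darwin Core column names. It is not the layout of Supplementary Table 5: the choice of columns, the `traitTermIRI` column (which is not a Darwin Core term), the function `validate_record`, and all values in the example record other than the ontology identifier quoted above are placeholders we assume for illustration.

```python
# Darwin Core term names used as column headers (an illustrative subset only).
DWC_COLUMNS = [
    "occurrenceID",
    "scientificName",
    "taxonID",
    "decimalLatitude",
    "decimalLongitude",
    "measurementType",
    "measurementValue",
    "measurementUnit",
    "measurementMethod",
]

# Hypothetical column (not a Darwin Core term) holding the ontology IRI
# that defines the measured trait.
TRAIT_TERM_COLUMN = "traitTermIRI"


def validate_record(record: dict) -> list:
    """Return a list of problems found in one specimen-level trait record."""
    problems = []

    # Every column must be present and non-empty.
    for column in DWC_COLUMNS + [TRAIT_TERM_COLUMN]:
        if str(record.get(column, "")).strip() == "":
            problems.append(f"missing value: {column}")

    # Coordinates must be decimal-degree numbers within valid bounds.
    for column, limit in (("decimalLatitude", 90.0), ("decimalLongitude", 180.0)):
        try:
            if abs(float(record[column])) > limit:
                problems.append(f"out of range: {column}")
        except (KeyError, TypeError, ValueError):
            problems.append(f"not a decimal-degree number: {column}")

    # Raw measurements must be numeric so they can be pooled across datasets.
    try:
        float(record["measurementValue"])
    except (KeyError, TypeError, ValueError):
        problems.append("not numeric: measurementValue")

    return problems


# Placeholder record: values are invented for illustration, not study data.
example = {
    "occurrenceID": "example-specimen-0001",
    "scientificName": "Apis mellifera",
    "taxonID": "example-taxon-id",
    "decimalLatitude": "33.425",
    "decimalLongitude": "-111.94",
    "measurementType": "tongue length",
    "measurementValue": "6.0",
    "measurementUnit": "mm",
    "measurementMethod": "placeholder: cite or describe the protocol here",
    "traitTermIRI": "http://purl.obolibrary.org/obo/HAO_0002606",
}
```

Calling `validate_record(example)` returns an empty list, whereas a record whose latitude is still in degrees, minutes, seconds notation is flagged. Keeping one row per specimen and per measurement, rather than species-level means, preserves the within-species variation discussed above.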