Why coarse granularity?

RRIDs are meant to identify research resources at a fairly coarse level of granularity. At some of the planning meetings, there was a push for more granular information for antibodies, such as lot and batch numbers. We recognize that this level of granularity is likely an important factor in determining how a given reagent performs \cite{24255050}. However, in the analysis by Vasilevsky et al. (2013) and in our experience in resource identification using text-mining, the biggest problem was not that authors omitted lot numbers but that they did not even supply catalog numbers. Because catalog numbers themselves do not serve as stable identifiers, as antibodies are bought, sold, and redistributed by many vendors, we elected to tackle the problem of identifying the root antibody first, i.e., a particular clone for a monoclonal antibody or a type of polyclonal antibody produced by a particular protocol.

To illustrate the problem, consider the study by Slotta \cite{24255050}, which analyzed the performance of antibodies to NF-κB p65 as a follow-up to a similar study by Herkenham \cite{21999414}. Both studies performed specificity tests on a variety of antibodies and, as is common, did not produce concordant results on all of them. Slotta had been the original producer of an antibody now commonly known as MAB3026 (AB_2178887) and provided its provenance: “It was transferred to Boehringer-Mannheim as Clone 12H11, resold to Roche and finally bought by Chemicon, and it is now sold as MAB3026.” The authors then speculate that a mutation may have crept in at some point and altered the specificity of the antibody. However, it may simply be that, as the antibody was tested under additional conditions, problems were revealed that had not been apparent during more limited applications. The RRID for this antibody binds these different representations together so that all references to the antibody can be tracked.
Authors are nevertheless encouraged, in the citation format, to include details about the particular instance of the antibody used, namely the vendor from which it was purchased and the catalog, batch, and lot numbers. We did not, however, want to overload the ID system by assigning a different RRID to each lot number and maintaining the mappings among them. We also felt that such a requirement would substantially decrease compliance.
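
The relationship between a root RRID and the vendor-specific representations of an antibody can be sketched as a small lookup table. The following Python sketch is illustrative only: the `ALIASES` table, the `AntibodyUse` record, the function names, and the citation string layout are all hypothetical and are not part of any RRID service or of the official citation format. Only the identifier AB_2178887 and the labels "Clone 12H11" and "MAB3026" come from the text above. The point is that lot information travels with the citation as free detail, while the RRID itself stays the same.

```python
from dataclasses import dataclass
from typing import Optional

# Root identifier taken from the text; everything else here is illustrative.
ROOT_RRID = "AB_2178887"

# Hypothetical alias table: (vendor, vendor-specific label) -> root RRID.
# Several vendor representations resolve to the same root antibody.
ALIASES = {
    ("Boehringer-Mannheim", "Clone 12H11"): ROOT_RRID,
    ("Chemicon", "MAB3026"): ROOT_RRID,
}


@dataclass(frozen=True)
class AntibodyUse:
    """One reported use of an antibody (hypothetical record layout)."""
    vendor: str
    catalog: str
    lot: Optional[str] = None  # instance-level detail, never part of the RRID


def resolve_rrid(use: AntibodyUse) -> Optional[str]:
    """Return the root RRID for a vendor/catalog pair, or None if unknown."""
    return ALIASES.get((use.vendor, use.catalog))


def format_citation(use: AntibodyUse) -> str:
    """Build an assumed citation string: instance details plus the root RRID."""
    parts = [use.vendor, f"Cat# {use.catalog}"]
    if use.lot:
        parts.append(f"Lot# {use.lot}")
    rrid = resolve_rrid(use)
    if rrid:
        parts.append(f"RRID:{rrid}")
    return ", ".join(parts)
```

Under these assumptions, two uses that cite different vendors or different lots still resolve to the same RRID, so references can be grouped by root antibody without maintaining a separate identifier for each lot.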

Similarly, for software and databases, we elected to identify only the root entity rather than to provide a granular citation of a particular version of a software tool or database. Our main goal in the case of software tools and databases was to track broad patterns of utilization of these resources (e.g., how many times NeuroMorpho.org was used, rather than which particular versions). More complete practices for citing software and data sets are emerging from recent efforts such as the Joint Declaration of Data Citation Principles (https://www.force11.org/datacitation), the W3C HCLS dataset description (http://tiny.cc/hcls-datadesc), the software discovery index (http://softwarediscoveryindex.org/), and many others. In these efforts, groups are exploring more complete reporting standards for the individual instances (versions, workflows, virtual machines) that can be used to replicate the findings. Our goal in using RRIDs for software tools was to determine how readily authors would identify these resources when offered the easiest possible solution; the longer-term goal is to incorporate more robust versioning and software archival practices that would support reproducibility.