Introduction

Research resources, defined here as the reagents, materials, and tools used to produce the findings of a study, are the cornerstone of biomedical research. However, as has long been bemoaned by database curators and investigated by Vasilevsky and colleagues, these resources are often poorly identified in the scientific literature \cite{24032093}. That study found that researchers did not include sufficient detail for unique identification of most key research resources, such as genetically modified animals, cell lines, or antibodies. In most cases, authors provided insufficient metadata to conclusively identify the particular resource, e.g., a non-unique set of attributes with no catalog or stock number. It should be noted that the authors were, generally speaking, following the guidelines offered by the journals. Such guidelines traditionally state that one should include the company name and the city in which it is located. Further, even when uniquely identifying information was provided (e.g., a catalog number for a particular antibody), the vendor may have gone out of business, the particular product may no longer be available, or its catalog information may have changed. Given that in these cases a human cannot determine which resources were used, an automated agent, such as a search engine or text-mining tool, will not be able to identify the resources either.

Because current practices for reporting research resources within the literature are inadequate, non-standardized, and not optimized for machine-readable access, it is currently very difficult to answer even basic questions about published studies, such as “What studies used the transgenic mouse I am interested in?” These types of questions are of interest to the biomedical community, which relies on the published literature to identify appropriate reagents, troubleshoot experiments, and aggregate information about a particular organism or reagent to form hypotheses about mechanism and function. Such information is also critical to funding agencies that funded a research group to generate a particular tool or reagent, and to resource providers, both commercial and academic, who would like to be able to track the use of these resources in the literature. Beyond this basic utility, identification of the particular research resource used is an important component of scientific reproducibility or lack thereof.

The Resource Identification Initiative (RII) is laying the foundation of a system for reporting research resources in the biomedical literature that will support unique identification of research resources used within a particular study. The initiative is jointly led by the Neuroscience Information Framework (NIF; http://neuinfo.org) and the Oregon Health & Science University (OHSU) Library, in conjunction with data integration efforts occurring as part of the Monarch Initiative (www.monarchinitiative.org) and with numerous community members through FORCE11 (the Future of Research Communications and e-Scholarship), a grassroots organization dedicated to transforming scholarly communication through technology. Since 2006, NIF has worked to identify research resources of relevance to neuroscience. The OHSU group has long-standing ties to the model organism community, which maintains databases populated by curating the literature and contacting authors to add links between model organisms, reagents, and other data. In a 2011 workshop (see https://www.force11.org/node/4145) held under the auspices of the Linking Animal Models to Human Diseases (LAMHDI) consortium, various stakeholders from this community drafted recommendations for better reporting standards for animal models, genes, and key reagents.

The RII was launched as a result of two planning meetings that built on the recommendations of the LAMHDI workshop. The first was held in 2012 at the Society for Neuroscience meeting with over 40 participants comprising editors, publishers, and funders (sponsored by INCF; http://incf.org). This meeting outlined the problem of incomplete identification of research resources within papers, and the need for a computational solution for identifying and tracking them in the literature. Recognizing that any solution needed to work for both humans and machines, three broad requirements were identified: 1) the standard should be machine-processable, that is, designed for search algorithms, in addition to human understanding; 2) the information should be available outside the paywall, so that search algorithms and humans have free access to the information across the biomedical literature; and 3) the standard should be uniform across publishers, to make uptake and usage easier for both humans and machines.

A follow-up workshop at the NIH (https://www.force11.org/node/4857) was held in June of 2013 to gain agreement from this stakeholder group on the design of a pilot that would explore solutions to this problem. A working group, the Resource Identification Initiative, was established through FORCE11, comprising publishers, journal editors, antibody manufacturers and distributors, biocurators, software tool developers, and foundations. Based upon agreements reached at the June 2013 meeting, the RII designed a pilot project to test implementation of a system in which authors submitting manuscripts identify research resources through the use of a unique identifier, termed a Research Resource Identifier (RRID).