Differential speckle imaging
and 1 collaborator
Energy dependence of Net-kaon Multiplicity Distributions at RHIC
The Resource Identification Initiative: A cultural shift in publishing
and 14 collaborators
A central tenet in support of research reproducibility is the ability to uniquely identify research resources, i.e., the reagents, tools, and materials used to perform experiments. However, current reporting practices for research resources are insufficient to identify the exact resources that are reported or to answer basic questions such as “How did other studies use resource X?”. To address this issue, the Resource Identification Initiative was launched as a pilot project to improve the reporting standards for research resources in the methods sections of papers and thereby improve identifiability and reproducibility. The pilot engaged over 25 biomedical journal editors from most major publishers, as well as scientists and funding officials. Authors were asked to include Research Resource Identifiers (RRIDs) in their manuscripts prior to publication for three resource types: antibodies, model organisms, and tools (i.e., software and databases). For each resource type, RRIDs are assigned by an authoritative database, for example a model organism database. To make it easier for authors to obtain RRIDs, resources were aggregated from the appropriate databases and their RRIDs made available in a central web portal (scicrunch.org/resources). RRIDs meet three key criteria: they are machine-readable, free to generate and access, and consistent across publishers and journals. Since the pilot was launched in February 2014, over 300 papers reporting RRIDs have appeared, and the number of participating journals has expanded from the original 25 to more than 40. Here, we present an overview of the pilot project and its outcomes to date. We show that authors are able to identify resources and are supportive of the project's goals. Identifiability of resources after the pilot improved dramatically for all three resource types, suggesting that the project has had a significant impact on reproducibility as it relates to research resources.
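Because RRIDs are meant to be machine-readable, they can in principle be pulled out of methods text with simple pattern matching. The Python sketch below assumes a simplified syntax (`RRID:` followed by an uppercase prefix such as `AB` or `SCR`, an underscore, and an accession). Real RRID formats vary by issuing database, so the regular expression, the function name `extract_rrids`, and the placeholder identifiers are illustrative assumptions, not the initiative's own tooling.

```python
import re

# Assumed, simplified RRID syntax: "RRID:" (optionally followed by one space),
# an uppercase prefix, an underscore, then letters, digits, ":", "_" or "-".
# Trailing punctuation such as ")" "," "." is not part of the match.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Z]+_[A-Za-z0-9:_-]+)")


def extract_rrids(methods_text):
    """Return (prefix, normalized RRID) pairs in order of appearance (hypothetical helper)."""
    found = []
    for match in RRID_PATTERN.finditer(methods_text):
        accession = match.group(1)
        prefix = accession.split("_", 1)[0]  # e.g. "AB" from "AB_0000001"
        found.append((prefix, "RRID:" + accession))  # drop any space after "RRID:"
    return found


if __name__ == "__main__":
    # Placeholder identifiers only; they do not refer to real resources.
    sample = ("Stained with a placeholder antibody (RRID:AB_0000001) and analyzed "
              "in placeholder software (RRID: SCR_0000002).")
    print(extract_rrids(sample))
```

Normalizing every match to the `RRID:<accession>` form is what makes the identifiers comparable across papers, which is the property that lets a query like “How did other studies use resource X?” be answered by string matching.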
A Review of Data Structures for Data Science
and 9 collaborators
Data structures are the foundation upon which computational tools are built. For example, the simple pointer-to-memory approach, established by languages such as Fortran and C, acts as a de facto standard that lets different packages and libraries interoperate on a single shared array of numerical data in memory. While this simple abstraction for n-dimensional arrays has served us well in the past, there is a clear need for data structures that have richer semantics and make it easy to express and manipulate common forms of (semi-)structured data. This need is highlighted by the popularity of R’s data frames and of Python libraries such as bcolz (column storage), pandas (indexed data frames), and X-ray (n-dimensional indexed arrays).
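As a rough illustration of the pointer-to-memory approach, the sketch below uses only Python's standard library (`array` and `memoryview`, chosen here as a stand-in for a raw C or Fortran array; neither is mentioned in the text above). Two views share one contiguous buffer of doubles, so a write through one is visible through the other without any copy. The function name `shared_matrix_demo` is hypothetical.

```python
from array import array


def shared_matrix_demo():
    # Flat, contiguous buffer of C doubles: the "pointer to memory" abstraction.
    data = array("d", [1.0, 2.0, 3.0, 4.0])

    # memoryview exposes the same memory through Python's buffer protocol (no copy).
    flat = memoryview(data)

    # Reinterpret the buffer as a 2x2 row-major matrix. memoryview.cast requires
    # one side of each cast to be a byte format, so go through "B" first.
    matrix = flat.cast("B").cast("d", shape=[2, 2])

    # A write through the original array is visible through every view,
    # because all of them refer to the same underlying memory.
    data[2] = 30.0
    return matrix.shape, matrix[1, 0], matrix.tolist()


if __name__ == "__main__":
    print(shared_matrix_demo())
```

The 2x2 view carries only a shape and an element type: there are no row or column labels, indexes, or per-column types. That missing layer is the kind of richer semantics that data frames and indexed arrays add on top of a shared buffer.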
This paper aims to present the state of the art in data structures that are foundational to data science, scientific computing, and statistical applications, across programming languages and implementations. It will review the data representation semantics currently implemented by various libraries, packages, and languages, with explicit emphasis on interoperability across languages and process boundaries.
Strong Lens Time Delay Challenge: I. Experimental Design
and 7 collaborators