Bundling linked research materials

Stian Soiland-Reyes

Stian Soiland-Reyes, Kevin Page?, Khalid Belhajjame, Jun Zhao?, David De Roure?, Carole Goble

Describing http://purl.org/wf4ever/ro-bundle and https://github.com/wf4ever/robundle

Abstract

Research Objects (ROs) have been suggested as a mechanism to preserve digital research materials together with their relationships and annotations. This is a key factor in improving the reproducibility of modern science, which depends significantly on computational analysis and processing [ref?]. Support for the W3C Research Object for Scholarly Communication Community Group highlights the need to roll out the Research Object concept across scientific publications.

While the existing Research Object model allows research objects to be created using Linked Data and RESTful web services, a considerable amount of scientific software does not deal directly with the web. Issues such as minting URIs or publishing Linked Data therefore become troublesome [ref?], as this kind of software typically stores methods and results as files on a local or distributed file system.

Although the Linked Data approach to building a Research Object uses URI references to relate the aggregated resources, scientists still mentally distinguish between “local” or “composed” resources and “external” or “referenced” resources. While support for managing composed collections has been proposed in the Linked Data Platform, there are still open issues relating to the distribution and cloning of such collections.

In this paper we present the Research Object Bundle, a ZIP-based media format that formalizes how to create a single file that bundles the RO descriptions and annotations together with the files the scientists wish to distribute embedded in the research object.

The RO bundle format forms the basis for specifying application-specific bundles, and we explore how the scientific workflow system Taverna \cite{Taverna201]} has implemented bundles for distributing complex data values and complete workflow run provenance. We then examine how a different workflow system, GridSpace, can use RO bundles to distribute snapshots of workflow runs between installations.

To improve uptake, the RO bundle format uses a JSON-LD context to describe the manifest. This means that the developer is not required to understand Linked Data concepts or how to mint URIs: knowing JSON, having a brief understanding of the Research Object model (such as aggregations, annotations and provenance), and knowing how to create a ZIP file are sufficient to create an RO bundle.
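As a rough illustration of how little is needed, the following Python sketch builds a bundle in memory using only the standard library. The media type string, the manifest location `.ro/manifest.json`, the context URI and the `aggregates`/`uri` keys are assumptions taken from the published RO Bundle specification rather than from this abstract, and `build_bundle` is a hypothetical helper name.

```python
import io
import json
import zipfile

# Assumed values (from the published RO Bundle specification, not this text).
MIMETYPE = "application/vnd.wf4ever.robundle+zip"
MANIFEST_PATH = ".ro/manifest.json"
CONTEXT = "https://w3id.org/bundle/context"


def build_bundle(files):
    """Return the bytes of a ZIP archive holding `files` plus a manifest.

    `files` maps bundle-relative paths (str) to their content (bytes).
    """
    manifest = {
        "@context": [CONTEXT],
        "id": "/",
        # Each embedded file is aggregated by a path relative to the bundle root.
        "aggregates": [{"uri": "/" + path} for path in sorted(files)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The media type marker goes first and is stored uncompressed,
        # so that tools can sniff the format from the start of the file.
        archive.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for path, content in files.items():
            archive.writestr(path, content)
        archive.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2))
    return buffer.getvalue()


if __name__ == "__main__":
    data = build_bundle({"results/out.csv": b"a,b\n1,2\n"})
    with open("example.bundle.zip", "wb") as handle:
        handle.write(data)
```

Note that the manifest is written as ordinary JSON; the sketch never mints a URI or touches an RDF library.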

RO bundles, as files, are easily distributed, for instance as email attachments, on institutional file servers or published on the web. This raises a challenge with respect to the identity of the research object and its evolution: if two people publish the same RO bundle at two different locations, do they represent the same RO? What if one of the ZIP files is updated with an additional resource? We resolve this issue by declaring any RO bundle to be an independent RO snapshot, which is itself unidentified (beyond how it is accessed). Within the RO bundle we relate resources using relative URI references, and optionally include an RO evolution trace in which the Live RO that the bundle was created from can be identified.

Integrating RO bundles into the existing Research Object Linked Data cloud can be achieved by unzipping the bundle and processing the manifest using the linked JSON-LD context. We demonstrate how RO bundles from the examined applications have been integrated into the RO frameworks used by the myExperiment site.
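The first step of such an integration, opening the archive and reading the manifest, might look like the minimal sketch below. It treats the manifest as plain JSON and only separates embedded resources from external references; producing RDF would additionally require a JSON-LD processor, which is not shown. The manifest path and the `aggregates`/`uri` keys are assumptions based on the published RO Bundle specification, and `list_aggregates` is a hypothetical helper name.

```python
import io
import json
import zipfile
from urllib.parse import urlsplit

MANIFEST_PATH = ".ro/manifest.json"  # assumed manifest location


def list_aggregates(bundle_bytes):
    """Split the aggregated resources of a bundle into (local, external).

    A URI reference without a scheme is taken to point inside the bundle;
    one with a scheme (e.g. http) is taken to be an external reference.
    """
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as archive:
        manifest = json.loads(archive.read(MANIFEST_PATH))
    local, external = [], []
    for entry in manifest.get("aggregates", []):
        # Accept both {"uri": "..."} objects and bare strings.
        uri = entry["uri"] if isinstance(entry, dict) else entry
        (external if urlsplit(uri).scheme else local).append(uri)
    return local, external
```

The split mirrors the distinction scientists draw between “local” and “external” resources: relative references resolve within the snapshot wherever the file happens to be stored, while absolute URIs point outside it.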