Anthony Etuk edited section_COPO_Submission_APIs_COPO__.tex almost 9 years ago

Commit id: ab2e7f89cb50ccbd6e5f1c0629eb0183fe8817be

\subsection{iRODS Submission APIs}

As \cite{Marx_2013} suggests, there is no reason to move data outside a remote infrastructure (e.g. the cloud); analysis can be done right there. COPO enables such a continuum by integrating seamlessly with heterogeneous storage assets that are abstracted away from an individual user's physical storage. In particular, COPO integrates with iRODS, the Integrated Rule-Oriented Data System (https://irods.org/), an open source data management package, to offload the burden of ``big data'' management from the system, thus enhancing its service-brokering objectives. iRODS enables data virtualisation, allowing access to distributed storage assets under a unifying namespace, and supports data discovery through a metadata catalog that describes files, directories, and storage resources in the data grid \cite{rajasekar2010irods}.

In the context of COPO, iRODS will give plant scientists the ability to create virtual data archives, where their data are seamlessly preserved and curated with policy-based rules. Data files uploaded by an end-user to COPO are automatically routed to an attached iRODS instance, thus removing the need to physically store those objects in COPO. Also, data downloaded from remote repositories (e.g. ENA) through COPO interfaces can be held in iRODS. This data workflow can then be exploited, for instance, to perform analyses on platforms such as Galaxy or iPlant, without involving the end-user's own computing or storage resources.

A key part of this integration with iRODS is achieved through PyRods (http://code.google.com/p/irodspython), an open source Python client API for accessing an iRODS server. The PyRods ``microservice'' enables the management of data objects in iRODS, including registration, metadata attribution, and retrieval.
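The three operations named above (registering a data object, attributing metadata to it, and retrieving it) can be illustrated with a minimal Python sketch. This is not the PyRods API and not COPO's implementation: the class \texttt{InMemoryGrid}, its methods, and the zone, collection, and resource names are all hypothetical stand-ins, kept in memory so that the example runs without an iRODS server. It only aims to show how a logical path under a unifying namespace can be decoupled from the physical storage resource, and how a metadata catalog supports discovery.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataObject:
    """A catalogued file: logical path, hosting resource, and metadata."""
    logical_path: str
    resource: str
    metadata: Dict[str, str] = field(default_factory=dict)


class InMemoryGrid:
    """Hypothetical stand-in for a data grid catalog (not a real iRODS client)."""

    def __init__(self, zone: str) -> None:
        self.zone = zone
        self._catalog: Dict[str, DataObject] = {}

    def register(self, collection: str, name: str, resource: str) -> DataObject:
        # The logical path depends only on zone, collection, and file name,
        # never on the physical resource that stores the bytes.
        logical_path = f"/{self.zone}/{collection.strip('/')}/{name}"
        obj = DataObject(logical_path, resource)
        self._catalog[logical_path] = obj
        return obj

    def add_metadata(self, logical_path: str, attribute: str, value: str) -> None:
        # Attach one attribute/value pair to an already registered object.
        self._catalog[logical_path].metadata[attribute] = value

    def find(self, attribute: str, value: str) -> List[str]:
        # Discovery by metadata: return matching logical paths, sorted.
        return sorted(
            path
            for path, obj in self._catalog.items()
            if obj.metadata.get(attribute) == value
        )

    def get(self, logical_path: str) -> DataObject:
        # Retrieval by logical path; raises KeyError if not registered.
        return self._catalog[logical_path]


# Usage with made-up names: route an upload to the grid, then find it again.
grid = InMemoryGrid("copoZone")
upload = grid.register("home/alice/reads", "sample1.fastq", resource="storageA")
grid.add_metadata(upload.logical_path, "organism", "Triticum aestivum")
matches = grid.find("organism", "Triticum aestivum")
```

In this sketch a file uploaded through a COPO-like front end would only ever be referred to by its logical path; moving it to a different storage resource would change the \texttt{resource} field but not the path or the metadata used to find it.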