Anthony Etuk edited section_COPO_Submission_APIs_COPO__.tex  almost 9 years ago

Commit id: 12fbfa8f601a2fca69f2663a7332ff0d672d8aff


\subsection{iRODS Submission APIs}

As \cite{Marx_2013} suggests, there is no reason to move data outside a remote infrastructure (e.g. the cloud); analysis can be done right there. COPO enables such a continuum by integrating seamlessly with heterogeneous storage assets that are abstracted away from the individual user's physical storage. In particular, COPO integrates with iRODS, the Integrated Rule-Oriented Data System (https://irods.org/), an open source data management package, to offload the burden of ``big data'' management from the system, thus enhancing its service-brokering objectives. iRODS enables data virtualisation and allows access to distributed storage assets under a unifying namespace. It also enables data discovery using a metadata catalog that describes files, directories, and storage resources in the data grid \cite{rajasekar2010irods}. In the context of COPO, iRODS will give plant scientists the ability to create virtual data archives, where their data are seamlessly preserved and curated with policy-based rules.

Data files uploaded to COPO are automatically routed to a connected iRODS instance, thus removing the need to physically store those objects in COPO. Data downloaded from remote repositories (e.g. ENA) using COPO interfaces can also be held in iRODS. This data workflow can then be exploited, for instance, to perform analyses on platforms such as Galaxy (http://galaxyproject.org/) or iPlant (http://www.iplantcollaborative.org/), without even involving the user's computing or storage resources. A key part of this integration with iRODS is achieved through PyRods (http://code.google.com/p/irodspython), an open source Python client API for accessing an iRODS server. The PyRods ``microservice'' enables the management of data objects in iRODS, including registration, metadata attribution, and retrieval of data objects.
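
The three operations named above (registration, metadata attribution, and retrieval) can be illustrated with a minimal, self-contained sketch. It deliberately does not call PyRods or contact an iRODS server: the class \texttt{InMemoryCatalog}, its method names, and the example path and metadata values are all hypothetical stand-ins chosen for illustration. The only detail taken from iRODS itself is that metadata is attached to objects as attribute, value, unit triples.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """A stored file plus its (attribute, value, unit) metadata triples."""
    logical_path: str
    content: bytes
    metadata: list = field(default_factory=list)


class InMemoryCatalog:
    """Hypothetical stand-in for an iRODS zone; this is NOT the PyRods API."""

    def __init__(self):
        self._objects = {}

    def register(self, logical_path, content):
        # Registration makes the object addressable under one logical namespace.
        obj = DataObject(logical_path, content)
        self._objects[logical_path] = obj
        return obj

    def add_metadata(self, logical_path, attribute, value, unit=""):
        # Metadata attribution: attach one attribute/value/unit triple.
        self._objects[logical_path].metadata.append((attribute, value, unit))

    def retrieve(self, logical_path):
        # Retrieval by logical path, independent of where the bytes live.
        return self._objects[logical_path].content

    def find(self, attribute, value):
        # Discovery: list the paths whose metadata matches attribute and value.
        return sorted(
            path
            for path, obj in self._objects.items()
            if any(a == attribute and v == value for a, v, _ in obj.metadata)
        )


# Assumed example values; the zone, user, and file names are made up.
path = "/copoZone/home/alice/reads.fastq"
catalog = InMemoryCatalog()
catalog.register(path, b"@read1\nACGT\n+\nIIII\n")
catalog.add_metadata(path, "organism", "Arabidopsis thaliana")

matches = catalog.find("organism", "Arabidopsis thaliana")
data = catalog.retrieve(path)
```

In a real deployment, each of these methods would be replaced by the corresponding call to the iRODS client library, and the catalog lookup would be answered by the iRODS metadata catalog rather than by a Python dictionary.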