Anthony Etuk edited sectionProgress_Coll.tex  about 9 years ago

Commit id: c5266e0f7702e6fb566394edd9b14c5bb27a71f0


This seamless interaction of deposition, metadata labelling, semantically enabled search and data attribution represents a novel way of gluing together existing services. It greatly enhances what is currently available to plant scientists, enabling them to find more relevant information more quickly, deposit their research outputs into the public domain with little effort, and get credited for doing so.

\subsection{COPO Submission APIs}

COPO builds on a rich set of APIs to facilitate the submission of research objects (e.g. raw sequencing data) to disparate data stores and repositories. These APIs are managed transparently within COPO to lift the burden of data deposition or transfer from the user. The issues COPO’s submission APIs attempt to address include, but are not limited to: conversions between different formats (e.g. tab-delimited formats and XML \cite{Rocca-Serra2010}); “big-data” transfer issues (e.g. delay overheads, data integrity and privacy); and reproducing a piece of research or performing analysis on deposited data. Figure \ref{figure1} highlights the interaction of COPO with existing data infrastructure, whose interoperability is made possible through the use of APIs. In what follows, we discuss in more detail the APIs that COPO provides for submissions to the repositories and data stores shown in Figure \ref{figure1}.

\subsection{ENA Submission APIs}

The European Nucleotide Archive (ENA) is an established repository for nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation (http://www.ebi.ac.uk/ena). Indeed, submitting nucleotide sequence data to ENA is currently regarded as a mandatory step in the dissemination of research outputs to the scientific community \cite{?}.
The ENA data submission workflow requires data files to be uploaded before they can be submitted. To this end, ENA supports several software components that help users transfer data to the repository. Among them is the fasp transfer technology, which ENA especially recommends for long-distance file transfers. Developed by Aspera (http://asperasoft.com), fasp is designed to overcome the fundamental shortcomings of conventional TCP-based file transfer technologies such as FTP and HTTP; transfers with fasp are reported to reach speeds hundreds of times faster than FTP/HTTP \cite{Marx_2013}. COPO provides an API (with a web-based interface), built on the Aspera fasp transfer technology, for uploading files to users’ dropbox in ENA. Using this functionality (activated with a single click), the user can monitor the progress of the upload. Metadata about the upload process (e.g. time of upload or process initiator) can also be recorded and made available to other components of the system. The ISA API is also used in this context to convert data to the formats (e.g. XML) supported by ENA before a submission is made. Once a submission enabled by this array of APIs is complete, an accession is obtained and stored within COPO.
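
To make the format-conversion step concrete, the following is a minimal sketch of turning tab-delimited metadata into XML using only the Python standard library. It is an illustration under stated assumptions, not the ISA API or COPO's implementation: the function name \texttt{tsv\_to\_xml}, the tag names \texttt{SAMPLE\_SET} and \texttt{SAMPLE}, and the example columns are hypothetical and do not reproduce ENA's actual submission schema.

```python
import csv
import io
import xml.etree.ElementTree as ET


def tsv_to_xml(tsv_text, root_tag="SAMPLE_SET", row_tag="SAMPLE"):
    """Convert tab-delimited text (header row first) into an XML string.

    The tag names are illustrative placeholders (an assumption of this
    sketch), not ENA's submission schema.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    root = ET.Element(root_tag)
    for row in reader:
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            # One child element per column; column names are assumed
            # to be valid XML element names.
            field = ET.SubElement(record, column)
            field.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical two-sample table with made-up column names.
example = "alias\ttaxon_id\nsample1\t3702\nsample2\t4577\n"
xml_text = tsv_to_xml(example)
print(xml_text)
```

A real conversion would additionally need to validate the output against the schema required by the target repository, which this sketch does not attempt.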