PBrockmann edited untitled.md  over 8 years ago

Commit id: 541b9f8f8aa09212e367cd681f5491200ee06f0b

Date: 13 November 2015

Release: 0.6

# Proposal for a structured spreadsheet for Paleo Data to interface with the Linked Paleo Data container

The Linked Paleo Data (LiPD) container, based on the [Linked Data JSON (JSON-LD) format](https://www.authorea.com/users/17200/articles/19163/_show_article), is a practical solution to the problem of organizing and storing hierarchical paleoclimate data in a generalizable schema. It is an important step towards standardizing the representation and linkage of diverse paleoclimate datasets.

In this IPython notebook, I provide experimental converters for interacting with the LiPD container through ordinary spreadsheets. The motivation is that the paleoclimate community mainly uses spreadsheets, not JSON-based formats, to edit and store the data and metadata of their measurements. What is missing is a way to convert such spreadsheet-based data to the LiPD format and vice versa.

Working directly with LiPD has two other disadvantages:

1. JSON notation will never be as easy to edit and modify as a spreadsheet document;
2. The LiPD container refers to a headerless CSV data file, so the user must repeatedly navigate the nested attributes of the LiPD file to figure out which parameters correspond to which columns in the CSV file.

With a Paleo Data Spreadsheet (PDS), users can edit their data directly in an ordinary spreadsheet program such as Excel or OpenOffice and later convert it to LiPD, which is a good container for storing data in a document database like mongoDB (since it uses JSON).

In addition, I have implemented converters to transform a PDS into [python pandas dataframes](http://pandas.pydata.org/pandas-docs/version/0.17.0/dsintro.html#dataframe), which are convenient for subsequent data analysis in, for example, an IPython notebook.

## The Paleo Data Spreadsheet (PDS) structure

- 2 worksheets: Data and Metadata.
- The Data worksheet presents the data in a matrix with named columns for each parameter and rows for each sample, as in a CSV file.
- The Metadata worksheet has 2 columns corresponding to the Attribute and Value of all the parameters described in the Data worksheet. There are no headers. Hierarchical attributes are written in dot notation (e.g. parameter.attribute) in the Attribute column, and their values are given in the Value column. If an attribute has no corresponding parameter in the Data worksheet, it is assumed to be global (e.g. filename).
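The dot-notation rule for the Metadata worksheet can be illustrated with a minimal Python sketch. It is not part of PDSlib: the function names `unflatten`, `flatten` and `split_metadata`, and the sample attribute names and values, are hypothetical and chosen only for illustration. The sketch assumes that no attribute is used both as a value and as a parent of other attributes.

```python
def unflatten(rows):
    """Build a nested dict from (attribute, value) rows written in dot notation."""
    nested = {}
    for attribute, value in rows:
        keys = attribute.split(".")
        node = nested
        # Walk (or create) one dict level per dotted component except the last.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return nested


def flatten(nested, prefix=""):
    """Inverse of unflatten: return (attribute, value) rows in dot notation."""
    rows = []
    for key, value in nested.items():
        path = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            rows.extend(flatten(value, path))
        else:
            rows.append((path, value))
    return rows


def split_metadata(rows, data_columns):
    """Separate global attributes from attributes tied to a Data column.

    An attribute is treated as global when its first dotted component
    does not match any column name of the Data worksheet.
    """
    global_rows, parameter_rows = [], []
    for attribute, value in rows:
        first = attribute.split(".")[0]
        target = parameter_rows if first in data_columns else global_rows
        target.append((attribute, value))
    return global_rows, parameter_rows


# Hypothetical Metadata worksheet content (two columns, no header row).
metadata_rows = [
    ("filename", "example_core"),
    ("depth.units", "cm"),
    ("d18O.units", "permil"),
    ("d18O.material", "foraminifera"),
]
nested = unflatten(metadata_rows)
global_rows, parameter_rows = split_metadata(metadata_rows, ["depth", "d18O"])
```

Here `nested["d18O"]["units"]` holds `"permil"`, and `filename` ends up in `global_rows` because the Data worksheet has no column with that name.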

1. A RESTful web service could be implemented to visualize a PDS file as an HTML page with interactive plots. [Bokeh](http://bokeh.pydata.org/en/latest/), a Python interactive visualization library, would be useful for this purpose.

## Converting from one structure to another with PDSlib

PDSlib is a Python module that offers two-way converters between different structures: PDS, pandas dataframes (df), and the LiPD container.
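The idea of a two-way conversion can be sketched with the standard library alone. The functions below are not the PDSlib API, which is not shown here: `pds_to_container` and `container_to_pds` are hypothetical names, and the JSON layout (a `global` object plus a `columns` list with `number` and `parameter` keys) is a simplified stand-in for illustration, not the actual LiPD schema. The sketch assumes that metadata rows list global attributes first, followed by parameter attributes in column order, and that no parameter attribute is itself named `number` or `parameter`.

```python
import csv
import io
import json


def pds_to_container(data_rows, metadata_rows):
    """Convert PDS content into (json_text, headerless_csv_text).

    data_rows: list of rows, the first row being the column names.
    metadata_rows: list of (attribute, value) pairs in dot notation.
    """
    header, *samples = data_rows
    columns = []
    for number, name in enumerate(header, start=1):
        # The column number links the description to the headerless CSV.
        column = {"number": number, "parameter": name}
        prefix = name + "."
        for attribute, value in metadata_rows:
            if attribute.startswith(prefix):
                column[attribute[len(prefix):]] = value
        columns.append(column)
    # Attributes whose first component is not a column name are global.
    global_attrs = {
        a: v for a, v in metadata_rows if a.split(".")[0] not in header
    }
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(samples)
    json_text = json.dumps({"global": global_attrs, "columns": columns})
    return json_text, buffer.getvalue()


def container_to_pds(json_text, csv_text):
    """Inverse conversion: rebuild the Data and Metadata worksheets."""
    container = json.loads(json_text)
    columns = sorted(container["columns"], key=lambda c: c["number"])
    header = [c["parameter"] for c in columns]
    samples = list(csv.reader(io.StringIO(csv_text)))
    metadata_rows = list(container["global"].items())
    for column in columns:
        for key, value in column.items():
            if key not in ("number", "parameter"):
                metadata_rows.append((column["parameter"] + "." + key, value))
    return [header] + samples, metadata_rows


# Hypothetical PDS content; values are kept as strings, as read from CSV.
data = [["depth", "d18O"], ["0.5", "-1.2"], ["1.5", "-0.9"]]
metadata = [
    ("filename", "example_core"),
    ("depth.units", "cm"),
    ("d18O.units", "permil"),
]
json_text, csv_text = pds_to_container(data, metadata)
data_back, metadata_back = container_to_pds(json_text, csv_text)
```

In this sketch the column names live only in the JSON part, so the CSV text stays headerless, and converting back restores the named columns of the Data worksheet.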