PODS, a convention for paleoclimate data in spreadsheetsThe LiPD container based on the Linked Data JSON (JSON-LD) format is a practical solution to the problem of organizing and storing hierarchical paleoclimate data in a generalizable schema. This is an important step forward towards standardizing the representation and linkage of diverse paleoclimate datasets.In this IPython notebook, I introduce a method to interact with the LiPD container using ordinary spreadsheets. The motivation to create this method is guided by the fact that the paleoclimate community uses mainly spreadsheets to edit and store the data and the metadata of their measurements, and not JSON-based formats. What is missing is a way to convert such spreadsheet-based data to LiPD format and vice versa.Working directly with LiPD has two other disadvantages:JSON notation will never be as easy to edit and modify than a spreadsheet document;The LiPD refer for data to a headerless CSV file which requires the user to continually refer to the LiPD file and navigate its nested attributes in order to figure out what the columns in the CSV file refer to.Therefore, I propose to stick with the use of spreadsheets but standardize them into a convention where the data and the metadata are stored in two separate worksheets of the same spreadsheet document.The metadata is stored with a dot notation to represent the hierarchical nature of the attributes. This convention could be called PODS (Paleo Open Document Spreadsheet).With the PODS convention, users can directly edit their data in an ordinary spreadsheet program like Excel or OpenOffice and later convert them to LiPD, which is a good container for storing data in a document database like mongoDB (since it uses JSON).The following procedure describes how to organize a spreadsheet at the PODS convention and transform it into LiPD, and vice versa.The PODS convention2 worksheets: Data and Metadata.The Data worksheet presents all parameters as columns with a column name as in a CSV file.The Metadata worksheet has 2 columns corresponding to Attribute and Value of all the parameters described in the Data worksheet. There are no headers. Each line describes an attribute with its hierarchy with a dot notation and its value. If there is no corresponding parameter in the Data worksheet, then it is assumed that the attribute describes a global attribute.missing data are described as empty cells.no comments are allowed in the Data and Metadata worksheets but other worksheets can be added to do so.The Data worksheet:The Metadata worksheet:Notes:Today, pandas does not allow yet the reading and the writing of ODS (Open Document Spreadsheet) but there are a lot of requests for this feature and it should be feasible soon (pandas/issue 2311).The LiPD container refers to a CSV file to store the data. This headerless CSV file where each column refers to a column number is very poorly described and could lead to confusion. I think it would be safer and clearer to use one line of header to name the columns by the parameter names.The LiPD container can contain a list object for values. It would be simpler to disable this possibility and have only unique values.Make sure spreadsheet cells are formatted correctly, i.e. numbers cells are specified as number and not text. Same for dates.A compliance checker needs to be built to check that input files conform to the PODS convention, something like the netCDF CF-checker.Reading a spreadsheet at the PODS convention to pandas dataframeIn [1]:file = 'spreadsheet_at_PODS_convention_01.xls'In [2]:import pandas as pd dfD = pd.read_excel(file, sheetname='Data') dfDOut[2]:depthyeartemperature00201514.341100201512.702200201510.303400201510.2044502015NaN5500201510.1061000201510.10In [3]:dfM = pd.read_excel(file, sheetname='Metadata', header=None, names=['Attribute', 'Value']) dfMOut[3]:AttributeValue0depth.parameterTypemeasured1depth.unitscm2depth.notesdepth refers to top of sample3depth.datatypecsvw:NumericFormat4depth.descriptiondepth below ocean floor5year.parameterTypeinferred6year.unitsAD7year.methodlinear interpolation8year.datatypecsvw:NumericFormat9year.descriptioncalendar year AD10temperature.parameterTypeinferred11temperature.descriptionsea-surface temperature inferred from Mg/Ca ra...12temperature.climateInterpretation.seasonalityMJJ13temperature.climateInterpretation.parameterT14temperature.climateInterpretation.parameterDetailseaSurface15temperature.climateInterpretation.interpDirectionpositive16temperature.climateInterpretation.basisMg/Ca calibration to SST17temperature.calibration.equationBAR2005: Mg/Ca=0.794*exp(0.10*SST)18temperature.calibration.uncertainty1.319temperature.calibration.referenceBarker et al., (2005), Thornalley et al., (2009)20temperature.datatypecsvw:NumericFormat21temperature.materialforamifera carbonate22temperature.proxyMg/Ca23temperature.unitsdeg C24paleoDataTableNamedata25filenameatlantic0220Thornalley2009.csvNote that columns key has been changed and column numbers changed to parameter names.Convert the pandas dataframes to a LiPD containerIn [4]:import PODS reload(PODS) a_LiPD = PODS.df_to_LiPD(dfM, dfD.columns, verbose=True) import json print json.dumps(a_LiPD, sort_keys=False, indent=4, separators=(',', ': ')){ "depth": { "parameterType": "measured", "units": "cm", "notes": "depth refers to top of sample", "datatype": "csvw:NumericFormat", "description": "depth below ocean floor" }, "year": { "parameterType": "inferred", "units": "AD", "datatype": "csvw:NumericFormat", "method": "linear interpolation", "description": "calendar year AD" }, "temperature": { "parameterType": "inferred", "description": "sea-surface temperature inferred from Mg/Ca ratios", "climateInterpretation": { "basis": "Mg/Ca calibration to SST", "parameter": "T", "parameterDetail": "seaSurface", "interpDirection": "positive", "seasonality": "MJJ" }, "calibration": { "equation": "BAR2005: Mg/Ca=0.794*exp(0.10*SST)", "uncertainty": 1.3, "reference": "Barker et al., (2005), Thornalley et al., (2009)" }, "datatype": "csvw:NumericFormat", "material": "foramifera carbonate", "proxy": "Mg/Ca", "units": "deg C" }, "paleoDataTableName": "data", "filename": "atlantic0220Thornalley2009.csv" } Convert a LiPD container to a pandas dataframeIn [5]:dfD, dfM = PODS.LiPD_to_df(a_LiPD)In [6]:dfDOut[6]:depthyeartemperature00201514.341100201512.702200201510.303400201510.2044502015NaN5500201510.1061000201510.10In [7]:dfMOut[7]:AttributeValue0depth.parameterTypemeasured1depth.unitscm2depth.notesdepth refers to top of sample3depth.datatypecsvw:NumericFormat4depth.descriptiondepth below ocean floor5year.parameterTypeinferred6year.unitsAD7year.datatypecsvw:NumericFormat8year.methodlinear interpolation9year.descriptioncalendar year AD10temperature.parameterTypeinferred11temperature.descriptionsea-surface temperature inferred from Mg/Ca ra...12temperature.climateInterpretation.basisMg/Ca calibration to SST13temperature.climateInterpretation.parameterT14temperature.climateInterpretation.parameterDetailseaSurface15temperature.climateInterpretation.interpDirectionpositive16temperature.climateInterpretation.seasonalityMJJ17temperature.calibration.equationBAR2005: Mg/Ca=0.794*exp(0.10*SST)18temperature.calibration.uncertainty1.319temperature.calibration.referenceBarker et al., (2005), Thornalley et al., (2009)20temperature.datatypecsvw:NumericFormat21temperature.materialforamifera carbonate22temperature.proxyMg/Ca23temperature.unitsdeg C24paleoDataTableNamedata25filenameatlantic0220Thornalley2009.csvExport pandas dataframes to a spreadsheetIn [8]:writer = pd.ExcelWriter('spreadsheet_at_PODS_convention_02.xls') dfD.to_excel(writer, sheet_name="Data", index=False) # one day will be to_ods dfM.to_excel(writer, sheet_name="Metadata", index=False, header=False) writer.save()spreadsheet_at_PODS_convention_01.xls is identical to spreadsheet_at_PODS_convention_02.xls proof that convertions are correct.PODS ===> LiPD ===> PODSIn [ ]:Below are some tests to deal with values entered as listsA conversion test from a BibJSON that describes publication metadataIn [9]:a_LiPD = [{ "author": [ {"name" : "Thornalley, D.J.R"}, {"name" : "Elderfield, H."}, {"name" : "McCave, N"} ], "type" : "article", "identifier" : {"type": "doi", "id": "10.1038/nature07717", "url": "http://dx.doi.org/10.1038/nature07717"} , "pubYear": 2009 }] dfD, dfM = PODS.LiPD_to_df(a_LiPD) dfMOut[9]:AttributeValue0pubYear20091identifier.urlhttp://dx.doi.org/10.1038/nature077172identifier.typedoi3identifier.id10.1038/nature077174typearticle5author.0.nameThornalley, D.J.R6author.1.nameElderfield, H.7author.2.nameMcCave, NA conversion test with a GeoJSONIn [10]:a_LiPD = { "type": "FeatureCollection", "features": [ { "type": "Feature", "geometry": {"type": "Point", "coordinates": [102.0, 0.5]}, "properties": {"prop0": "value0"} }, { "type": "Feature", "geometry": { "type": "LineString", "coordinates": [ [102.0, 0.0], [103.0, 1.0], [104.0, 0.0], [105.0, 1.0] ] }, "properties": { "prop0": "value0", "prop1": 0.0 } }, { "type": "Feature", "geometry": { "type": "Polygon", "coordinates": [ [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ] ] }, "properties": { "prop0": "value0", "prop1": {"this": "that"} } } ] } dfD, dfM = PODS.LiPD_to_df(a_LiPD) dfMOut[10]:AttributeValue0typeFeatureCollection1features.0.geometry.typePoint2features.0.geometry.coordinates[102.0, 0.5]3features.0.typeFeature4features.0.properties.prop0value05features.1.geometry.typeLineString6features.1.geometry.coordinates[[102.0, 0.0], [103.0, 1.0], [104.0, 0.0], [10...7features.1.typeFeature8features.1.properties.prop0value09features.1.properties.prop1010features.2.geometry.typePolygon11features.2.geometry.coordinates[[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [1...12features.2.typeFeature13features.2.properties.prop0value014features.2.properties.prop1.thisthatThose last 2 convertions are not bijective since LiPD container contains lists that are transformed with numbers for each items when PODS.LiPD_to_df is used:author[] to author.1, author.2, author.3features[] to features.1, features.2Should list be disallowed from LiPD ? Otherwise need more work to recreate a list from numbers when use PODS.df_to_LiPDauthor.1, author.2, author.3 to author[]features.1, features.2 to features[]

Jace Harker

and 2 more

INTRODUCTION LaTeX is typesetting software that is widely used by mathematicians and physicists because it is so good at typesetting equations. It is also completely programmable, so it can be configured to produce documents with almost any desired formatting, and to automatically number equations, figures, endnotes, and so on. To prepare manuscripts for the American Journal of Physics (AJP), you should use the REVTeX 4.1 format for Physical Review B preprints, as indicated in the documentclass line at the top of this article’s source file. (If you’re already familiar with LaTeX and have used other LaTeX formats, please resist the temptation to use them, or to otherwise override REVTeX’s formatting conventions, in manuscripts that you prepare for AJP.) This sample article is intended as a tutorial, template, and reference for AJP authors, illustrating most of the LaTeX and REVTeX features that authors will need. For a more comprehensive introduction to LaTeX, numerous books and online references are available. Documentation for the REVTeX package can be found on the APS web site. LaTeX is free software, available for Unix/Linux, Mac OS X, and Windows operating systems. For downloading and installation instructions, follow the links from the LaTeX web site. It is most convenient to install a “complete TeX distribution,” which will include LaTeX, the underlying TeX engine, macro packages such as REVTeX, a large collection of fonts, and GUI tools for editing and viewing your documents. To test your installation, try to process this sample article.