ROUGH DRAFT authorea.com/6752
    Software practices .... for ecological forecasting

    Interoperability, Modularity, Reuse

    What is, or can be, done to improve reproducibility and interoperability in specific parts of the workflow?

    • Why one workflow system over another? See the comparison paper (Yu 2005)
      • what are common and uncommon requirements for workflow engines?
      • Data input / output
      • Model component interfaces
      • What do we want in a workflow engine?

    Software Interoperability / Modularity

    • Favor interoperability over rewriting / translating (except where translation is useful and not redundant)
    • Model coupling --> currently a Herculean task

    Data Interoperability / Modularity

    (Borer 2009) provide guidelines; an example of a similar "guidelines" paper, aimed at outreach rather than findings.

    (Wolkovich 2012) An example of using global change to illustrate changes needed in data management:

    • Data integration and model coupling
      • almost all existing systems don't deal in semantics
      • Try to add semantics to modelling frameworks to make it easier to build systems that understand more of the data / information compatibilities specific to the domain.

    To this end, we can take the opportunity to state some widely used conventions.
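    As a concrete illustration of "adding semantics," the sketch below (in Python, for concreteness) attaches an ontology term and units to each variable so a framework can check compatibility before coupling datasets. The term URIs and unit strings are invented for illustration, not drawn from any real ontology.

```python
# Sketch: attach semantic annotations (ontology term, units) to variables so a
# framework can check whether two datasets are compatible before coupling them.
# The term URIs and unit strings below are illustrative, not from a real ontology.

def annotate(name, term, units):
    """Bundle a variable name with a measurement-type term and units."""
    return {"name": name, "term": term, "units": units}

def compatible(a, b):
    """Two variables are compatible if they measure the same thing in the same units."""
    return a["term"] == b["term"] and a["units"] == b["units"]

npp_model = annotate("npp", "http://example.org/onto#NetPrimaryProduction", "gC m-2 yr-1")
npp_data  = annotate("NPP_obs", "http://example.org/onto#NetPrimaryProduction", "gC m-2 yr-1")
npp_other = annotate("npp_monthly", "http://example.org/onto#NetPrimaryProduction", "gC m-2 mo-1")

print(compatible(npp_model, npp_data))   # same term and units: couplable
print(compatible(npp_model, npp_other))  # units differ: not directly couplable
```

    Even this minimal check catches mismatches (here, annual vs. monthly units) that name-matching alone would miss.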

    Ontologies / Semantics


    Can we describe model components semantically to provide plug-and-play within workflow systems? (Berkeley 2005, Madin 2007, Madin 2008)

    Examples

    Workflow Software

    from simple (R) to complex (ARIES, (Villa 2014))

    • R ... start here because it has broad familiarity
      • Automated provenance tracking within R (e.g., an analytic web)
      • R as a workflow itself: this is essentially folks using knitr with markdown or LaTeX, which has very little structure other than separating text from code blocks
      • Kepler modules as a frontend to R code
    • Summary of workflow software from the meeting ... what would a 'feature' matrix look like?

    Data Conventions

    • MsTMIP -> a format widely used by terrestrial ecosystem modelers
    • DataONE: EML describes data as it is, rather than requiring data to conform to a standard
    • NEON: has the opportunity to provide data "templates" for common types of data collected by ecologists
      • these are under development; not clear that they should set 'the' standard
      • a good place for community feedback on this
      • tradeoff between generality and specificity
    • Aaron Ellison: the first three columns should always be x, y, t (z if relevant); on one hand, of course this isn't normalized; on the other hand, it is a straightforward rule.
      • Coding conventions
      • Code use / reuse
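    The x, y, t convention is easiest to see with a toy file; the sketch below (Python standard library, values invented) writes a CSV whose first three columns are location and time, with measured variables after.

```python
import csv, io

# Sketch of the "first three columns are x, y, t" convention: every dataset
# leads with location and time, and measured variables follow. Values invented.
rows = [
    ("x", "y", "t", "species_richness"),
    (-72.19, 42.53, "2013-06-01", 24),
    (-72.19, 42.53, "2013-07-01", 31),
]

buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
```

    The appeal of the rule is exactly this predictability: any downstream tool can find space and time without consulting metadata first.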

    rOpenSci

    • rnoaa -> MsTMIP format
    • taxize -> USDA Plants [Genus, Species, ScientificName, CommonName] and BETYdb [genus, species, scientificname, commonname]

      • rOpenSci Getting data into R: what do workflows need?
    • How can we standardize data formats?

    • EML (ecological metadata language), NeXML (http://www.nexml.org/)

    • estimates of uncertainty / data quality

    • uniform API for multiple sources of data - we're working on this; so far we have:

      • taxonomy (from e.g., NCBI, ITIS, etc.)
      • spatial data (from e.g. GBIF, BISON, iNaturalist, etc.)
    • EML package: can automatically push data and get a DOI

    • Would it help to write out netCDF files from our NOAA R package wrapper for use in other software? And for spatial occurrence data from spocc / rgbif / etc.?

    • Data transformations we could provide? e.g., interpolation of climate data from NOAA
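    As one example of such a transformation, the sketch below (NumPy, shown in Python for concreteness; the station values are invented, and this is not the API of any existing package) linearly interpolates daily temperature records onto an hourly grid.

```python
import numpy as np

# Sketch: interpolate daily temperature records onto an hourly grid, the sort
# of transformation a data-retrieval package could offer downstream users.
days = np.array([0.0, 1.0, 2.0, 3.0])       # time in days; invented values
tmean = np.array([12.0, 15.0, 14.0, 16.0])  # daily mean temperature, deg C

hours = np.arange(0.0, 3.0 + 1e-9, 1.0 / 24.0)  # hourly time axis, in days
tmean_hourly = np.interp(hours, days, tmean)    # linear interpolation

print(len(tmean_hourly))  # 73 points: 3 days at hourly resolution, endpoints inclusive
```

    Whether linear interpolation is appropriate depends on the variable (temperature vs. precipitation, say), which is an argument for providing several transformation options rather than one default.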

    Model Coupling

    • what we currently do: CLM + DayCent; BioCro + DayCent
    • A better approach

    High Performance Computing, Big Data

    • Parallelization of code; access to different scheduler engines; an evolving landscape
    • Workflow branches based on the needs of a node (e.g., I/O, flops, RAM)
      • helps with scalability
    • Uncertainty, multi-scale coupling
      • how to couple independent codes; model-to-model coupling; transformation
    • Feel free to add to this list. Should there be any additional columns in the feature matrix?
      • Lead / contact info
      • Application domain
      • OS support
      • Scope of workflow (this should probably be defined): which categories are relevant? Data management, data processing, simulation models, visualization
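    Branching a workflow on node needs can be as simple as a dispatch on each task's declared resource profile; the sketch below (Python, with invented node names and thresholds) illustrates the idea.

```python
# Sketch: route workflow tasks to node types based on their resource profile
# (I/O-bound, memory-bound, or compute-bound). Names and thresholds invented.

def pick_node(task):
    """Choose a node class from a task's declared resource needs."""
    if task.get("io_gb", 0) > 100:
        return "io-node"        # heavy input/output: wants fast local disk
    if task.get("ram_gb", 0) > 64:
        return "bigmem-node"    # large memory footprint
    return "compute-node"       # default: CPU-heavy work

print(pick_node({"io_gb": 500}))   # io-node
print(pick_node({"ram_gb": 128}))  # bigmem-node
print(pick_node({"flops": 1e12}))  # compute-node
```

    Declaring needs on the task, rather than hard-coding node names, is what lets the same workflow scale across schedulers and clusters.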

    Software to support Scientific Workflows

    Conference Attendees

    Software | Participant | Description
    -------- | ----------- | -----------
    Analytic Web | Aaron Ellison |
    ARIES | Ferdinando Villa | A modeling platform.
    EcoPAD | Yiqi Luo | A platform for data assimilation and forecasting in ecology.
    EcoPath and EcoSim | Jeroen Steenbeek | An ecological / ecosystem modeling software suite.
    Kepler | Matt Jones, Ilkay Altintas | A free and open-source scientific workflow application, designed to help scientists, analysts, and computer programmers create, execute, and share models and analyses across a broad range of scientific and engineering disciplines.
    Microsoft Computational Science Tools | Matthew Smith | Includes: 1) Fetchclimate for environmental data access from the cloud via code and browser, 2) Filzbach for model parameter inference, 3) Scientific Dataset tools for facilitating data access, visualization, transfer, and creation, 4) Swavesey for creating and hosting cloud-based databases.
    PEcAn | David LeBauer, Mike Dietze | Two bioinformatics tools: 1) a scientific workflow and 2) a Bayesian data assimilation system.
    rOpenSci | |