Software practices .... for ecological forecasting

Interoperability, Modularity, Reuse

What is or can be done for improving reproducibility and interoperability in specific parts of the workflow

  • Why choose one workflow system over another? See the comparison paper (Yu 2005)
    • What are common and uncommon requirements for workflow engines?
    • Data input / output
    • Model component interfaces
    • What do we want in a workflow engine?

Software Interoperability / Modularity

  • Favor interoperability over re-writing / translating (except where this is useful and not redundant)
  • Model coupling --> currently a Herculean task

Data Interoperability / Modularity

(Borer 2009) provide guidelines <!--An example of a similar "guidelines" paper; outreach rather than findings: -->

(Wolkovich 2012) An example of using global change to illustrate changes needed in data management:

  • Data integration and model coupling
    • almost all existing systems don't deal in semantics
    • Try to add semantics to modelling frameworks to make it easier to build systems that understand more of the data / information compatibilities specific to the domain.

To this end, we can take the opportunity to state some widely used conventions

Ontologies / Semantics

  • Data integration and model coupling
    • almost all existing systems don't deal in semantics
    • Try to add semantics to modelling frameworks to make it easier to build systems that understand more of the data / information compatibilities specific to the domain.

Can we describe model components semantically to provide plug-and-play within workflow systems? (Berkeley et al. 2005; Madin et al. 2007; Madin et al. 2008)
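
One way to picture semantic descriptions of model components is a small compatibility check: each component declares its inputs and outputs as (concept, unit) pairs, and a workflow system refuses to connect components whose declarations do not match. The sketch below is only an illustration of that idea; the class names, the concept labels, and the exact-match rule are all assumptions, not the approach of any cited paper or existing workflow engine.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Variable:
    """A model input or output tagged with a concept and a unit."""
    concept: str  # hypothetical ontology term, e.g. "air_temperature"
    unit: str     # unit string, e.g. "K"


@dataclass(frozen=True)
class Component:
    """A model component described only by what it consumes and produces."""
    name: str
    inputs: Tuple[Variable, ...]
    outputs: Tuple[Variable, ...]


def unmet_inputs(upstream: Component, downstream: Component) -> List[Variable]:
    """Return the downstream inputs that no upstream output satisfies.

    Matching is exact on both concept and unit (an assumption); a real
    system would likely also reason over synonyms and unit conversions.
    """
    provided = set(upstream.outputs)
    return [v for v in downstream.inputs if v not in provided]


# Hypothetical components for illustration only.
met_driver = Component(
    name="met_driver",
    inputs=(),
    outputs=(
        Variable("air_temperature", "K"),
        Variable("precipitation_flux", "kg m-2 s-1"),
    ),
)
crop_model = Component(
    name="crop_model",
    inputs=(
        Variable("air_temperature", "K"),
        Variable("soil_moisture", "m3 m-3"),
    ),
    outputs=(Variable("leaf_area_index", "1"),),
)

missing = unmet_inputs(met_driver, crop_model)
print([v.concept for v in missing])  # the inputs still needing a provider
```

Here the check reports that `crop_model` still needs a source of soil moisture, which is the kind of information a workflow engine could surface before a coupled run is attempted.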


Workflow Software

From simple (R) to complex (ARIES; Villa 2014)

  • R ... start here because it has broad familiarity
    • Automated provenance tracking within R (e.g. Analytic Web)
    • R as a workflow in itself: essentially people using knitr with Markdown or LaTeX, which imposes very little structure beyond separating text from code blocks
    • Kepler modules as frontend to R code
  • Summary of workflow software from meeting ... what would a 'feature' matrix look like?

Data Conventions

  • MsTMIP -> widely used format by terrestrial ecosystem modelers
  • DataONE: EML describes data as it is, rather than requiring a standard example
  • NEON: has opportunity to provide data "templates" for common types of data collected by ecologists
    • these are still under development; it is not clear that they should set 'the' standard
    • a good place for community feedback on this
    • tradeoff between generality and specificity
  • Aaron Ellison: the first three columns should always be x,y,t (z if relevant); on one hand, of course this isn't normalized; on the other hand, this is a straightforward rule.
    • Coding conventions
    • Code use/ reuse
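
The x, y, t rule above is simple enough to enforce mechanically. The following is a minimal sketch, assuming the columns are literally named `x`, `y`, `t` and (optionally) `z`, and assuming `z` goes fourth so that the first three columns are always x, y, t. The function names and the sample data are hypothetical.

```python
import csv
import io

REQUIRED = ("x", "y", "t")  # the rule: these always come first
OPTIONAL = ("z",)           # assumption: z, if present, goes fourth


def reorder_columns(fieldnames):
    """Return fieldnames with x, y, t first, then z if present, then the rest."""
    missing = [c for c in REQUIRED if c not in fieldnames]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    lead = list(REQUIRED) + [c for c in OPTIONAL if c in fieldnames]
    rest = [c for c in fieldnames if c not in lead]
    return lead + rest


def reorder_csv(text):
    """Rewrite CSV text so its columns follow the x, y, t convention."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reorder_columns(reader.fieldnames)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Made-up example table, not real data.
raw = "site,t,npp,y,x\nA,2010,1.2,40.1,-88.2\n"
print(reorder_csv(raw))
```

A check like this could run when data enter a workflow, so that downstream steps can rely on column position without inspecting every header.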


  • rnoaa -> MsTMIP format
  • taxize -> USDA Plants (Genus, Species, ScientificName, CommonName) and BETYdb (genus, species, scientificname, commonname)

    • rOpenSci Getting data into R: what do workflows need?
  • How can we standardize data formats?

  • EML (Ecological Metadata Language), NeXML

  • estimates of uncertainty / data quality

  • uniform API for multiple sources of data - we're working on this, so far we have

    • taxonomy (from e.g., NCBI, ITIS, etc.)
    • spatial data (from e.g. GBIF, BISON, iNaturalist, etc.)
  • EML package: can automatically push data and get do

  • Would it help to write out NetCDF files from our NOAA R package wrapper for use in other software? And for spatial occurrence data from spocc, rgbif, etc.?

  • Data transformations we could provide? e.g., interpolation of climate data from NOAA
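
The "uniform API for multiple sources of data" item in the list above can be sketched as a set of small adapters that map each source's records onto one shared record type. Everything in this sketch is an assumption made for illustration: the two sources are offline stand-ins rather than real services, and the field names, function names, and coordinates are invented, not taken from any rOpenSci package.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Occurrence:
    """One shared record type that every source is converted into."""
    name: str
    longitude: float
    latitude: float
    source: str


# Stand-in fetchers returning canned records; a real one would query a service.
def fetch_source_a(name: str) -> List[dict]:
    return [{"species": name, "lon": -88.2, "lat": 40.1}]


def fetch_source_b(name: str) -> List[dict]:
    return [{"sci_name": name, "longitude_deg": -89.0, "latitude_deg": 41.5}]


# One converter per source hides that source's field names.
def from_source_a(rec: dict) -> Occurrence:
    return Occurrence(rec["species"], rec["lon"], rec["lat"], "source_a")


def from_source_b(rec: dict) -> Occurrence:
    return Occurrence(
        rec["sci_name"], rec["longitude_deg"], rec["latitude_deg"], "source_b"
    )


Adapter = Tuple[Callable[[str], List[dict]], Callable[[dict], Occurrence]]
SOURCES: Dict[str, Adapter] = {
    "source_a": (fetch_source_a, from_source_a),
    "source_b": (fetch_source_b, from_source_b),
}


def occurrences(name: str, sources: Optional[List[str]] = None) -> List[Occurrence]:
    """Query the chosen sources (all by default) and return uniform records."""
    results: List[Occurrence] = []
    for key in sources or list(SOURCES):
        fetch, convert = SOURCES[key]
        results.extend(convert(rec) for rec in fetch(name))
    return results


records = occurrences("Panicum virgatum")
for rec in records:
    print(rec.source, rec.longitude, rec.latitude)
```

Adding a source then means writing one fetcher and one converter; code that consumes `Occurrence` records does not change.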

Model Coupling

  • what we currently do: CLM + DayCent; BioCro + DayCent
  • A better approach

High Performance Computing, Big Data

  • Parallelization of code; access to different scheduler engines; evolving landscape
  • Workflow branches based on needs of node (e.g. I/O, FLOPS, RAM)
    • helps with scalability.
  • Uncertainty, multi-scale coupling
    • how to couple independent codes; model-to-model coupling; transformation
  • Feel free to add to this list. Should there be any additional columns in the feature matrix?
    • Lead / contact info
    • Application Domain
    • OS support
    • Scope of workflow (this should probably be defined): Which categories are relevant? Data Management, Data Processing, Simulation models, Visualization
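
The earlier item on branching a workflow according to the needs of each node (I/O, FLOPS, RAM) can be sketched as a routing function that assigns each task to a queue. This is a toy illustration only: the queue names, the thresholds, and the task estimates are invented, and real schedulers expose far richer resource requests.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Rough resource estimates for one workflow step (hypothetical units)."""
    name: str
    io_gb: float   # data read plus written
    gflop: float   # floating-point work
    ram_gb: float  # peak memory


def choose_queue(task, ram_limit_gb=64.0, io_limit_gb=100.0, gflop_limit=1000.0):
    """Pick a queue for a task; memory is checked first, then I/O, then compute."""
    if task.ram_gb > ram_limit_gb:
        return "highmem"
    if task.io_gb > io_limit_gb:
        return "io"
    if task.gflop > gflop_limit:
        return "compute"
    return "default"


# Made-up tasks for illustration.
tasks = [
    Task("download_met", io_gb=250.0, gflop=1.0, ram_gb=2.0),
    Task("run_ensemble", io_gb=10.0, gflop=50000.0, ram_gb=16.0),
    Task("assimilate", io_gb=20.0, gflop=5000.0, ram_gb=128.0),
    Task("plot_results", io_gb=0.5, gflop=0.1, ram_gb=1.0),
]
plan = {t.name: choose_queue(t) for t in tasks}
print(plan)
```

Keeping the routing rule separate from the tasks themselves is what lets the same workflow description scale from a laptop to a cluster with different limits.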

Software to support Scientific Workflows

Conference Attendees

| Software | Participant | References | Description |
|---|---|---|---|
| Analytic Web | Aaron Ellison | | |
| ARIES | Ferdinando Villa | | A modeling platform. |
| EcoPAD | Yiqi Luo | | An ecological platform for data assimilation and forecasting in ecology. |
| EcoPath and EcoSim | Jeroen Steenbeek | | Ecological/ecosystem modeling software suite. |
| Kepler | Matt Jones, Ilkay Altintas | | Dedicated to furthering and supporting the capabilities, use, and awareness of the free and open source scientific workflow application, Kepler. Kepler is designed to help scientists, analysts, and computer programmers create, execute, and share models and analyses across a broad range of scientific and engineering disciplines. |
| Microsoft Computational Science Tools | Matthew Smith | | Includes: 1) FetchClimate for environmental data access from the cloud via code and browser; 2) Filzbach for model parameter inference; 3) Scientific Dataset tools for facilitating data access, visualization, transfer, and creation; 4) Swavesey for creating and hosting cloud-based databases. |
| PEcAn | David LeBauer, Mike Dietze | | Two bioinformatics tools: 1) a scientific workflow and 2) a Bayesian data assimilation system. |
| Swift, FACE-IT / Galaxy-ES | David Kelly | | |

Other Workflow Software Projects

| Software | References | Description |
|---|---|---|
| BioMA | | Designed and developed for analyzing, parameterizing and running modelling solutions. |
| Environmental Virtual Observatory | | A proof of concept project that has been created to demonstrate that linking data, models and expert knowledge will provide cost effective answers to vital wide-ranging environmental issues, initially in the soil-water system. |
| ESMF | | High-performance, flexible software infrastructure to increase ease of use, performance portability, interoperability, and reuse in climate, numerical weather prediction, data assimilation, and other Earth science applications. |
| GENIE | | A grid enabled framework that facilitates the integration, execution and management of component models for the study of the Earth system over millennial timescales. |
| OMS | | Allows model construction and model application based on components. |
| Science Pipes | | Allows anyone to access, analyze, and visualize the huge volume of primary biodiversity data currently available online. |
| Triana | | Combines an intuitive visual interface with powerful data analysis tools. |
| VisTrails | | An open-source scientific workflow and provenance management system that provides support for simulations, data exploration and visualization. |
| VOEIS | | A framework for data acquisition, analysis, model integration, and display of data products from completed workflows including geospatially explicit models, graphs from statistical analyses, and GIS displays of classified ecological attributes on the landscape. |