ROUGH DRAFT authorea.com/9719
    Apache Taverna Language: Semantic and flexible workflow definitions

    Abstract

    Authors: Stian Soiland-Reyes 1,2, David Withers, Alan R Williams 1,2, Donal Fellows 1,2, Matthew Gamble 2, Carole Goble 2

    1 Apache Software Foundation; 2 University of Manchester

    This article describes the workflow language of Apache Taverna (Wolstencroft 2013), focusing in particular on the SCUFL2 format and the abstract semantic workflow model wfdesc.

    The SCUFL2 API (Withers 2014) allows independent applications to construct and inspect Taverna 3 workflows. It also enables translation to and from abstract and concrete third-party workflow formats (SHIWA IWIR, MG-RAST AWE) and the Common Workflow Language.

    These formats include the general semantic model wfdesc, which we created within the Wf4Ever project (Hettne 2014) for the purpose of workflow preservation and annotation. wfdesc combines readily with W3C PROV-based workflow provenance, and the digital preservation project SCAPE also uses it to find and compose semantically described Workflow Components from the myExperiment workflow repository.

    Target: Int. J. on Software Tools for Technology Transfer - Special Section on Scientific Workflows

    Introduction

    • Generic What is Taverna blurb
    • something about Apache and Taverna 3?

    Background

    Overview of Taverna workflow structure

    Earlier formats: scufl and t2flow

    • scufl: Simple XML format - describing only the dataflow
      • Much is implicit - e.g. a datalink between two ports means that those ports exist
      • Almost no metadata
      • No structured information about services - "Insert XML here" logic
      • Easily generated by third parties
      • Somewhat executable by other third parties
      • .. but they usually get the workflow semantics wrong
      • Easy to edit by hand
    • t2flow: Not-so-simple XML format - describing everything
      • Built around T2 engine and its implementation
      • e.g. support for multiple activities, dispatch stack, richer iteration strategies
      • XMLBeans serialization of engine state
      • Execution engine not separated from design workbench
      • Stronger annotation support
      • Parsing takes a long time as it recreates the engine state - even if you just need to edit
      • Hard to consume - very noisy
      • Exposes the complete structure of the engine
      • Very hard to generate - requires template-based copy-and-paste
      • Impossible to edit by hand
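
    To illustrate what "a datalink between two ports means that those ports exist" implies for a consumer of a scufl-style dataflow, the following minimal sketch derives the port set of each processor from the datalinks alone. The "processor:port" notation, the processor names and the infer_ports helper are all assumptions made for this illustration; this is not the actual scufl syntax or any Taverna API.

```python
from collections import defaultdict

# Hypothetical dataflow: each datalink is a (source, sink) pair,
# written as "processor:port". No ports are declared anywhere else.
datalinks = [
    ("fetch:sequence", "align:input"),
    ("align:alignment", "render:data"),
]


def infer_ports(links):
    """Return {processor: {"in": set, "out": set}} implied by the datalinks."""
    ports = defaultdict(lambda: {"in": set(), "out": set()})
    for source, sink in links:
        src_proc, src_port = source.split(":", 1)
        dst_proc, dst_port = sink.split(":", 1)
        # The source end of a link implies an output port ...
        ports[src_proc]["out"].add(src_port)
        # ... and the sink end implies an input port.
        ports[dst_proc]["in"].add(dst_port)
    return dict(ports)


ports = infer_ports(datalinks)
```

    A port that is never linked cannot be recovered this way, which is one reason a reader of such a format has to guess at parts of the workflow structure.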

    Motivations for SCUFL2

    • Sharable
      • .. and rerunnable workflows
    • Modular
      • Workbench vs Command line vs Server vs Grid
    • Independence from engine implementation
    • Programmatic access outside Taverna
      • Reuse existing formats like ZIP and XML
    • Semantic annotations
    • Semantic inspection
    • Flexibility - should not need to say everything
    • Translations - load/save other workflow formats
    • Embedding - add resources without having to serialize them within a massive XML file

    Not a requirement:

    • Editing by hand (Still need to know a lot about the services)
    • Too much implicit behaviour - implicit only where appropriate (e.g. useful defaults)
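
    The motivations "Reuse existing formats like ZIP and XML" and "Embedding" can be sketched as a ZIP archive that holds a small XML workflow description next to the embedded resources, which are stored as separate entries and only referenced from the XML. The entry name workflow.xml, the element and attribute names, and both helper functions are hypothetical and chosen only for this illustration; they do not reproduce the actual SCUFL2 bundle layout.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_bundle(workflow_name, resources):
    """Pack a small XML description plus separate resource entries into a ZIP."""
    root = ET.Element("workflow", {"name": workflow_name})
    for path in resources:
        # The XML only references each resource; the content lives elsewhere.
        ET.SubElement(root, "resource", {"href": path})
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        bundle.writestr("workflow.xml", ET.tostring(root, encoding="unicode"))
        for path, content in resources.items():
            bundle.writestr(path, content)
    return buffer.getvalue()


def list_resources(bundle_bytes):
    """Read only the XML entry and return the referenced resource paths."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as bundle:
        root = ET.fromstring(bundle.read("workflow.xml"))
        return [element.get("href") for element in root.findall("resource")]


data = build_bundle(
    "example",
    {"resources/input.txt": b"ACGT", "resources/notes.txt": b"draft"},
)
```

    Because the resources are ordinary archive entries, a tool that only needs the workflow structure can read the XML entry and ignore the rest, and any standard ZIP tool can inspect the bundle.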

    Review of workflow languages

    • Galaxy
    • Knime
    • WINGS and OPM-W
    • Airavata
    • BPEL!! ?? Uuuh..
    • ..?

    Related languages and technologies