Apache Taverna Language: Semantic and flexible workflow definitions


Authors: Stian Soiland-Reyes1,2, David Withers, Alan R Williams1,2, Donal Fellows1,2, Matthew Gamble2, Carole Goble2

1 Apache Software Foundation; 2 University of Manchester

This article describes the workflow definitions of Apache Taverna (Wolstencroft 2013), focusing in particular on its workflow language SCUFL2 and the abstract semantic workflow model wfdesc.

The SCUFL2 API (Withers 2014) allows independent applications to construct and inspect Taverna 3 workflows, and also enables translation to and from different abstract and concrete third-party workflow formats (SHIWA IWIR, MG-RAST AWE) and the Common Workflow Language.

This includes the general semantic model wfdesc, which we created within the Wf4Ever project (Hettne 2014) for the purpose of workflow preservation and annotation. wfdesc is easily combined with W3C PROV-based workflow provenance, and is also used by the digital preservation project SCAPE to find and compose semantically described Workflow Components from the myExperiment workflow repository.

Target: Int. J. on Software Tools for Technology Transfer - Special Section on Scientific Workflows


  • Generic What is Taverna blurb
  • something about Apache and Taverna 3?


Overview of Taverna workflow structure

Earlier formats: scufl and t2flow

  • scufl: Simple XML format - describing only the dataflow
    • Much is implicit - e.g. a datalink between two ports means those ports exist
    • Almost no metadata
    • No structured information about services - "Insert XML here" logic
    • Easily generated by third parties
    • Somewhat executable by other third parties
    • .. but they usually get the workflow semantics wrong
    • Easy to edit by hand
  • t2flow - not so simple XML with everything
    • Built around T2 engine and its implementation
    • e.g. support for multiple activities, dispatch stack, richer iteration strategies
    • XMLBeans serialization of engine state
    • Execution engine not separated from design workbench
    • Stronger annotation support
    • Parsing takes a long time as it recreates the engine state - even if you just need to edit
    • Hard to consume - very noisy
    • Exposes the complete structure of the engine
    • Very hard to generate - need to have template-based copy-and-paste
    • Impossible to edit by hand
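
The implicit style of scufl noted above can be illustrated with a small sketch: ports are never declared, so a reader has to infer them from the datalinks. The Python below is illustrative only. The `Datalink` tuple, the `infer_ports` helper, and the processor and port names are hypothetical and do not reflect the actual scufl XML syntax.

```python
from collections import defaultdict
from typing import NamedTuple


class Datalink(NamedTuple):
    """Hypothetical, simplified datalink: (processor, port) -> (processor, port)."""
    source_processor: str
    source_port: str
    sink_processor: str
    sink_port: str


def infer_ports(links):
    """Derive each processor's ports purely from the datalinks that mention them."""
    inputs = defaultdict(set)
    outputs = defaultdict(set)
    for link in links:
        # A link leaving a port implies that this output port exists ...
        outputs[link.source_processor].add(link.source_port)
        # ... and a link arriving at a port implies that this input port exists.
        inputs[link.sink_processor].add(link.sink_port)
    return dict(inputs), dict(outputs)


# Made-up example workflow with three processors and two datalinks.
links = [
    Datalink("fetch_sequence", "sequence", "run_blast", "query"),
    Datalink("run_blast", "report", "parse_report", "text"),
]
inputs, outputs = infer_ports(links)
print(sorted(inputs["run_blast"]))   # input ports implied for run_blast
print(sorted(outputs["run_blast"]))  # output ports implied for run_blast
```

In this sketch, a port that no datalink mentions cannot be discovered at all, which hints at how much a consumer of such a format has to guess.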

Motivations for SCUFL2

  • Sharable
    • .. and rerunnable workflows
  • Modular
    • Workbench vs Command line vs Server vs Grid
  • Independence from engine implementation
  • Programmatic access outside Taverna
    • Reuse existing formats like ZIP and XML
  • Semantic annotations
  • Semantic inspection
  • Flexibility - should not need to say everything
  • Translations - load/save other workflow formats
  • Embedding - add resources without having to serialize them within a massive XML

Not a requirement:

  • Editing by hand (Still need to know a lot about the services)
  • Too much implicit information - implicit only where appropriate (e.g. useful defaults)

Review of workflow languages

  • Galaxy
  • Knime
  • WINGS and OPM-W
  • Airavata
  • BPEL!! ?? Uuuh..
  • ..?

Related languages and technologies


SCUFL2 Workflow Bundle

  • A ZIP file that contains several files that together define the workflow to execute
    • One file per (nested) workflow
    • Workflow Bundle document that ties them together
    • "Main workflow" and "Preferred profile"
  • Profiles for choosing implementation activity and configurations
    • Separate file per profile
    • Separate file per configuration
    • Can be shared across profiles
  • JSON-LD based configurations - flexible for each activity
  • XML schemas for workflow structure files
    • structured XML that can be validated
    • ..also parseable as RDF/XML
  • Everything has a URI
    • Unique UUID-based identifier per workflow version
    • Declared relative within ZIP - but can be made absolute
    • Annotation target for any workflow object
    • URI service - minimally describe any workflow component
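
The bundle structure and identifier scheme listed above can be sketched with nothing more than a ZIP library. The following Python is a minimal illustration under stated assumptions: the file names inside the archive, the JSON configuration content, the `example.org` base URI, and the helper functions are all hypothetical and are not taken from the SCUFL2 specification.

```python
import io
import json
import uuid
import zipfile
from urllib.parse import urljoin


def build_toy_bundle() -> bytes:
    """Create an in-memory ZIP that mimics the layout described in the list."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as bundle:
        # Bundle document: would name the main workflow and preferred profile.
        bundle.writestr("workflowBundle.rdf", "<!-- placeholder -->")
        # One file per (nested) workflow.
        bundle.writestr("workflow/HelloWorld.rdf", "<!-- placeholder -->")
        bundle.writestr("workflow/Nested.rdf", "<!-- placeholder -->")
        # One file per profile, one JSON file per configuration.
        bundle.writestr("profile/default.rdf", "<!-- placeholder -->")
        bundle.writestr(
            "profile/default/configuration/hello.json",
            json.dumps({"script": "greeting = 'Hello';"}),
        )
    return buffer.getvalue()


def list_bundle(data: bytes):
    """Return the sorted entry names of a bundle held in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as bundle:
        return sorted(bundle.namelist())


def absolute_uri(base_uri: str, relative_uri: str) -> str:
    """Resolve an identifier declared relative to the bundle against its base."""
    return urljoin(base_uri, relative_uri)


names = list_bundle(build_toy_bundle())
for name in names:
    print(name)

# Assumed UUID-based base URI; a new UUID would be minted per workflow version.
base = f"http://example.org/workflowBundle/{uuid.uuid4()}/"
print(absolute_uri(base, "workflow/HelloWorld/processor/hello/"))
```

Because every identifier is relative to the bundle, the same archive can be moved or renamed without rewriting its contents. Only the base URI changes when absolute identifiers are needed, for instance as annotation targets.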


SCUFL2 API

  • Java API for programmatically loading, saving and creating SCUFL2 Workflow Bundles
  • Each workflow element is a Java bean
  • Support for loading and saving different formats: scufl1, t2flow, wfbundle, wfdesc, shiwa-iwir


wfdesc

  • Abstract workflow language that only deals with the pipeline
  • Easier to query
  • Hooks for annotation on a Research Object level
  • Hooks for describing provenance of workflow run
  • Same identifiers as in SCUFL2
  • Not complete enough for execution - no control logic or iteration
  • Inspiration for Common Workflow Language
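
The abstract, pipeline-only description in the list above lends itself to simple queries. As a rough sketch, assuming it is the wfdesc model named in the introduction, a workflow can be held as a set of subject-predicate-object triples and queried with plain set comprehensions. The property names below follow our reading of the wfdesc vocabulary and should be checked against the ontology. The resource identifiers and helper functions are hypothetical, and a real application would more likely use an RDF library and SPARQL.

```python
# Toy triples; prefixed names stand in for full URIs.
triples = {
    ("wf", "rdf:type", "wfdesc:Workflow"),
    ("wf", "wfdesc:hasSubProcess", "fetch"),
    ("wf", "wfdesc:hasSubProcess", "blast"),
    ("fetch", "wfdesc:hasOutput", "fetch/out/sequence"),
    ("blast", "wfdesc:hasInput", "blast/in/query"),
    ("wf", "wfdesc:hasDataLink", "link1"),
    ("link1", "wfdesc:hasSource", "fetch/out/sequence"),
    ("link1", "wfdesc:hasSink", "blast/in/query"),
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def subjects(graph, predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in graph if p == predicate and o == obj}


def downstream(graph, workflow, process):
    """Processes that directly consume an output of the given process."""
    result = set()
    for link in objects(graph, workflow, "wfdesc:hasDataLink"):
        for source in objects(graph, link, "wfdesc:hasSource"):
            # Only follow links whose source port belongs to the given process.
            if process in subjects(graph, "wfdesc:hasOutput", source):
                for sink in objects(graph, link, "wfdesc:hasSink"):
                    result |= subjects(graph, "wfdesc:hasInput", sink)
    return result


print(sorted(objects(triples, "wf", "wfdesc:hasSubProcess")))
print(sorted(downstream(triples, "wf", "fetch")))
```

Nothing in these triples says how a process is executed or iterated, which matches the point above that the abstract model is not complete enough for execution.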

Common Workflow Language

  • Work in progress
  • Definition of an executable workflow language shared across bioinformatics workflow systems
  • Specifies how to invoke command line tools
  • Dataflow-driven

RO Bundle

  • Evolution from UCF container to RO bundle
    • Details how to describe metadata, provenance and annotations
  • Formal specification:
    • To be proposed as an RFC
  • A wfbundle is also an RO bundle
    • how to store annotations
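
To make the idea of a bundle-level manifest more concrete, the sketch below models a manifest as JSON that lists the aggregated resources and the annotations about them. The field names (`aggregates`, `annotations`, `about`, `content`) reflect our understanding of the RO Bundle draft and are assumptions to be verified against the specification. The file paths and the `annotations_about` helper are hypothetical.

```python
import json

# Hypothetical manifest for a workflow bundle that is also an RO bundle.
manifest = {
    "id": "/",
    "aggregates": [
        {"uri": "/workflowBundle.rdf"},
        {"uri": "/workflow/HelloWorld.rdf"},
    ],
    "annotations": [
        {
            "about": "/workflow/HelloWorld.rdf",
            "content": "/annotations/helloworld-description.ttl",
        },
    ],
}


def annotations_about(manifest_data, resource):
    """Return the annotation bodies that describe the given resource."""
    return [
        annotation["content"]
        for annotation in manifest_data.get("annotations", [])
        if annotation.get("about") == resource
    ]


# Round-trip through JSON, as the manifest would be stored inside the ZIP.
serialized = json.dumps(manifest, indent=2)
restored = json.loads(serialized)
print(annotations_about(restored, "/workflow/HelloWorld.rdf"))
```

Keeping annotations in separate files that the manifest points to means they can be added or replaced without touching the workflow definition itself.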


Workflow results can be stored as a