Sciunit: A Reproducible Container for EarthCube Community
  • Raza Ahmad,
  • Madeline Deeds,
  • Tanu Malik,
  • Young-Don Choi,
  • Jonathan Goodall,
  • David Tarboton
Raza Ahmad
DePaul University

Corresponding Author:[email protected]

Madeline Deeds
DePaul University
Tanu Malik
DePaul University
Young-Don Choi
University of Virginia
Jonathan Goodall
University of Virginia
David Tarboton
Utah State University

Abstract

The conduct of reproducible science improves when computations are portable and verifiable. A container provides an isolated environment for running computations and is thus useful for porting applications to new machines. Current container engines, such as Linux Containers (LXC) and Docker, however, have a steep learning curve, are resource-intensive, and do not address the entire reproducibility spectrum, which consists of portability, repeatability, and replicability. As part of EarthCube, we have developed Sciunit (https://sciunit.run), which encapsulates application dependencies, i.e., system binaries, code, data, and environment, along with application provenance. The resulting research object can be easily shared and reused among collaborators. Sciunit is available as a command-line utility, can be used with HydroShare’s JupyterHub CUAHSI notebook environment, and is available to the entire community. In this poster, we will present three new features in Sciunit that have emerged from community-provided use cases and discussion. We will (1) showcase the new Sciunit API, which will allow data facilities to integrate Sciunit as a reproducible environment on portals; (2) show how a Sciunit container can transition to a Docker container and vice versa; and (3) demonstrate the ability to contrast two containers in terms of content and metadata. We will show these capabilities with the hydrology use case of pySUMMA, a Python API for the Structure for Unifying Multiple Modeling Alternatives (SUMMA) hydrologic model.