Jonathan L. Goodall

Integration of Reproducible Methods into Community Cyberinfrastructure

David Tarboton

and 3 more

December 26, 2020

For science to reliably support new discoveries, its results must be reproducible. This has proven challenging in many fields, including those that rely on computational methods to support new discoveries. Reproducibility in these studies is particularly difficult because it requires open, documented sharing of data and models, along with careful control of underlying hardware and software dependencies, so that computational procedures executed by the original researcher are portable and produce consistent results when run on different hardware or software. Despite recent advances in making scientific work more findable, accessible, interoperable and reusable (FAIR), fundamental questions in the conduct of reproducible computational studies remain: Can published results be repeated in different computing environments? If yes, how similar are they to previous results? Can we further verify and build on the results by using additional data or changing computational methods? Can these changes be automatically and systematically tracked? This presentation will describe our EarthCube project to advance computational reproducibility and make it easier and more efficient for geoscientists to preserve, share, repeat and replicate scientific computations. Our approach is based on the Sciunit software developed by prior EarthCube projects, which encapsulates application dependencies (system binaries, code, data, and environment) together with application provenance so that the resulting computational research object can be shared and re-executed on different platforms. We have deployed Sciunit within the HydroShare JupyterHub platform operated by the Consortium of Universities for the Advancement of Hydrologic Science Inc. (CUAHSI) for the hydrology research community and will present use cases that demonstrate how to preserve, share, repeat and replicate scientific results from the field of hydrologic modeling.
While illustrated in the context of hydrology, the methods and tools developed as part of this project have the potential to be extended to other geoscience domains. They also have the potential to inform the reproducibility evaluation process as currently undertaken by journals and publishers.

Sciunit: A Reproducible Container for EarthCube Community

Raza Ahmad

and 5 more

June 26, 2020

The conduct of reproducible science improves when computations are portable and verifiable. A container provides an isolated environment for running computations and thus is useful for porting applications to new machines. Current container engines, such as Linux Containers (LXC) and Docker, however, have a steep learning curve, are resource-intensive, and do not address the entire reproducibility spectrum consisting of portability, repeatability, and replicability. As part of EarthCube, we have developed Sciunit (https://sciunit.run), which encapsulates application dependencies, i.e., system binaries, code, data, and environment, along with application provenance. The resulting research object can be easily shared and reused amongst collaborators. Sciunit is available as a command-line utility, can be used with HydroShare’s JupyterHub CUAHSI notebook environment, and is available to the entire community. In this poster, we will present three new features in Sciunit that have emerged from community-provided use cases and discussion. We will: (1) showcase the new Sciunit API, which will allow data facilities to integrate Sciunit as a reproducible environment on portals; (2) show how a Sciunit container can transition to a Docker container and vice versa; and (3) demonstrate the ability to contrast two containers in terms of content and metadata. We will show these capabilities with the hydrology use case of pySUMMA, a Python API for the Structure for Unifying Multiple Modeling Alternatives (SUMMA) hydrologic model.

Assessing the Trustworthiness of Crowdsourced Rainfall Networks: A Reputation System...

Alexander Byron Chen

and 2 more

February 23, 2021

High-resolution, accurate rainfall information is essential to modeling and predicting hydrological processes. Crowdsourced personal weather stations (PWSs) have become increasingly popular in recent years and can provide dense spatial and temporal resolution in rainfall estimates. However, their usefulness is limited by a lack of trust in crowdsourced data compared to traditional data sources. Using crowdsourced PWS data without evaluating its trustworthiness can result in inaccurate rainfall estimates, as PWSs may be poorly maintained or incorrectly sited. In this study, we advance the Reputation System for Crowdsourced Rainfall Networks (RSCRN) to bridge this trust gap by assigning dynamic trust scores to the PWSs. Using rainfall data collected from 18 PWSs in two dense clusters in Houston, Texas, USA as a case study, the results show that RSCRN-derived trust scores can increase the accuracy of 15-min PWS rainfall estimates when compared to rainfall observations recorded at the city’s high-fidelity rainfall stations. Overall, RSCRN rainfall estimates improved for 77% (48 out of 62) of the analyzed storm events, with a median RMSE improvement of 27.3%. Compared to an existing PWS quality control method, results showed that while 13 (21%) storm events had the same performance, RSCRN improved rainfall estimates for 78% of the remaining storm events (38 out of 49), with a median RMSE improvement of 13.4%. Using RSCRN-derived trust scores can make the rapidly growing network of PWSs a more useful resource for urban flood management, greatly improving knowledge of rainfall patterns in areas with dense PWSs.
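The per-event summary statistics quoted above (share of storm events improved, median RMSE improvement) can be sketched as follows. This is a minimal illustration, not the study's code: the function names, the sample RMSE values, the relative-improvement formula, and the choice to take the median over all events are assumptions, since the abstract does not specify them.

```python
from statistics import median


def percent_improvement(baseline_rmse, new_rmse):
    """Relative RMSE reduction in percent; positive means the new estimate is better.

    Assumed definition: 100 * (baseline - new) / baseline.
    """
    if baseline_rmse <= 0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


def summarize_events(rmse_pairs):
    """Summarize (baseline_rmse, new_rmse) pairs, one pair per storm event."""
    improvements = [percent_improvement(b, n) for b, n in rmse_pairs]
    n_improved = sum(1 for v in improvements if v > 0)
    return {
        "n_events": len(improvements),
        "n_improved": n_improved,
        "share_improved_pct": 100.0 * n_improved / len(improvements),
        # Assumption: median taken over all events, not only the improved ones.
        "median_improvement_pct": median(improvements),
    }


# Hypothetical RMSE values (mm) for three storm events, for illustration only.
example_pairs = [(2.0, 1.0), (4.0, 3.0), (1.0, 1.5)]
summary = summarize_events(example_pairs)
print(summary)
```

With real data, each pair would hold the RMSE of the unweighted PWS estimate and of the trust-weighted estimate for one storm event, both measured against the reference rainfall stations.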

Toward Forecasting Groundwater Table in Flood Prone Coastal Cities Using Long Short-t...

Benjamin Bowes

and 4 more

January 09, 2019

Coastal cities face recurrent flooding from storm events and rising seas. A contributing factor to flooding in these low-relief areas is the groundwater table, which, already relatively shallow, can quickly rise towards the land surface during storm events. This leads to increased surface runoff entering stormwater drainage systems and a greater probability of flooding. As such, groundwater table forecasts could be an important component of real-time flood forecasting systems, but are generally unavailable. Because traditional physics-based models require extensive amounts of subsurface data that are difficult to obtain, especially in urban environments, this research evaluates two types of machine learning models, Recurrent Neural Networks (RNN) and Long Short-term Memory neural networks (LSTM), for creating groundwater table forecasts. The two types of networks were built with TensorFlow/Keras to forecast the groundwater table response to forecasted storm events, and hyperparameters were tuned using the Hyperas library. Using observed hourly groundwater levels, rainfall, and tide from the City of Norfolk, Virginia, the networks were trained with data from 2010-2016 and tested with data from 2016-2018. Archived forecast rainfall and tide from two large storms in the test period (Hurricane Hermine and Tropical Storm Julia) were then used to evaluate the effect of forecast inputs on model performance. Results indicate that LSTM is slightly more accurate than RNN when forecasting the groundwater table, likely because of its increased ability to preserve and learn from past information. Average root mean squared error and Nash-Sutcliffe efficiency values for an 18-hour forecast were 0.06 m and 0.89, respectively, for the LSTM, and 0.07 m and 0.85, respectively, for the RNN.
These forecasts could provide valuable information to aid in planning and response to storm events and will become an increasingly important part of effectively modeling and predicting coastal urban flooding as sea level rises.
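The two skill metrics reported above, root mean squared error (RMSE) and Nash-Sutcliffe efficiency (NSE), follow standard definitions and can be sketched in a few lines. The function names and the sample groundwater levels below are hypothetical and serve only to illustrate the calculation; they are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual_ss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance_ss = sum((o - mean_obs) ** 2 for o in observed)
    if variance_ss == 0:
        raise ValueError("NSE is undefined when observations are constant")
    return 1.0 - residual_ss / variance_ss


# Hypothetical hourly groundwater table elevations (m), for illustration only.
observed_levels = [1.0, 2.0, 3.0]
mean_forecast = [2.0, 2.0, 2.0]  # a forecast that always predicts the observed mean

print(rmse(observed_levels, mean_forecast))
print(nash_sutcliffe(observed_levels, mean_forecast))
```

RMSE is expressed in the units of the forecast variable (meters here), while NSE is dimensionless, which is why the abstract reports the two side by side.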

HydroShare tools and recommended practices for sharing and publishing data and models...

David Tarboton

and 11 more

December 09, 2018

HydroShare is a domain-specific data and model repository operated by the Consortium of Universities for the Advancement of Hydrologic Science Inc. (CUAHSI) to advance hydrologic science by enabling individual researchers to more easily share products resulting from their research. The community platform supports not just the scientific publication summarizing a study, but also the data, models and workflow scripts used to create the scientific publication and reproduce the results therein. HydroShare accepts data from anybody, and supports Findable, Accessible, Interoperable and Reusable (FAIR) principles. HydroShare comprises two sets of functionality: (1) a repository for users to share and publish data and models, collectively referred to as resources, in a variety of formats, and (2) tools (web apps) that can act on content in HydroShare and support web-based access to compute capability. Together these serve as a platform for collaboration and computation that integrates data storage, organization, discovery, and analysis through web apps and that allows researchers to employ services beyond the desktop to make data storage and manipulation more reliable and scalable, while improving their ability to collaborate and reproduce results. This presentation will describe the capabilities developed for HydroShare to support the full research data management life cycle. Data can be entered into HydroShare as soon as it is collected, and initially shared only with the team directly working on the data. As analysis proceeds, tools, scripts and models that act on the data to produce research results may be stored in HydroShare resources alongside the data. At the time of publication, these resources may be permanently published, receive digital object identifiers, and be cited in research papers. Resources may themselves include citations to the research papers, thereby linking the publications to the supporting data, scripts and models.
HydroShare design choices and capabilities for establishing relationships and versioning, which are based on simplicity and ease of use, will be discussed along with some of the challenges encountered.