For science to reliably support new discoveries, its results must be reproducible. This has proven challenging in many fields, including those that rely on computational methods to support new discoveries. Reproducibility in these studies is particularly difficult because it requires open, documented sharing of data and models and careful control of underlying hardware and software dependencies, so that computational procedures executed by the original researcher are portable and produce consistent results when run on different hardware or software. Despite recent advances in making scientific work more findable, accessible, interoperable and reusable (FAIR), fundamental questions in the conduct of reproducible computational studies remain: Can published results be repeated in different computing environments? If so, how similar are they to previous results? Can we further verify and build on the results by using additional data or changing computational methods? Can these changes be automatically and systematically tracked? This presentation will describe our EarthCube project to advance computational reproducibility and make it easier and more efficient for geoscientists to preserve, share, repeat and replicate scientific computations. Our approach is based on the Sciunit software, developed by prior EarthCube projects, which encapsulates application dependencies composed of system binaries, code, data, environment and application provenance so that the resulting computational research object can be shared and re-executed on different platforms. We have deployed Sciunit within the HydroShare JupyterHub platform operated by the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) for the hydrology research community and will present use cases from hydrologic modeling that demonstrate how to preserve, share, repeat and replicate scientific results.
While illustrated in the context of hydrology, the methods and tools developed as part of this project have the potential to be extended to other geoscience domains. They also have the potential to inform the reproducibility evaluation process as currently undertaken by journals and publishers.
Despite the proliferation of computer-based research on hydrology and water resources, such research is typically poorly reproducible. Published studies have low reproducibility due to incomplete availability of data and computer code, and a lack of documentation of workflow processes. This leads to a lack of transparency and efficiency because existing code can neither be quality controlled nor re-used. Given the commonalities between existing process-based hydrological models in terms of their required input data and preprocessing steps, open sharing of code can lead to large efficiency gains for the modeling community. Here we present a model configuration workflow that provides full reproducibility of the resulting model instantiations in a way that separates the model-agnostic preprocessing of specific datasets from the model-specific requirements that models impose on their input files. We use this workflow to create large-domain (global, continental) and local configurations of the Structure for Unifying Multiple Modeling Alternatives (SUMMA) hydrologic model connected to the mizuRoute routing model. These examples show how a relatively complex model setup over a large domain can be organized in a reproducible and structured way that has the potential to accelerate advances in hydrologic modeling for the community as a whole. We provide a tentative blueprint of how community modeling initiatives can be built on top of workflows such as this. We term our workflow the “Community Workflows to Advance Reproducibility in Hydrologic Modeling” (CWARHM; pronounced “swarm”).
Forest cover and streamflow are generally expected to vary inversely because reduced forest cover typically leads to less transpiration and interception. However, recent studies in the western US have found no change or even decreased streamflow following forest disturbance due to drought and insect epidemics. We investigated streamflow response to forest cover change using hydrologic, climatic, and forest data from the CAMELS dataset for 159 watersheds in the western US over the period 2000-2019. Forest change and disturbance were quantified in terms of net tree growth (total growth volume minus mortality volume) and mean annual mortality rates, respectively, from the US Forest Service’s Forest Inventory and Analysis database. Annual streamflow was analyzed using multiple methods: Mann-Kendall trend analysis, time trend analysis to quantify change not attributable to annual precipitation and temperature, and multiple regression to quantify contributions of climate, mortality, and aridity. Many watersheds exhibited decreased annual streamflow even as forest cover decreased. Time trend analysis identified decreased streamflow not attributable to precipitation and temperature changes in many disturbed watersheds, yet streamflow change was not consistently related to disturbance, suggesting drivers other than disturbance, precipitation, and temperature. Multiple regression analysis indicated that although change in streamflow was significantly related to tree mortality, the direction of this effect depended on aridity. Specifically, forest disturbances in wet, energy-limited watersheds (i.e., where annual potential evapotranspiration is less than annual precipitation) tended to increase streamflow, while post-disturbance streamflow more frequently decreased in dry, water-limited watersheds (where the ratio of potential evapotranspiration to precipitation exceeds 2.35).
We compared snowfall and snow water equivalent (SWE) accumulation and ablation simulated by the WRF-Hydro model in the U.S. National Water Model (NWM) configuration against observations at a set of representative point locations from Snow Telemetry (SNOTEL) sites across the western U.S. We focused on the model’s partitioning of precipitation between rain and snow and selected sites that span the range in the percentage of rain-on-snow precipitation events. Our results show that the NWM generally underestimates SWE and tends to melt snow earlier than observed, in part due to errors in the precipitation and air temperature inputs. We reduced some of the discrepancies between observed and modeled values by using SNOTEL snow-adjusted precipitation and removing air temperature biases based on observations. These input changes produced an average 59% improvement in peak SWE. Modeled peak SWE was further improved using humidity-dependent rain-snow separation. Both dew-point and wet-bulb parameterizations were evaluated, with the dew-point parameterization giving better overall improvement, reducing the bias in SWE by 18% compared to the NWM air temperature-based scheme. This modification also improved melt timing: the number of site-years in which the modeled and observed dates of half melt from peak SWE differed by six or more days was reduced by 6%. These improvements in SWE magnitude and timing varied across rain-on-snow percentage classes, with generally better results at sites where most precipitation events fall either as snow or as rain, and less improvement where there is a mix of snow and rain-on-snow events.
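To illustrate why a humidity-dependent scheme can partition precipitation differently from an air temperature-based one, the sketch below computes dew point from air temperature and relative humidity with the Magnus approximation and then applies a binary dew-point threshold. This is a hypothetical illustration, not the WRF-Hydro/NWM implementation evaluated above: the Magnus coefficients, the 0 °C default threshold, and the all-snow/all-rain split are assumptions, and the function names are invented for this example.

```python
import math

# Magnus-approximation coefficients (assumed values, commonly used for liquid water)
MAGNUS_B = 17.625
MAGNUS_C = 243.04  # degC


def dew_point_c(air_temp_c, rel_humidity_pct):
    """Dew point (degC) from air temperature (degC) and relative humidity (%)."""
    gamma = math.log(rel_humidity_pct / 100.0) + MAGNUS_B * air_temp_c / (MAGNUS_C + air_temp_c)
    return MAGNUS_C * gamma / (MAGNUS_B - gamma)


def snow_fraction(air_temp_c, rel_humidity_pct, threshold_c=0.0):
    """Hypothetical binary rule: all snow if dew point <= threshold, else all rain."""
    return 1.0 if dew_point_c(air_temp_c, rel_humidity_pct) <= threshold_c else 0.0


if __name__ == "__main__":
    # At +2 degC a 0 degC air temperature threshold would always give rain,
    # but in dry air the dew point falls below 0 degC and this rule gives snow.
    for rh in (50.0, 100.0):
        print(rh, round(dew_point_c(2.0, rh), 2), snow_fraction(2.0, rh))
```

Operational schemes often replace the binary split with a linear transition between two thresholds; the single threshold here keeps the comparison with an air temperature rule easy to follow.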
This study reports on the development and implementation of the HydroLearn online platform, which supports active learning in the field of hydrology and water resources engineering. The platform is designed to serve two main purposes: to enable instructors to collaboratively develop and share active-learning resources, and to enhance student learning in fundamental and emerging topics in the field (e.g., rainfall-runoff processes, design of flood protection measures, flood forecasting, the water-energy-food nexus). Built on open-source technology, the HydroLearn platform supports customization of pre-developed learning modules and allows instructors to share components of their learning resources with other interested users. HydroLearn is inspired by the need to address challenges in adoption, scalability, and sustainability identified by research on educational innovations. HydroLearn uses research-based active learning methods (e.g., Problem-based Learning; Collaborative and Cooperative Learning) to create authentic online learning modules. The modules engage students in real-world hydrologic problems and provide unique opportunities to expose undergraduate students to modern hydrologic analysis tools that are at the forefront of hydrologic research and engineering practice. The platform includes tools that scaffold instructors’ implementation of sound pedagogical practices, such as wizards and pre-populated templates that guide instructors in developing student-centered learning outcomes constructively aligned with the learning content. The platform also includes guidance for instructors on how to develop assessment rubrics that enhance student achievement by communicating the expected performance levels. The study will also share results from the implementation of a pilot learning module on flood protection. Thirty-six undergraduate students were surveyed before and after the implementation to determine their level of learning engagement.
The survey measured their skills engagement, emotional engagement, participation, and performance engagement. The presentation will also report on efforts to engage the community through a fellowship program that aims to develop a network of educators who aspire to adopt active learning approaches and enhance hydrology education.
This study compares the U.S. National Water Model (NWM) reanalysis snow outputs to observed snow water equivalent (SWE) and snow-covered area fraction (SCAF) at SNOTEL sites across the Western U.S. SWE was obtained from SNOTEL sites, while SCAF was obtained from MODIS observations at a nominal 500 m grid scale. Retrospective NWM results were at a 1000 m grid scale. We compared results for SNOTEL sites to gridded NWM and MODIS outputs for the grid cells encompassing each SNOTEL site. Differences between modeled and observed SWE were attributed to both model errors and input errors, notably in precipitation and temperature. The NWM generally under-predicted SWE, partly due to differences in precipitation inputs. Model input temperature also had a slight general bias toward being cooler than observed, which is counter to the direction expected to cause SWE to be under-modeled. SWE was also under-modeled at a subset of sites where precipitation inputs agreed well with observations. Furthermore, the NWM generally tends to melt snow early. Considerable variability between modeled and observed SCAF, and in the binary comparison of snow cover presence, hampered useful interpretation of the SCAF comparisons. This is in part due to shortcomings in both the model SCAF parameterization and the MODIS observations, particularly in vegetated regions. However, when SCAF was aggregated across all sites and years, modeled SCAF tended to exceed MODIS-observed SCAF. These differences vary regionally: SWE and SCAF results are generally best in the Central Basin and Range, and differences tend to become larger with distance from this region. These findings identify areas where predictions from the NWM involving snow may be better or worse, and suggest opportunities for research directed towards model improvements.
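Comparing a point observation with the grid cell that encompasses it comes down to converting the site's coordinates into a row and column index, then summarizing modeled-minus-observed differences. The sketch below shows one way to do this on a regular grid. It assumes the site coordinates have already been projected into the grid's coordinate system (in metres) and that the grid origin is its upper-left corner; both are illustrative assumptions rather than the actual NWM or MODIS grid definitions, and the helper names are hypothetical. Percent bias is used here as one common summary metric, not necessarily the one used in the study.

```python
import math


def cell_index(x, y, x_min, y_max, cell_size):
    """Row/column of the regular-grid cell containing projected point (x, y).

    Assumes (x_min, y_max) is the upper-left corner of the grid and that rows
    increase downward, as in most raster layouts.
    """
    col = math.floor((x - x_min) / cell_size)
    row = math.floor((y_max - y) / cell_size)
    return row, col


def percent_bias(modeled, observed):
    """Percent bias of modeled vs. observed totals; negative means under-prediction."""
    return 100.0 * (sum(modeled) - sum(observed)) / sum(observed)


if __name__ == "__main__":
    # Hypothetical site location, in metres, within a grid whose upper-left corner is (0, 10000)
    site_x, site_y = 1500.0, 8500.0
    print("1000 m cell:", cell_index(site_x, site_y, 0.0, 10000.0, 1000.0))
    print("500 m cell:", cell_index(site_x, site_y, 0.0, 10000.0, 500.0))
    # Hypothetical peak SWE values (mm) for two years
    print("bias %:", percent_bias([80.0, 90.0], [100.0, 100.0]))
```

Because the 500 m and 1000 m grids index the same site differently, each dataset needs its own lookup, and a single 1000 m model cell corresponds to several 500 m MODIS cells.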
The era of "big data" promises to provide new hydrologic insights, and open web-based platforms are being developed and adopted by the hydrologic science community to harness these datasets and data services. This shift accompanies advances in hydrology education and the growth of web-based hydrology learning modules, but the capacity of these modules to use emerging open platforms and data services to enhance student learning through data-driven activities remains largely untapped. Given that generic equations may not easily translate into local or regional solutions, teaching students to explore how well models or equations work in particular settings, or to answer specific problems using real data, is essential. This paper introduces an open web-based learning module developed to advance data-driven hydrologic process learning, targeting upper-level undergraduate and early graduate students in hydrology and engineering. The module was developed and deployed on the HydroLearn open educational platform, which provides a formal pedagogical structure for developing effective problem-based learning activities. We found that data-driven learning activities using collaborative open web platforms like HydroShare and CUAHSI JupyterHub computational notebooks allowed students to access and work with datasets for systems of personal interest and promoted critical evaluation of results and assumptions. Initial student feedback was generally positive, but it also highlighted challenges, including difficulties with troubleshooting and future-proofing and some resistance to open-source software and programming. Opportunities to further enhance hydrology learning include better articulating the many benefits of open web platforms upfront, incorporating additional user-support tools, and focusing methods and questions on implementing and adapting notebooks to explore fundamental processes rather than tools and syntax.
The profound shift in the field of hydrology toward big data, open data services and reproducible research practices requires hydrology instructors to rethink traditional content delivery and focus instruction on harnessing these datasets and practices in the preparation of future hydrologists and engineers.