For science to reliably support new discoveries, its results must be reproducible. This has proven to be a challenge in many fields, including those that rely on computational methods as a means for supporting new discoveries. Reproducibility in these studies is particularly difficult because it requires open, documented sharing of data and models, as well as careful control of underlying hardware and software dependencies, so that computational procedures executed by the original researcher are portable: they can be run on different hardware or software and produce consistent results. Despite recent advances in making scientific work more findable, accessible, interoperable, and reusable (FAIR), fundamental questions in the conduct of reproducible computational studies remain: Can published results be repeated in different computing environments? If yes, how similar are they to previous results? Can we further verify and build on the results by using additional data or changing computational methods? Can these changes be automatically and systematically tracked? This presentation will describe our EarthCube project to advance computational reproducibility and make it easier and more efficient for geoscientists to preserve, share, repeat, and replicate scientific computations. Our approach is based on Sciunit, software developed by prior EarthCube projects, which encapsulates application dependencies composed of system binaries, code, data, environment, and application provenance, so that the resulting computational research object can be shared and re-executed on different platforms. We have deployed Sciunit within the HydroShare JupyterHub platform operated by the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) for the hydrology research community and will present use cases that demonstrate how to preserve, share, repeat, and replicate scientific results from the field of hydrologic modeling. While illustrated in the context of hydrology, the methods and tools developed as part of this project have the potential to be extended to other geoscience domains. They also have the potential to inform the reproducibility evaluation process currently undertaken by journals and publishers.
The most dynamic electromagnetic energy and momentum exchange processes between the upper atmosphere and the magnetosphere take place in the polar ionosphere, as evidenced by the aurora. Accurate specification of the constantly changing conditions of high-latitude ionospheric electrodynamics has been of paramount interest to the geospace science community. In response to this community's need for research tools that combine heterogeneous observational data from distributed arrays of small ground-based instrumentation operated by individual investigators with global geospace data sets, open-source Python software and associated web applications for Assimilative Mapping of Geospace Observations (AMGeO) are being developed and deployed (https://amgeo.colorado.edu). AMGeO provides a coherent, simultaneous, and inter-hemispheric picture of global ionospheric electrodynamics by optimally combining diverse geospace observational data in a manner consistent with first principles and with rigorous consideration of the uncertainty associated with each observation. To engage the geospace community in collaborative geospace system science campaigns and a science-driven process of data product validation, the AMGeO software is designed to be transparent, expandable, and interoperable with established geospace community data resources and standards. This paper presents an overview of the AMGeO software development and deployment plans as part of a new NSF EarthCube project that started in September 2019.
The GeoSciFramework project (GSF), funded by the NSF Office of Advanced Cyberinfrastructure and NSF EarthCube programs, aims to improve intermediate- to short-term forecasts of catastrophic natural hazard events, allowing researchers to instantly detect when an event has occurred and to reveal subtler, long-term motions of Earth's surface at unprecedented spatial and temporal scales. These goals will be accomplished by training machine learning algorithms to recognize patterns across various data signals during geophysical events and by delivering scalable, real-time data processing capabilities for time series generation. The algorithm will employ an advanced convolutional neural network method wherein spatio-temporal analyses are informed by both physics-based models and continuous datasets, including Interferometric Synthetic Aperture Radar (InSAR), seismic, GNSS, tide gauge, and gas-emission data. The project architecture accommodates increasingly large datasets by adopting software packages already proven to support internet search and intelligence gathering. This talk will focus primarily on the Differential InSAR (DInSAR) time-series analysis component, which quantifies line-of-sight (LOS) ground deformation at mm-to-cm resolution. Here, we compare time series products generated under three different processing techniques. The first is an automated version of InSAR processing using the small baseline subset (SBAS) method, performed in parallel on systems such as Generic Mapping Tools SAR (GMT5SAR) and the Generic InSAR Analysis Toolbox (GIAnT). The second resembles the first but implements different processing systems for performance comparison: the InSAR Scientific Computing Environment (ISCE) and the Miami InSAR Time Series Software in Python (MintPy). The final strategy, developed by Drs. Zheng and Zebker of Stanford University, concentrates on the topographic phase component of the SAR signal so that simple cross multiplication returns an observation sequence of interferograms in geographic coordinates [Zebker, 2017]. Our results provide high-resolution views of ground motions and measure LOS deformation over both short and long periods of time.
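For readers unfamiliar with DInSAR, the core relation behind all three processing chains maps unwrapped interferometric phase to LOS displacement. A minimal sketch in Python (not project code; the Sentinel-1 C-band wavelength is assumed purely for illustration):

```python
# Minimal sketch: convert unwrapped interferometric phase to line-of-sight
# (LOS) displacement via the standard two-way-path relation
#   d_LOS = -lambda * phi / (4 * pi)
import numpy as np

WAVELENGTH_M = 0.0555  # assumed Sentinel-1 C-band radar wavelength (m)

def phase_to_los_displacement(unwrapped_phase_rad: np.ndarray) -> np.ndarray:
    """Map unwrapped phase (radians) to LOS displacement (meters)."""
    return -WAVELENGTH_M * unwrapped_phase_rad / (4.0 * np.pi)

# One full phase cycle (2*pi) corresponds to half a wavelength (~2.8 cm)
# of LOS motion.
phase = np.array([0.0, np.pi, 2.0 * np.pi])
print(phase_to_los_displacement(phase) * 100.0)  # displacement in cm
```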
The ICEBERG (Imagery Cyberinfrastructure and Extensible Building blocks to Enhance Research in the Geosciences) project (NSF 1740595) aims to (1) develop open source image classification tools tailored to high-resolution satellite imagery of the Arctic and Antarctic to be used on high-performance and distributed computing (HPDC) resources, (2) create easy-to-use interfaces to facilitate the development and testing of algorithms for application to specific geoscience requirements, (3) apply these tools through use cases that span the biological, hydrological, and geoscience needs of the polar community, and (4) transfer these tools to the larger non-polar community.
The project develops innovative tools to extract and analyze the available observational and modeling data in order to enable new physics-based and machine-learning approaches for understanding and predicting solar activity and its influence on the geospace and Earth systems. Heliophysics data are abundant: several terabytes of solar and space observations are obtained every day. Finding the relevant information from numerous spacecraft and ground-based data archives and using it is paramount, and currently a difficult task. The scope of the project is to develop and evaluate data integration tools to meet common data access and discovery needs for two types of heliophysics data: 1) long-term synoptic activity and variability, and 2) extreme geoeffective solar events caused by solar flares and eruptions. The methodology consists of developing a data integration infrastructure and access methods capable of 1) automatic search and identification of image patterns and event data records produced by space and ground-based observatories, 2) automatic association of parallel multi-wavelength/multi-instrument database entries with unique patterns or event identifiers, 3) automatic retrieval of such data records and pipeline processing for the purpose of annotating each pattern or event according to a predefined set of physical parameters inferable from complementary data sources, and 4) generation of a pattern or event catalog and associated user-friendly graphical interface tools capable of providing fast search, quick preview, and automatic data retrieval. The team has developed and implemented the Helioportal, which provides a synergy of solar flare observations, takes advantage of big datasets from ground- and space-based instruments, and allows the larger research community to significantly speed up investigations of flare events, perform a broad range of new statistical and case studies, and test and validate theoretical and computational models. The Helioportal accumulates, integrates, and presents records of physical descriptors of solar flares, as well as the magnetic characteristics of active regions, drawn from various catalogs of observational data from different observatories and heliophysics missions.
This repository creates a GUI (graphical user interface) for the BALTO (Brokered Alignment of Long-Tail Observations) project. BALTO is funded by the NSF EarthCube program. The GUI aims to provide a simplified and customizable method for users to access data sets of interest on servers that support the OPeNDAP data access protocol. This interactive GUI runs within a Jupyter notebook and uses the Python packages ipywidgets (for widget controls), ipyleaflet (for interactive maps), and pydap (an OPeNDAP client). The Python source code that creates the GUI and processes events is in a module called balto_gui.py, which must be in the same directory as this Jupyter notebook. Python source code for visualization of downloaded data is given in a module called balto_plot.py. The GUI consists of multiple panels and supports both a tab style and an accordion style, which let you switch between GUI panels without scrolling in the notebook. You can run the notebook in a browser window without installing anything on your computer, using a service called Binder. Look for the Binder icon below and a link labeled "Launch Binder". This sets up a server in the cloud with all the required dependencies and lets you run the notebook on that server. (Sometimes this takes a while, however.) To run this Jupyter notebook without Binder, it is recommended to install Python 3.7 from an Anaconda distribution and then create a conda environment called balto. Simple instructions for creating the conda environment and installing the software are given in Appendix 1 of version 2 (v2) of the notebook.
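For users who want to go beyond the GUI, the same OPeNDAP access can be done directly with pydap. A minimal sketch, using pydap's public test dataset as the example (any OPeNDAP URL of interest can be substituted):

```python
# Minimal sketch of the kind of OPeNDAP access the BALTO GUI wraps,
# using pydap directly against its public test server.
from pydap.client import open_url

dataset = open_url("http://test.opendap.org/dap/data/nc/coads_climatology.nc")
print(list(dataset.keys()))   # list the variables in the remote dataset

sst = dataset["SST"]          # lazy handle; no data transferred yet
print(sst.shape)
subset = sst[0, 0:10, 0:10]   # only this small hyperslab crosses the network
```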
The conduct of reproducible science improves when computations are portable and verifiable. A container provides an isolated environment for running computations and is thus useful for porting applications to new machines. Current container engines, such as Linux Containers (LXC) and Docker, however, have a high learning curve, are resource-intensive, and do not address the entire reproducibility spectrum of portability, repeatability, and replicability. As part of EarthCube, we have developed Sciunit (https://sciunit.run), which encapsulates application dependencies, i.e., system binaries, code, data, and environment, along with application provenance. The resulting research object can be easily shared and reused among collaborators. Sciunit can be used within HydroShare's CUAHSI JupyterHub notebook environment and is available to the entire community. In this poster, we will present three new features in Sciunit that have emerged from community-provided use cases and discussion. Sciunit is available as a command-line utility. We will: (1) showcase the new Sciunit API, which will allow data facilities to integrate Sciunit as a reproducible environment on portals; (2) show how a Sciunit container can transition to a Docker container and vice versa; and (3) demonstrate the ability to contrast two containers in terms of content and metadata. We will show these capabilities with a hydrology use case based on pySUMMA, a Python API for the Structure for Unifying Multiple Modeling Alternatives (SUMMA) hydrologic model.
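As an illustration of the command-line workflow, the following notebook cell sketches a typical Sciunit session (command names follow the Sciunit documentation; exact arguments, flags, and execution tags may differ):

```python
# Hypothetical JupyterHub cell sketching the Sciunit CLI workflow described
# above; "!" runs a shell command from a notebook.

!sciunit create my-summa-experiment           # open a new Sciunit container
!sciunit exec python run_summa_simulation.py  # run and capture code, data, and binaries
!sciunit list                                 # show recorded executions (e1, e2, ...)
!sciunit repeat e1                            # re-execute e1 from the container
```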
An NSF EarthCube-funded project supported a field-based workshop designed to evaluate and refine the sedimentology/stratigraphy portion of the StraboSpot digital data management system. Eleven academics attended the workshop, representing a spectrum of career levels and specialties. The participants teach classes in sedimentology and conduct sedimentary research, but had not previously used digital mobile apps in the field. The field component focused on learning the basic functionality of the StraboSpot app as a method of collecting digital data in the field. On the first day, teams of 2-3 participants measured a stratigraphic section in a highly visited locality of the well-studied Book Cliffs of central Utah. Teams saw how the vocabulary and spot functionality worked to collect sedimentary field data and to generate stratigraphic columns. The second day was spent measuring a more complex mixed carbonate-clastic sequence in the San Rafael Swell (Utah). Half of the third day was spent discussing major issues with workflow and vocabulary, gathering feedback on how to simplify and streamline the descriptive data collection functions (stratal attributes), and reviewing the more challenging interpretation functions (processes, depositional environments, and architecture). A major discussion point was how best to handle data collection and stratigraphic plotting of 'interbedded' intervals. As a result of the workshop, we streamlined workflow options and refined portions of the vocabulary. This field testing followed up on two previous workshops that solicited expert advice to develop the program categories and basic vocabulary for the sedimentary community. Overall, workshop participants were enthusiastic about the potential of digital data systems and the ability to link annotated photographs and sketches to georeferenced localities. All participants indicated they were inclined to use StraboSpot in both teaching and research, particularly given its versatile and customizable options.
MagIC (https://www2.earthref.org/MagIC) is an organization dedicated to improving research capacity in the Earth and ocean sciences by maintaining an open community digital data archive for rock magnetic and paleomagnetic data, with portals that allow scientists and others to archive, search, visualize, download, and combine versioned datasets. A recent focus of MagIC has been to make our data more accessible, discoverable, and interoperable. In collaboration with the GeoCodes/P418 group, we have continued to add more schema.org metadata fields to our data sets, which allows for more detailed and deeper automated searches. We are involved with the Earth Science Information Partners (ESIP) schema.org cluster, which is working on extending the schema.org schema to the sciences. MagIC has been focusing on geoscience issues such as standards for describing deep time. We are also collaborating with the European Plate Observing System (EPOS) Thematic Core Service Multi-scale Laboratories (TCS MSL). MagIC sends its contributions' metadata to TCS MSL via DataCite records for representation in the EPOS system. This collaboration should allow European scientists to use MagIC as an official repository for European rock magnetic and paleomagnetic data and help prevent the fragmentation of global paleomagnetic and rock magnetic data into many separate repositories. By having our data well described by an EarthCube-supported standard (schema.org/JSON-LD), we will be able to more easily share data with other EarthCube projects in the future.
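To make the approach concrete, here is a sketch, with entirely hypothetical values, of the kind of schema.org Dataset record that can be embedded as JSON-LD on a contribution's landing page:

```python
# Illustrative schema.org Dataset record (hypothetical values throughout) of
# the kind MagIC embeds as JSON-LD so search engines and GeoCodes can index it.
import json

dataset_jsonld = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Paleomagnetic data from an example site (hypothetical)",
    "description": "Demagnetization measurements for an example study.",
    "url": "https://www2.earthref.org/MagIC/<contribution-id>",  # placeholder
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "temporalCoverage": "Cretaceous",  # deep-time coverage, per the ESIP discussion
    "variableMeasured": [
        {"@type": "PropertyValue", "name": "magnetic declination"},
        {"@type": "PropertyValue", "name": "magnetic inclination"},
    ],
}
print(json.dumps(dataset_jsonld, indent=2))
```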
Geoscientists often spend significant research time identifying, downloading, and refining geospatial data before they can use it for analysis. Exploring interdisciplinary data is even more challenging because it may be difficult to evaluate data quality outside of one's expertise. QGreenland, a newly funded EarthCube project, is designed to remove these barriers for interdisciplinary Greenland-focused research and analysis via an open-data, open-platform Greenland GIS tool. QGreenland will combine interdisciplinary data (e.g., glaciology, human health, geopolitics, hydrology, biology) curated by an international Editorial Board into a unified, all-in-one GIS environment for offline and online use. The package is designed for the open source GIS platform QGIS. QGreenland will include multiple levels of data use: 1) a fully downloadable base package ready for offline use, 2) additional disciplinary and/or high-resolution data extension packages for selective download, and 3) online-access-only data to accommodate especially large datasets or frequently updated time series. Software development has begun, and we look forward to discussing techniques for creating the best open-access, reproducible methods for package creation and future sustainability. A beta version is now available for experimentation and feedback from interested users and the Editorial Board. The version 1 public release is slated for fall 2020, with two subsequent annual updates. As an interdisciplinary data package, QGreenland is designed to aid collaboration and discovery across fields. Along with discussing QGreenland development, we will provide an example use case to demonstrate the potential utility of QGreenland for researchers, educators, planners, and communities.
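Because QGreenland targets QGIS, its layers can also be loaded programmatically. A minimal PyQGIS sketch (run inside the QGIS Python console; the file path and layer name below are placeholders, not actual QGreenland package contents):

```python
# Minimal PyQGIS sketch of loading one QGreenland layer into the current
# project; the path and name are hypothetical.
from qgis.core import QgsVectorLayer, QgsProject

layer = QgsVectorLayer(
    "/path/to/QGreenland/glaciology/example_layer.gpkg",  # hypothetical path
    "Example QGreenland layer",
    "ogr",
)
if layer.isValid():
    QgsProject.instance().addMapLayer(layer)
else:
    print("Layer failed to load; check the path inside your QGreenland copy.")
```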
The Magnetics Information Consortium (MagIC), hosted at http://earthref.org/MagIC, is a database that serves as a Findable, Accessible, Interoperable, Reusable (FAIR) archive for paleomagnetic and rock magnetic data. It has a flexible, comprehensive data model that can accommodate most kinds of paleomagnetic data. The PmagPy software package is a cross-platform, open-source set of tools written in Python for the analysis of paleomagnetic data that serves as one interface to MagIC, accommodating various levels of user expertise. It is available through github.com/PmagPy. Because PmagPy requires installation of Python and the software package, there is a speed bump for many practitioners beginning to use the software. To make the software and MagIC more accessible to the broad spectrum of scientists interested in paleomagnetism and rock magnetism, we have prepared a set of Jupyter notebooks, hosted on jupyterhub.earthref.org, that serve two purposes: 1) a complete course in Python for Earth scientists, and 2) a set of notebooks that introduce PmagPy (importing the software package from the GitHub repository). These notebooks illustrate how to conduct statistical analyses, synthesize data, and create data visualizations of the type typically included in papers in the field. The notebooks also demonstrate how to prepare data from the laboratory for the MagIC database. This pathway gives researchers additional tools to satisfy data archiving requirements from NSF and publishers such as AGU.
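As a flavor of what the notebooks cover, the following sketch computes a Fisher mean with PmagPy's ipmag module (the directions are made-up illustration data, and the exact return keys should be checked against the PmagPy documentation):

```python
# Short example of the kind of analysis the notebooks walk through:
# a Fisher (1953) mean direction with its alpha_95 confidence cone.
import pmagpy.ipmag as ipmag

# Made-up paleomagnetic directions (degrees) for illustration only
declinations = [350.6, 352.1, 1.9, 356.3, 344.4]
inclinations = [62.8, 68.4, 64.5, 61.2, 63.1]

mean = ipmag.fisher_mean(dec=declinations, inc=inclinations)
print(mean["dec"], mean["inc"], mean["alpha95"])
```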
Web 2.0 data delivery and visualization services have improved Earth system science workflows, yet scientists and researchers working with these applications often require customized features that are not available in an application running in the browser. Tailoring Argovis's data throughput so that users can gather data for their myriad tasks requires us to expose the inner workings of our Application Programming Interface (API). We provide a set of functions in a Jupyter notebook for users to retrieve Argo float profiles, platforms, metadata, spatial-temporal selections, and gridded products (including weather events) stored on Argovis. Charts and simple calculations made with the output of these functions give users the means to write their own Python scripts. We have bundled the required libraries into a Docker container so that users do not need to install Python libraries manually: all software dependencies are installed in the Docker container, and the notebooks run within the Docker environment. Instructions on how to build and run the container are included. We encourage users to improve and expand these routines, to extend them to other languages such as R, MATLAB, or Julia, and to share their work with us and the community. We welcome community feedback on these tutorial notebooks and are happy to support community-developed software on our platform.
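The notebook functions ultimately wrap HTTP requests to the Argovis API. A minimal sketch of that pattern (the route and profile identifier below are illustrative; consult the Argovis API documentation for the exact endpoints and response fields):

```python
# Sketch of the request pattern the notebook functions wrap: query the
# Argovis API over HTTP and load the returned JSON.
import requests

BASE_URL = "https://argovis.colorado.edu"  # Argovis server

def get_profile(profile_id: str) -> dict:
    """Fetch a single Argo profile as JSON (illustrative route)."""
    resp = requests.get(f"{BASE_URL}/catalog/profiles/{profile_id}", timeout=60)
    resp.raise_for_status()
    return resp.json()

# Hypothetical platform_cycle identifier
profile = get_profile("4902911_100")
print(profile.get("date"), len(profile.get("measurements", [])))
```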
To incentivize participation in and contribution to the growth of an Earth-science cyberinfrastructure, analytical environments need to be developed that allow automatic analysis and classification of data from connected data repositories. The purpose of this study is to investigate a machine learning technique for automatically detecting shear-sense-indicating clasts (i.e., sigma or delta clasts and mica fish) in photomicrographs and finding their shear sense (i.e., sinistral (CCW) or dextral (CW) shearing). Previous work employed transfer learning, a technique in which a pre-trained Convolutional Neural Network (CNN) is repurposed, together with artificially augmented image datasets, to distinguish between CCW and CW shearing. Preprocessing images by denoising, a process in which noise at different scales is removed while preserving the edges of an image, improved classification accuracy. However, upon randomizing the denoising parameters, the CNN model did not converge due to a severe lack of data. While the effort to acquire more labeled data is ongoing, this work compensated by implementing a pre-processing "detection" system that automatically crops images to the regions containing the clasts. This is done using YOLOv3, a CNN-based image detection system that outputs a bounding box around an object of interest. YOLOv3 was trained using 93 photomicrographs containing bounding boxes of 344 shear-sense-indicating clasts. The retrained detector was tested on two sets: set A, with 10 photomicrographs containing clasts, and set B, with 100 photomicrographs not containing clasts. All but one of the clasts in set A were correctly detected, with an average confidence score of 96.6%. On set B, the system correctly indicated no clasts in 72% of the images; on the remaining images, where clasts were incorrectly identified, an average confidence score of 78.3% was observed. By applying a threshold to the confidence scores, the system could be made more accurate. Future work involves using the bounding boxes output by the detection system to refine and improve the CNN model for classifying the shear sense of clasts in photomicrographs.
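A minimal sketch of the proposed confidence thresholding (the detection tuples and scores below are invented for illustration; a false positive at 78.3% would be rejected by a 90% cutoff while true detections near 96.6% survive):

```python
# Minimal sketch: keep YOLOv3 detections only when the confidence score
# clears a cutoff. Detections are (label, confidence, bounding_box) tuples.
THRESHOLD = 0.90  # assumed cutoff between the ~96.6% and ~78.3% score regimes

detections = [
    ("sigma_clast", 0.97, (102, 54, 310, 240)),  # likely true detection
    ("mica_fish", 0.78, (400, 120, 520, 230)),   # likely false positive
]

accepted = [d for d in detections if d[1] >= THRESHOLD]
for label, score, box in accepted:
    print(f"{label}: {score:.2f} at {box}")
```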
Ocean waves interact with the environment in many ways. They transport energy and mass, and the resultant sea-surface roughness defines the drag coefficient that transmits wind energy to the ocean (Drennan et al., 2003). Through erosion and deposition, waves change the shape and landscape of coastal areas. Storm surge waves can cause flood damage in coastal areas. Recent studies revealed that wetlands are sensitive to wave conditions, which determine the retreat or growth of coastal ecosystems (Green and Coco, 2007; Mariotti and Fagherazzi, 2010). Marine activities such as fishing, shipping, oil extraction, and offshore construction also depend on wave conditions. Thus, it is important to understand ocean waves in order to improve Earth system modeling, protect the coastline, predict storm surge, preserve coastal ecosystems, and support offshore industries. This project will explore the application of synthetic aperture radar (SAR) imagery to predict significant wave height near the coast. High-frequency (HF) radar data of the ocean (aka CODAR) were used as the ground truth data set to calibrate and validate the wave height estimator. Offshore wind data were also included. The developed code will enhance the current capability to process the satellite data and create a new platform to monitor the coastal environment. The collected data will help further our understanding of the wave spectrum in a coastal environment, and the data can support other research on related topics, e.g., the interaction of waves with ice sheets, wetlands, shorelines, wind farms, and aquaculture.
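For background, significant wave height is conventionally defined from the zeroth moment of the wave variance spectrum, Hs ≈ 4√m0. A short sketch of that calculation (toy spectrum; this is the standard spectral definition, not the project's SAR estimator itself):

```python
# Background sketch: significant wave height from the zeroth spectral moment,
#   Hs = 4 * sqrt(m0),  m0 = integral of S(f) df
import numpy as np

def significant_wave_height(freq_hz: np.ndarray, spectrum: np.ndarray) -> float:
    """Hs = 4*sqrt(m0), with m0 the integral of the variance spectrum S(f)."""
    m0 = np.trapz(spectrum, freq_hz)  # zeroth moment (m^2)
    return 4.0 * np.sqrt(m0)

# Toy single-peaked spectrum (illustrative numbers only)
f = np.linspace(0.05, 0.5, 100)                  # frequency (Hz)
S = 0.5 * np.exp(-((f - 0.1) / 0.03) ** 2)       # variance density (m^2/Hz)
print(f"Hs = {significant_wave_height(f, S):.2f} m")
```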
The EarthCube Geosemantics Framework (https://ecgs.ncsa.illinois.edu/) developed a prototype of a decentralized framework that combines Linked Data and RESTful web services to annotate, connect, integrate, and reason about the integration of geoscience resources. The framework allows the semantic enrichment of web resources and semantic mediation among heterogeneous geoscience resources, such as models and data. This notebook provides examples of how the Semantic Annotation Service can be used to manage linked controlled vocabularies using JSON Linked Data (JSON-LD), including how to query the built-in RDF graphs for existing linked standard vocabularies based on the Community Surface Dynamics Modeling System (CSDMS), Observations Data Model (ODM2), and Unidata udunits2 vocabularies; how to query built-in crosswalks between CSDMS and ODM2 vocabularies using SKOS; and how to add new linked vocabularies to the service. JSON-LD definitions provided by these endpoints will be used to annotate sample data available within the IML Critical Zone Observatory data repository using the Clowder Web Service API (https://data.imlczo.org/). By supporting JSON-LD, the Semantic Annotation Service and the Clowder framework provide examples of how portable and semantically defined metadata can be used to better annotate data across repositories and services.
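To illustrate the kind of SKOS crosswalk the service exposes, here is a self-contained rdflib sketch (the namespaces and term pairing are illustrative, not the service's actual graph):

```python
# Illustrative sketch (not the service's code): a SKOS crosswalk linking a
# CSDMS vocabulary term to an ODM2 term, queried locally with rdflib.
from rdflib import Graph, Namespace
from rdflib.namespace import SKOS

CSDMS = Namespace("https://csdms.colorado.edu/wiki/")            # illustrative
ODM2 = Namespace("http://vocabulary.odm2.org/variablename/")     # illustrative

g = Graph()
g.add((CSDMS["sea_water__temperature"], SKOS.exactMatch, ODM2["waterTemperature"]))

# Find ODM2 matches for a CSDMS term via the crosswalk
for _, _, match in g.triples((CSDMS["sea_water__temperature"], SKOS.exactMatch, None)):
    print(match)
```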
The growing scale of Earth and space science challenges dictates new modes of discovery: discovery that embraces cross-disciplinary interactions and links between communities, between data, and between technologies. Nowhere is the challenge more pressing than in the field of Heliophysics, where solar energy is generated, propagates through interplanetary space, interacts with the Earth's space environment, and poses an immediate threat to our technological infrastructure and human-natural systems (i.e., space weather). We will present a new project within the National Science Foundation Convergence Accelerator program that represents this new mode of discovery: the Convergence Hub for the Exploration of Space Science (CHESS). Our approach is to semantically link Heliophysics data through a Knowledge Graph/Network (KG). The presentation and discussion will focus on:
- What is a knowledge graph (KG)?
- In what ways are KGs poised to transform Earth and space science?
- The CHESS project and bridging to metadata and knowledge architecture efforts in Heliophysics
We will highlight linkages to the NSF EarthCube program and ongoing efforts in the geoinformatics and data science communities across, e.g., NSF, NOAA, and NASA.
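As a concrete, if toy, answer to "what is a knowledge graph," the sketch below builds a three-triple heliophysics KG and queries it with SPARQL (all URIs and relations are hypothetical, not CHESS's actual schema):

```python
# Toy illustration of semantically linked Heliophysics data: a tiny
# knowledge graph queried with SPARQL via rdflib. All URIs are hypothetical.
from rdflib import Graph, Literal, Namespace

HELIO = Namespace("https://example.org/chess/")  # hypothetical namespace

g = Graph()
g.add((HELIO["flare_2017-09-06"], HELIO["observedBy"], HELIO["SDO_AIA"]))
g.add((HELIO["flare_2017-09-06"], HELIO["drove"], HELIO["storm_2017-09-07"]))
g.add((HELIO["SDO_AIA"], HELIO["measures"], Literal("EUV irradiance")))

# Which instruments observed events that drove geomagnetic storms?
query = """
SELECT ?instrument WHERE {
  ?event <https://example.org/chess/drove> ?storm .
  ?event <https://example.org/chess/observedBy> ?instrument .
}"""
for row in g.query(query):
    print(row.instrument)
```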
The EarthCube Data Discovery Studio (DDStudio) integrates several technical components into an end-to-end data discovery and exploration system. Beyond supporting dataset search across multiple data sources, it lets geoscientists explore the data using Jupyter notebooks; organize the discovered datasets into thematic collections that can be shared with other users; edit metadata records and contribute metadata describing additional datasets; and examine provenance and validate automated metadata enhancements. DDStudio provides access to 1.67 million metadata records from 40+ geoscience repositories, which are automatically enhanced and exposed via standard interfaces in both ISO 19115 and schema.org markup; the latter can be used by commercial search engines (Google, Bing) to index DDStudio content. For geoscience end users, DDStudio provides a custom Geoportal-based user interface that enables spatio-temporal, faceted, and full-text search and provides access to the additional functions listed above. Key project accomplishments over the last year include:
- User interface improvements, based on design advice from a Science Gateways Community Institute (SGCI) usability team, who conducted user interviews, performed usability testing, and analyzed a dozen other search portals to identify the most useful features. This work resulted in a streamlined user interface, particularly in the presentation of search results and the management of thematic collections.
- A significant usage increase resulting from the earlier effort to publish DDStudio content using schema.org markup. With over 900K records indexed by Google, nearly half of the roughly 1000 unique users per month now reach DDStudio via referrals from Google.
- The added ability to harvest and process JSON-LD metadata (sketched below), which makes it possible to integrate EarthCube GeoCodes content into DDStudio and work with this content using DDStudio's user interface.
- New application domains, including joint work with the library community and interoperation with DataMed, a similar system that indexes 2.3 million biomedical datasets.
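The JSON-LD harvesting step can be sketched in a few lines: fetch a landing page and extract schema.org records from its script tags (the URL is a placeholder, and DDStudio's production harvester is more involved):

```python
# Sketch of JSON-LD harvesting: pull schema.org Dataset records out of a
# landing page's <script type="application/ld+json"> tags.
import json
import requests
from bs4 import BeautifulSoup

def extract_jsonld(url: str) -> list:
    """Return all parseable JSON-LD blocks found on the page."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            records.append(json.loads(tag.string))
        except (TypeError, ValueError):
            continue  # skip empty or malformed blocks
    return records

# Hypothetical landing page URL
for record in extract_jsonld("https://example.org/dataset-landing-page"):
    if record.get("@type") == "Dataset":
        print(record.get("name"))
```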
Scientific ocean drilling, through the International Ocean Discovery Program (IODP) and its predecessors, has a far-reaching legacy. These programs have produced vast quantities of marine data, the results of which have revolutionized many geoscience subdisciplines. Meta-analytical studies from these efforts exist for micropaleontology, paleoclimate, and marine sedimentation, and several outstanding resources have curated and made available elements of offshore drilling data, but much of the data remain heterogeneous and dispersed. Each study therefore requires reassembling a synthesis of data from numerous sources: a slow, difficult process that limits reproducibility and slows the progress of hypothesis testing and generation. A programmatically accessible repository of scientific ocean drilling data spanning the globe will allow for large-scale marine sedimentary geology and micropaleontologic studies and may help stimulate major advances in these fields. The eODP project, funded through the NSF's EarthCube program, seeks to facilitate access to and visualization of these large microfossil and stratigraphic datasets. To achieve these goals, eODP will link and enhance three existing database structures: Open Core Data (OCD), the Paleobiology Database (PBDB), and Macrostrat. Over the next three years, eODP will accomplish the following goals: (1) enable construction of sediment-grounded, flexible age models in an environment that encompasses the deep-sea and continental records; (2) expand existing lithology and age model construction approaches in this integrated offshore-onshore, stratigraphically focused environment; (3) adapt key microfossil data into the PBDB data model from OCD; (4) develop new API-driven web user interfaces for easily discovering and acquiring data; and (5) establish user working groups for community input and feedback. This project targets shipboard drilling-derived data, but the infrastructure will be put in place to allow the addition of other shore-based information. The success of eODP hinges upon the interaction, feedback, and contributions of the scientific ocean drilling community, and we invite anyone interested in participating in this project to join the eODP team.
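As an example of the kind of programmatic access eODP builds on, the existing PBDB web API can already be queried directly (the query parameters are just an illustration; field names follow PBDB's compact vocabulary and should be checked against its documentation):

```python
# Sketch: query the Paleobiology Database (PBDB) API for fossil occurrences.
import requests

resp = requests.get(
    "https://paleobiodb.org/data1.2/occs/list.json",
    params={"base_name": "Foraminifera", "interval": "Miocene", "limit": 5},
    timeout=60,
)
resp.raise_for_status()

# "tna" = taxon name; "eag"/"lag" = early/late bounds of the age range (Ma)
for occ in resp.json()["records"]:
    print(occ.get("tna"), occ.get("eag"), occ.get("lag"))
```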