1. GOALS
Genotype by environment (G*E) interactions are among the most important issues in crop production agriculture, particularly for plant breeding and genetics. An inability to quantify G*E interactions negatively affects crop management decisions and a plant breeder’s ability to predict the best variety for a target population of environments (TPE), the set of environments to which the improved crop varieties developed by breeding programs need to be adapted. In crop production, the uncertainty of yield and quality (e.g., protein, oil, and starch contents, and disease susceptibility), as measured by the spatial and temporal variance, affects crop management practices, and the magnitude of that variance determines the appropriate scale of management. G*E investigations require researchers to access data from many scientific disciplines (e.g., geosciences, atmospheric sciences, and genetics) to characterize all of the controlling factors. Unfortunately, there is no simple means of discovering all of the required data across these disciplines without considerable time and effort. In addition, once the data are discovered, researchers must dedicate significant further time and effort to access, manipulate, and integrate them into the analysis software commonly used in the crop science community. Plant breeders are not the only community with this problem, however. Many, if not most, research communities attempting to resolve real-world problems face the same situation, so it should not be surprising that solutions generated in one community may begin to fulfill another community’s needs.
Such is the case here. The proposers discovered through a chance meeting that the mechanisms created to provide common geoscience data sets of importance to the hydrology community supply exactly the variety of data that the plant breeding and crop production communities need to further G*E interactions research, provided that these data can be delivered in the forms required by the tools those communities commonly use. To facilitate scientific discovery in plant genetics and breeding, we propose to build upon that work and provide researchers in the plant genetics and plant breeding domains with data to support G*E studies by seamlessly and transparently integrating environmental and land surface characterization data and crop modeling systems into the new and evolving genomic analysis frameworks. This approach can then be extended to other agricultural research domains such as precision agriculture and integrated pest management. This integration will in turn allow genomic scientists to predict G*E interactions, accelerating scientific discovery by streamlining the breeding and analysis of staple crops both in the US and in developing countries. We will take advantage of existing collaborations among the plant breeders, hydrologists, and data scientists involved in this proposed project to create learning-based scenarios that couple climate, environmental, and genetic data to predict crop range and response. These scenarios will be demonstrated during training activities at end-user conferences such as the Plant and Animal Genome conference and the American Society of Agronomy meetings. The G*E cyberinfrastructure resulting from this project will improve a plant breeder’s ability to predict the best plant varieties for a TPE.
This project will simplify access to and manipulation of the large environmental datasets (Table 1) required to characterize the ambient conditions in which plants develop at specific locations and times. It will also extend this information to non-sampled locations by including environmental covariates, provided by the cyberinfrastructure, in plant breeders’ crop models. To create this environment in a cross-disciplinary setting, we propose an agile development cycle that incorporates feedback from plant breeders and geneticists throughout the project to develop easy-to-use computational solutions for the community.
2. PROJECT SIGNIFICANCE
The NSF-funded Building Blocks Brokering (BCube) project (in which PI Easton and co-PI Duerr were participants) demonstrated that the discovery, access, and transformation of geo-environmental data into the formats needed for cross-domain use can be significantly simplified, thereby decreasing the amount of time researchers spend on data discovery, access, and manipulation. In particular, the BCube hydrology use case, currently in production, succeeded in making previously inaccessible and disparate data available by coupling external, non-geo-standards-based datasets into the systems that hydrologists use, from commercial offerings like ArcGIS, ArcHydro, and MATLAB to analytical packages and languages like R and Python. We propose to extend that platform to provide a cross-domain data system for the plant breeding and genetics community. We will demonstrate this cyberinfrastructure for several plant breeding and genetics science use cases common in G*E investigations by developing easy-to-use data accessors (Table 1). Integrating these methods into G*E studies will significantly decrease the labor involved in data discovery, access, and manipulation and, as a result, significantly accelerate G*E-related scientific discoveries in the plant breeding and genetics field.