EDUCATION
University of California, Davis - Ph.D. Microbiology (2012)
Dissertation Title: Exploring Microbial Community Composition and Genome Evolution Using Environmental and Comparative Genomics
University of Texas, Arlington - M.S. Quantitative Biology (2001)
Thesis Title: Worldwide Phylogeny of the Damselfly Genus Ischnura Based on Mitochondrial Cytochrome Oxidase II and Cytochrome B Sequence Data
University of New Orleans - B.S. Biology (1995)
INTRODUCTION
Plant-associated microbial communities are critical to the health and evolutionary success of their hosts. Plant-associated microbes provide nutrients, break down environmental toxins and even play a role in protecting their hosts from disease (Zamioudis and Pieterse, 2011). Plants curate these communities for their advantage, encouraging beneficial microbes to colonize them and their immediate surroundings. In this way, they can shape otherwise inhospitable environments to their advantage and make welcoming ones even more fertile. Although these microbial interactions are well documented in terrestrial systems, few studies have focused on their role in marine systems. Seagrasses are the only known marine angiosperms that live fully submerged and have retained many of the physiological traits of land plants after their invasion of the marine environment (Wissler et al. 2011). For typical land plants, the marine environment poses significant challenges - it is low in light and oxygen and high in toxins such as sulfides and salts. Yet despite these environmental challenges, seagrasses thrive in their marine homes and there is evidence to suggest that their microbial communities may play an important role in this success. For example, toxic sulfide levels in seagrass meadow sediment are controlled in part by the actions of sulfide-oxidizing bacteria associated with bivalves that feed on the biomass and oxygen released from seagrass roots (van der Heide et al. 2012). Seagrasses also play a fundamental ecological role in coastal communities (Wissler et al. 2011). Seagrass meadows anchor coastlines, provide refuges for tidal organisms, and help filter the water that flows through them. Perhaps most importantly, seagrass meadows contribute greatly to nutrient cycling and intertidal biogeochemistry.
Among marine ecosystems, seagrass meadows have one of the highest levels of primary production, both from the seagrasses themselves and from the microbial autotrophs that live in the water column (Williams et al. 2009). Seagrasses propagate clonally, producing dense underground networks of roots and rhizomes, which microbes decompose to provide complex organic carbon, phosphates and bioavailable nitrogen to the meadow community (Larkum et al. 2007, Chapter 6). The meadows are also marine hot spots of nitrogen fixation (Welsh et al. 2000) and, intriguingly, nitrogen fixation in seagrass meadow sediments has been linked to the production of toxic sulfide (Larkum et al. 2007, Chapter 6), suggesting that there may be important trade-offs in the relationship between seagrasses and their nitrogen-fixing microbes. Despite much indirect evidence for the ecological importance of the microbial communities closely associated with seagrass, few studies have focused on the microbial communities as a whole and none, to our knowledge, have used culture-independent methods to extensively document these communities. To better explore these seagrass-microbe interactions, we must first understand not only which microbes are intimately associated with seagrasses but also how that association changes on a micro-scale, since microbial community composition may vary widely across very small distances. Thus, to begin answering these questions, we present a detailed survey of the micro-scale variation of the microbial communities associated with _Zostera marina_, a model seagrass endemic to the coastal regions of the northern hemisphere.
MOTIVATION
A single microbial community can be composed of many thousands of species, and the tools most commonly used to visualize the relative abundances of species in communities (pie charts and stacked bar graphs) are inadequate. The human brain is not adept at estimating the areas of wedges in a pie or rectangles in a bar, and even if it were, the color palette and graph size required to faithfully represent the relative abundances of thousands of species in even a single community would be prohibitively large. There is a great need to develop more intuitive visualization tools, especially for comparing microbial community composition across a large number of samples. Fortunately, human evolution, via natural selection, has engineered a solution to this problem. The human brain has a region, the fusiform face area, that is specialized for facial recognition. This region of the brain allows us to process a very complex image in an instant, requiring minimal decomposition into component parts. Instead, faces are perceived holistically, as a gestalt. Faces are infinitely variable, and we can quickly pick up on even very subtle differences and similarities between them.
BIG PICTURE QUESTIONS, MOTIVATION, RELEVANCE
Various factors influence the patterns in the distribution and abundance of microbial community taxa in an environment. These dynamics in microbial community composition can sometimes result in large-scale population shifts, such as extinction and recolonization of entire groups of taxa, or in comparatively minor variations in the relative abundance of taxa in an environment. A robust microbial community can maintain equilibrium despite environmental fluctuations. Interestingly, large-scale changes in microbial community profiles across experimental conditions are often driven by a relatively small set of species. Identifying these key responders provides a method for understanding how microbial communities maintain equilibrium, and provides potential leverage for restoring balance to disturbed communities. Moreover, elucidating a mechanism for community robustness could be of great utility to many fields, including maintenance of organism health, ecosystem balance, and agriculture. Because host-microbe datasets share many commonalities regardless of the host species, identifying a general set of tools that characterize host-microbe community structure is crucial for understanding the functional basis of community dynamics. Here we propose compiling a set of analysis tools and validating them on various pre-existing host-microbe datasets, as well as on a benchmark dataset with host-microbe interactions that share commonalities across different systems.

PROPOSAL
Motivation
While many analysis tools are currently available for identifying the key sets of taxa that correspond most strongly with changes in experimental condition, we believe a principled comparison and validation of these existing tools is still lacking. A good set of tools must work efficiently on nonparametric, large-scale data; for instance, microbial connectivity patterns have high dimensionality and are often not normally distributed.
In order to discard spurious results, we need strong statistical models to identify the real dynamics of the community, such as food webs and phylogenetic signals. We have repeatedly found that commonly used statistical corrections for high dimensionality are insufficient. We will need to explore models of different designs for our data. We believe this project would particularly benefit from DSI guidance and resources in this regard. While we have experience with data exploration, identifying descriptive metrics for seemingly disparate data types, and algorithm implementation, we do not have a comprehensive understanding of how to use statistics to validate high-dimensional data. We would greatly appreciate any collaboration with the DSI to guide our comparison of analysis methods so that our findings are credible and of use to anyone working in the field of microbial community analysis.
Format and availability of data
Our pipeline will be designed to begin with DNA that was extracted from the microbiomes of host organisms, sequenced for the 16S rRNA gene, and grouped into taxa. We currently have access to plant and bird host samples with lists of microbial taxa and their corresponding read counts. These raw datasets by themselves are small (on the order of 200 samples x 10,000 microbe species); however, the combinatorics of taxa co-association result in high dimensionality.
Outcomes
We propose to identify the most promising algorithms in terms of time and space complexity, and to implement them efficiently so that they will be scalable. We will test our software and validate our models using pre-existing plant microbiome and bird microbiome datasets, as well as simulated data where the structure is already known.
Pre-processing pipeline
In order to accurately compare model outputs across bird, plant, and simulated datasets, we first need to implement quality control for pre-processing the raw data.
DNA sequenced from environmental extractions is subject to poor coverage or poor depth, resulting in read counts that may be assigned unevenly or spuriously across microbial taxa. An important quality control step will be to identify samples with poor coverage or sampling depth and exclude them from the main analysis. Further, our pipeline will be designed to account for the metadata associated with each sample, so that microbial population analysis can be easily correlated with environmental traits. The resulting cleaned data will be the input for our models. We will make the pre-processing pipeline publicly available, with tutorials included.
Algorithm pipeline
We will explore three main approaches in our analysis pipeline: network analysis, matrix factorizations, and statistical validation. Many real-life applications can be naturally modeled as graphs. Microbial communities, given their metabolic interactions, naturally lend themselves to representation as networks. There are many ways to generate a network from our datasets, and these networks can lead to various conclusions. For example, we can build a co-association network for microbes by treating microbes as nodes and the co-association patterns between them as edges. We can also create sequence similarity networks for the microbes, where edges show genetic relatedness. In the first case, clusters will indicate groups of microbes that co-occur, whereas in the second case, clusters will correspond to microbes that are phylogenetically similar and are therefore likely to have similar functionality. We propose to characterize the network topology of multiple datasets using graphlets to illustrate how the basis of co-association determines the conclusions of the network analysis. We will use network analysis as a tool to find patterns in microbial communities and predict the functionality of each family of microbes.
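The quality-control and co-association steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the proposed pipeline itself: the use of Spearman rank correlation as the co-association measure, the `min_depth` and `min_rho` thresholds, the function names, and the toy read counts are all hypothetical choices made for the example.

```python
from itertools import combinations
from math import sqrt


def filter_by_depth(counts, min_depth):
    """Keep only samples whose total read count is at least min_depth."""
    return {s: taxa for s, taxa in counts.items()
            if sum(taxa.values()) >= min_depth}


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def co_association_edges(counts, min_depth, min_rho):
    """Build edges (taxon_a, taxon_b, rho) where |rho| >= min_rho.

    Nodes are taxa; an edge links two taxa whose relative abundances
    are strongly rank-correlated across the samples that pass QC.
    """
    kept = filter_by_depth(counts, min_depth)
    samples = sorted(kept)
    totals = {s: sum(kept[s].values()) for s in samples}
    taxa = sorted({t for s in samples for t in kept[s]})
    profiles = {t: [kept[s].get(t, 0) / totals[s] for s in samples]
                for t in taxa}
    edges = []
    for a, b in combinations(taxa, 2):
        rho = spearman(profiles[a], profiles[b])
        if abs(rho) >= min_rho:
            edges.append((a, b, rho))
    return edges


# Toy read counts (invented for illustration): sample -> taxon -> reads.
toy_counts = {
    "s1": {"taxonA": 50, "taxonB": 40, "taxonC": 10},
    "s2": {"taxonA": 30, "taxonB": 25, "taxonC": 45},
    "s3": {"taxonA": 10, "taxonB": 12, "taxonC": 78},
    "s4": {"taxonA": 60, "taxonB": 35, "taxonC": 5},
    "s5": {"taxonA": 2, "taxonB": 1},  # very shallow sample, removed by QC
}

if __name__ == "__main__":
    for a, b, rho in co_association_edges(toy_counts, min_depth=50, min_rho=0.75):
        print(f"{a} -- {b}: rho = {rho:.2f}")
```

Because relative abundances are compositional (they sum to one within each sample), correlations computed this way can be spurious, which is one reason the statistical validation step matters before any edge is interpreted biologically.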
Matrix factorization methods are also commonly used to identify features of a dataset. In the case of microbial communities, a feature would be a set of microbial taxa that is particularly characteristic of a sample type and/or experimental condition. We propose to use matrix factorizations, including principal component analysis (PCA), non-negative matrix factorization (NMF) and singular value decomposition (SVD), for dimension reduction and feature extraction. We would like to identify the cases in which feature extraction by matrix factorization agrees with the clusters identified by network analysis. Our ability to outline a method for statistical validation in this proposal is limited by our current lack of statistical expertise. We would seek advice from the DSI in completing our validation design because this step is crucial for drawing meaningful conclusions.
Publishing results and implementations
All of the data, methods, results and implementations will be available online for general use, along with a manual on how to use them. Jupyter notebooks are commonly used as an interactive tool for programming and data visualization and are a good candidate format for our purposes. We will publish our findings as a comparative methods paper, preferably in an open-access journal. We anticipate that this will be of great use to both biologists and computer scientists because it highlights both the implementation and the application of many disparate methods. Currently, these methods have not been compared against each other, and thus it is not known whether their conclusions agree with one another.
Timeline
We would like this project to span one quarter. Briefly, the first 2 weeks will be budgeted for data wrangling and for assembling preprocessing scripts into Jupyter notebooks. The following 2 weeks will be devoted to algorithm implementation and to generating the first rounds of visualization for comparison between datasets. The second month will be devoted to statistical validation and verification methods, tool optimization, and code efficiency.
In the third month, we will write up the results for publication and post our code resources and data online.
names of collaborators (+ resumes)
PROJECT SUMMARY (1 PAGE)
Due: January 25, 2016 at 5 PM (local time)
DEB - Biodiversity: Discovery & Analysis Cluster
Solicitation: http://www.nsf.gov/pubs/2015/nsf15609/nsf15609.htm
Cluster description: http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503666&org=DEB&from=home
Overview
The broad goal of this proposal is to increase overall knowledge of the true diversity of microbial eukaryotes by identifying and culturing microeukaryotes from seagrass beds. Microorganisms, and specifically marine microbial eukaryotes, represent an underexplored area of diversity. Microbial eukaryotes are known to be important on a number of trophic levels in the marine system CITE, and microbial eukaryotes found in seagrass beds likely contribute to the tremendous biodiversity of these beds and to their roles as important players in nutrient cycling and carbon sequestration in the oceans. We will use a combination of sequencing and culturing techniques to (1) characterize microeukaryotes in a global census of the seagrass _Zostera marina_, (2) explore microbial eukaryotic diversity across the order Alismatales, including the 3 separate lineages of seagrasses and their freshwater and brackish relatives, and (3) create a publicly available culture collection of microbial eukaryotes from _Zostera marina_ samples from Bodega Bay, CA.
Intellectual Merit
Microorganisms comprise the majority of diversity on Earth. Microorganisms were traditionally classified using morphological approaches, but the advent of sequence data has dramatically altered our views of microbial evolution and diversity. Specifically, high-throughput sequencing technologies have enabled us to explore multiple genes and genomes from microorganisms, giving us insight into genome complexity and function in these unseen organisms. As a result, microbial ecologists are finding themselves in uncharted territory as they analyze large data sets full of "unclassified" organisms, and it is now clear that microorganisms are much more diverse than previously thought.
Although certain pathogenic microeukaryotes have been studied in great detail (e.g., _Giardia_), environmental microeukaryotes, specifically marine microeukaryotes, are largely uncharacterized despite their important functional roles in their ecosystems. Novel marine microeukaryotic lineages have previously been found at all phylogenetic scales; however, many of these novel organisms are still a mystery to us as they have yet to be cultured. It is estimated that the total diversity of microbial eukaryotes is much higher than what we currently have in culture. Seagrasses are a unique system in which to explore marine microbial eukaryotic diversity. These important marine angiosperms provide habitat and food to many rare and endemic species, and contain tremendous levels of biodiversity that have so far only been characterized at the macrobe level. Seagrasses are known to be important contributors to biogeochemical processes within the ocean and are one of the largest carbon sinks on Earth, sequestering carbon 35 times faster than tropical rainforests. Given their importance in the complex marine food web and their contributions to nutrient cycling within the oceans, we hypothesize that seagrass-associated marine microbial eukaryotes are important both to the high levels of macrobe biodiversity within seagrass beds and to the role of these beds in nutrient cycling and carbon sequestration in the ocean ecosystem. We propose to perform a global census of microbial eukaryotes found in association with the leaves, roots, and sediment of the seagrass _Zostera marina_. We will then expand our investigation to census the microbial eukaryotes found in association with plants across the order Alismatales, which includes three independent lineages of seagrasses. Concurrently with the aforementioned censuses, we will establish a culture collection of microbial eukaryotes found associated with _Zostera marina_ from Bodega Bay, California.
We are uniquely positioned to succeed at the proposed research. Using funds provided by the Gordon and Betty Moore Foundation, we have already established a program to explore bacterial diversity within seagrass beds, completed the majority of the field work, and formed ongoing collaborations with other seagrass researchers from both the Zostera Experimental Network (ZEN) and other research institutions.
Broader Impacts
The project we propose here is a global interdisciplinary collaboration that will result in increased knowledge of the biodiversity of an understudied group of organisms from an important marine ecosystem. The proposed project is the first to explore seagrass-associated microbial eukaryotes using both sequence- and culture-based methods, and will generate large amounts of publicly available sequence data and numerous new entries of novel marine organisms in culture collections. The project will include a large outreach component at both the local level (undergraduate researchers, high school students) and the global level (website, collaborators). Undergraduates and local high school students will be intimately involved in creating the culture collection, and our progress will be transparently available on our lab website.