ROUGH DRAFT authorea.com/18248

Abstract

# ABSTRACT

DNA methylation is an epigenetic mark that plays an inadequately understood role in gene regulation, particularly in non-model species. Because it can be influenced by the environment and potentially transferred to subsequent generations, DNA methylation may contribute to the ability of organisms to acclimatize and adapt to environmental change. We evaluated the distribution of gene body methylation in reef-building corals, a group of organisms facing significant environmental challenges. Gene body methylation in six species of corals was inferred from in silico transcriptome analysis of CpG O/E, an estimate of germline DNA methylation that is highly correlated with patterns of methylation enrichment. Consistent with what has been documented in most other invertebrates, all corals exhibited bimodal distributions of germline methylation suggestive of distinct fractions of genes with high and low levels of methylation. The hypermethylated fractions were enriched with genes with housekeeping functions, while genes with inducible functions were highly represented in the hypomethylated fractions. In three of the coral species, we found that genes differentially expressed in response to thermal stress and ocean acidification exhibited significantly lower levels of methylation. These results support a link between gene body hypomethylation and transcriptional plasticity that may point to a role of DNA methylation in the response of corals to environmental change.

# Introduction

As human influence on the planet expands, many organisms must acclimatize and adapt to rapid environmental change. Phenotypic plasticity facilitates a more rapid response to environmental change than is possible through natural selection, and will likely be critical to the persistence of many species (Charmantier ), (Chevin 2010). Phenotypic change often involves modifications in gene expression. Epigenetic mechanisms, involving alterations to the genome that do not affect the underlying DNA sequence, are increasingly recognized as some of the principal mediators of gene expression (Duncan 2014). The most researched and best understood epigenetic process is DNA methylation, which most commonly involves the addition of a methyl group to a cytosine in a CpG dinucleotide pair. The role of DNA methylation is best understood in mammals, where methylation in promoter regions has a repressive effect on gene expression (Jones 2001). In plants and invertebrates, methylation of gene bodies prevails, and is thought to be the ancestral pattern (Zemach 2010). Gene body methylation appears to have a range of functions, including regulating alternative splicing, repressing intragenic promoter activity, and reducing the efficiency of transcriptional elongation (Duncan 2014). Methylation of gene bodies also varies according to gene function, and studies on invertebrates indicate that highly conserved genes with housekeeping functions tend to be more heavily methylated than those with inducible functions (Roberts 2012), (Sarda 2012), (Dixon 2014), (Gavery 2014). This has led to speculation that gene body methylation may promote predictable expression of essential genes for basic biological processes, while an absence of methylation could allow for stochastic transcriptional opportunities in genes involved in phenotypic plasticity (Roberts 2012), (Dixon 2014), (Gavery 2014).

Direct relationships between DNA methylation and phenotypic plasticity are increasingly being established. Some examples include caste structure in honeybees and ants (Kucharski 2008), (Bonasio ), expression of the agouti gene in mice ((Wolff )), and the influence of prenatal maternal mood on newborn stress levels in humans (Oberlander ). In many cases, changes in methylation patterns can be attributed to external cues such as temperature, stress, or nutrition. A prime example is the honeybee Apis mellifera, where larval consumption of royal jelly induces changes in methylation that ultimately determine the developmental fate of an individual into a queen or a worker (Kucharski 2008). Thus, DNA methylation has been established as a key link between environment and phenotype.

Reef-building corals, the organisms that form the trophic and structural foundation of coral reef ecosystems, are known to display a significant degree of phenotypic plasticity (Todd ), (Forsman 2009), (Granados-Cifuentes ). As long-lived, sessile organisms, corals are thought to be particularly reliant on phenotypic plasticity to cope with environmental heterogeneity, because they must be able to withstand whatever nature imposes on them over long periods of time (Bruno 1997). As phenotypically flexible as they may be, corals’ longevity and immobility may also contribute to their vulnerability in a changing environment. Reef corals worldwide are experiencing severe declines due to a variety of anthropogenic effects, including climate change, ocean acidification, and a host of local stressors (Hoegh-Guldberg 2007). This has raised doubt concerning the ability of corals to survive coming decades. Yet there are also signs that, at least in some cases, corals possess sufficient resiliency to overcome their numerous challenges (Palumbi 2014). Recent studies on gene expression variation, for example, support the view that phenotypic plasticity in corals is robust and may provide resilience in the face of ocean warming (Barshis 2013), (Granados-Cifuentes ), (Palumbi 2014). However, the underlying basis of gene expression variation, and indeed phenotypic plasticity, remain largely unknown.

Evaluation of epigenetic processes therefore represents a logical next step in understanding coral gene expression and phenotypic variation. While recent annotation of the Acropora digitifera genome revealed a broad repertoire of genes involved in DNA methylation and other epigenetic processes (Dunlap 2013), to date, only one study has investigated possible roles of epigenetic processes in corals (Dixon 2014). Germline DNA methylation patterns in the transcriptome of Acropora millepora corroborated findings reported in studies of other invertebrate species (Dixon 2014). Most interestingly, genes that were differentially expressed in response to a common garden transplantation experiment were among the genes exhibiting lower levels of germline methylation (Dixon 2014). This finding further supports studies on invertebrates showing that hypomethylated genes tend to be those with inducible functions (Gavery 2010), (Sarda 2012).

Coral gene expression studies continue to expand, providing rich datasets to further probe the relationship between DNA methylation and gene function. In this study, we performed a comprehensive evaluation of germline methylation patterns in reef corals by examining the transcriptomes of six scleractinian coral species. Germline methylation levels in these data were inferred based on the hypermutability of methylated cytosines, which leads to a reduction in CpG dinucleotides over evolutionary time (Sved ). These data were then matched with gene ontology information, permitting evaluation of methylation patterns associated with broad categories of biological processes. Lastly, in three of the six species, we evaluated germline methylation patterns in genes involved in response to thermal stress and ocean acidification.

# Methods

Transcriptome data sources

The transcriptomes of six scleractinian coral species were evaluated to determine germline methylation patterns in relation to gene function and activity. Species examined included Acropora hyacinthus, A. millepora, A. palmata, Pocillopora damicornis, Porites astreoides, and Stylophora pistillata (Table 1). These transcriptomes reflect a range of life history stages. Some transcriptomes were developed from life history stages that had not yet been infected with symbiotic dinoflagellates (Symbiodinium spp.), while others used bioinformatic techniques to filter out putative Symbiodinium sequences. However, two of the transcriptomes (P. damicornis and S. pistillata) were developed from adult corals and did not remove putative symbiont sequences. We therefore applied a filtering step to these transcriptomes by comparing them to Symbiodinium clade A and B transcriptomes from (Bayer 2012) using blastn (version 2.2.29). An evalue threshold of of 10-5 was used for these queries, and all matched sequences were removed from further analyses. Details of blastn query and filtering are provided (https://github.com/jldimond/Coral-CpG-ratio-MS)

Table 1. Transcriptomes used in this study

Organism Life history stage Method No. Contigs Reference
Acropora hyacinthus Adult Illumina 33,496 Barshis et al. (2013)(Barshis 2013)
Acropora millepora Embryo to adult 454 & Illumina 52,963 Moya et al. (2012)(Moya 2012)
Acropora palmata Larval 454 88,020 Polato et al. (2011)(Polato 2011)
Pocillopora damicornis Adult Illumina 72,890 Vidal-Dupiol et al. (2013)(Vidal-Dupiol 2013)
Porites astreoides Adult 454 30,740 Kenkel et al. (2013)(Kenkel 2013)
Stylophora pistillata Adult 454 15,052 Karako-Lampert et al. (2014)(Karako-Lampert 2014)

 Illumina is Company - 454 is platform, several types of Illumina platforms.

Differentially expressed gene datasets

In addition to analyzing whole transcriptomes, we also examined genes differentially expressed in response to environmental stressors for the three acroporid species. For A. hyacinthus and A. millepora these gene sets were derived from the same studies that developed the reference transcriptomes ((Barshis 2013), (Moya 2012)), and for A. palmata differentially expressed genes sets were reported in (Polato 2013)(Polato Nicholas R.; Baums 2013) (Table 2).

Table 2. Differentially expressed gene sets examined

Organism Life history stage Method No. Contigs Environmental factor Reference
Acropora hyacinthus Adult sequencing 484 Thermal stress Barshis et al. (2013)(Barshis 2013)
Acropora millepora Juvenile sequencing 234 Ocean acidification Moya et al. (2012)(Moya 2012)
Acropora palmata Larval microarray 2002 Thermal stress Polato et al. (2013)(Polato 2013)(Polato Nicholas R.; Baums 2013)

Annotation

To maintain consistency in comparing datasets, all transcriptomes and differentially expressed gene sets were compared to the UniProt/Swiss-Prot protein database (version 2/17/2015) using Blastx (version 2.2.29) with an evalue threshold of 10-5. To further annotate genes with functional categories, corresponding Gene Ontology Slim (GOSlim) biological process were identified. Details of annotation are provided in jupyter notebooks in accompanying repository (https://github.com/jldimond/Coral-CpG-ratio-MS).

Predicted germline methylation

Germline methylation levels were inferred based on the hypermutability of methylated cytosines, which tend towards conversion to thymines over evolutionary time. This results in a reduction in CpG dinucleotides, meaning that heavily methylated genomic regions are associated with reduced numbers of CpGs. Thus, methylation patterns that have been inherited through the germline over evolutionary time can be estimated using the ratio of observed to expected CpG, known as CpG O/E. Germline DNA methylation estimated by analysis of CpG O/E is highly correlated with direct assays of methylation ((Suzuki 2007), (Sarda 2012), (Gavery 2013)). CpG O/E was defined as:

CpG O/E = (number of CpG / number of C x number of G) x (l2/l-1)

where l is the number of nucleotides in the contig.

Only annotated sequences were used for calculation of CpG O/E to increase the likelihood that sequences were oriented in the 5' to 3' direction. For subsequent analyses, we set minimum and maximum limits for CpG O/E at 0.001 and 1.5, respectively, to exclude outliers. Details of germline methylation prediction methods are provided in jupyter notebooks (https://github.com/jldimond/Coral-CpG-ratio-MS/)

Statistical analyses

Transcriptome CpG O/E patterns were fitted with the normalmixEM function in the mixtools package in the R statistical platform. Mixture models were evaluated against the null single component model by comparison of log-likelihood statistics. High and low CpG O/E components were delineated in mixture models using the intersection point of component density curves. For each GOSlim term, enrichment in the high and low CpG O/E components identified in mixture