Identification of allele-specific methylation profiles across generations in the Norfolk Island genetic isolate


Miles Benton(a), Rod Lea(a), Nicole White(b), Daniel Kennedy(b), Heidi Sutherland(a), Larisa Haupt(a), Kerrie Mengersen(b) and Lyn Griffiths(a)


(a) Genomics Research Centre, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Queensland, Australia
(b) ARC Centre of Excellence for Mathematical and Statistical Frontiers, Queensland University of Technology (QUT), Brisbane, Queensland, Australia


DNA methylation is an important epigenetic mechanism that can contribute to variation in gene expression and complex traits. Characterising the genome-wide diversity of DNA methylation and understanding allele-specific inheritance patterns across generations is the next frontier in human genetics. Genetic isolates provide an innovative natural setting for investigating DNA methylation because they offer large multi-generational pedigrees and reduced genetic and environmental diversity. The Norfolk Island (NI) population is one such genetic isolate. Norfolk Island lies off the east coast of Australia, and its original population was founded in the late 1700s by 11 British mutineers of the HMS Bounty and 6 Polynesian women. These founders have given rise to a 6000-member pedigree spanning 11 generations. We are measuring genome-wide allele-specific methylation using NGS bisulphite sequencing (CpGiant with Illumina HiSeq). To date we have collected data for 24 individuals forming a closely related three-generation pedigree. Data were analysed using a custom pipeline that incorporates Methpipe software for calling allele-specific methylation (ASM). This analysis identified 1.12M CpG sites common to all samples, of which 257,992 showed ASM. Using a custom clustering method, we identified ~1800 ASM regions (AMRs) conserved within the pedigree. Many of these AMRs map to known and predicted imprinted genes. Interestingly, the NI family showed numerous ASM peaks at loci not previously identified as imprinted. For example, we observed a large AMR peak on Chr 21 with no previous imprinting information and very little SNP annotation. This peak also showed 'conservation' of the ASM signal across all samples and appears to have a 50:50 allelic split, typical of imprinting.
This finding provides compelling evidence that our novel approach has the potential to identify new imprinted genes and to characterise trans-generational epigenetic inheritance patterns that might influence complex traits in humans.
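The abstract does not describe the custom clustering method. The sketch below is a purely hypothetical illustration and is not the pipeline's actual method. It shows two of the steps named above: keeping the CpG sites shared by all samples, and grouping ASM CpG sites into candidate regions. The first step is a set intersection and the second is a distance-based merge. The site representation, the function names and the `max_gap` and `min_sites` defaults are all assumptions.

```python
from typing import Dict, List, Set, Tuple

# A CpG site as (chromosome, position); coordinates are illustrative.
Site = Tuple[str, int]
Region = Tuple[str, int, int, int]  # (chromosome, start, end, number of CpGs)


def common_sites(samples: Dict[str, Set[Site]]) -> Set[Site]:
    """CpG sites observed in every sample (set intersection)."""
    site_sets = list(samples.values())
    return set.intersection(*site_sets) if site_sets else set()


def merge_sites(sites: Set[Site], max_gap: int = 200, min_sites: int = 3) -> List[Region]:
    """Greedily merge ASM CpG sites into candidate regions.

    Sites on the same chromosome are joined while consecutive positions are at
    most max_gap apart; regions with fewer than min_sites CpGs are dropped.
    Both defaults are hypothetical.
    """
    regions: List[Region] = []
    current = None  # [chrom, start, end, n_sites] of the region being built
    for chrom, pos in sorted(sites):
        if current is not None and current[0] == chrom and pos - current[2] <= max_gap:
            current[2] = pos
            current[3] += 1
            continue
        if current is not None and current[3] >= min_sites:
            regions.append(tuple(current))
        current = [chrom, pos, pos, 1]
    if current is not None and current[3] >= min_sites:
        regions.append(tuple(current))
    return regions
```

Suppose chr21 has sites at 100, 150 and 220 bp plus an isolated site at 5000 bp. With the default settings, `merge_sites` returns a single region covering 100 to 220 bp. The isolated site is dropped because it has fewer than three CpGs.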


An overview of ASM

One of the first studies to show that ASM is widespread across the human genome (Schalkwyk 2010)

Allele-specific DNA methylation: beyond imprinting (Tycko 2010)

Human body epigenome maps reveal noncanonical DNA methylation variation (Schultz 2015)

Characterizing the strand-specific distribution of non-CpG methylation in human pluripotent cells (Guo 2013)

Using NGS to associate SNPs with allele-specific methylation (Hu 2013)

Methods of identification

ASM from GWAS data

Inferring ASM from GWAS (SNP array) data has been demonstrated (Schalkwyk 2010, Tycko 2010a).
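One array-based design, assumed here for illustration, digests genomic DNA with a methylation-sensitive restriction enzyme before SNP genotyping. Such an enzyme cuts only unmethylated recognition sites, so the methylated allele is protected from digestion. As a result, the allelic ratio at a nearby heterozygous SNP shifts relative to undigested DNA. The sketch below flags SNPs by the change in B-allele fraction (BAF). The function names and both cutoffs are hypothetical. A real analysis would use replicates and a statistical model rather than a fixed threshold.

```python
from typing import Dict, List


def allelic_shift(baf_undigested: float, baf_digested: float) -> float:
    """Change in B-allele fraction after methylation-sensitive digestion.

    A positive shift means the B allele was enriched by digestion, i.e. it is
    more likely to be the methylated (protected) allele.
    """
    return baf_digested - baf_undigested


def flag_asm_snps(undigested: Dict[str, float], digested: Dict[str, float],
                  min_shift: float = 0.1) -> List[str]:
    """Return SNP IDs whose |allelic shift| >= min_shift (hypothetical cutoff).

    Only SNPs that look heterozygous in undigested DNA (BAF between 0.2 and
    0.8, another assumed cutoff) are considered, because homozygous SNPs
    carry no allelic information.
    """
    flagged = []
    for snp, baf_u in undigested.items():
        if snp not in digested or not 0.2 <= baf_u <= 0.8:
            continue
        if abs(allelic_shift(baf_u, digested[snp])) >= min_shift:
            flagged.append(snp)
    return flagged
```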

Further use of SNP arrays to infer ASM:

There is also an example using Illumina 450K methylation array data (Docherty 2014).

Early work (ChIP-seq etc.)

  • Chromosome-Wide Analysis of Parental Allele-Specific Chromatin and DNA Methylation (Singh 2011)

Family-based (using RRBS data)

  • Analysis of DNA methylation in a three-generation family reveals widespread genetic influence on epigenetic regulation (Gertz 2011)
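In a family design, parental genotypes can reveal which of a child's alleles came from which parent. This is what separates parent-of-origin ASM (as in imprinting) from ASM that simply follows the DNA sequence. Below is a minimal sketch of that allele-assignment step, assuming unphased genotypes given as pairs of allele letters. The function and its inputs are hypothetical and are not taken from Gertz 2011.

```python
from typing import Optional, Tuple

Genotype = Tuple[str, str]  # unordered pair of alleles, e.g. ("A", "G")


def maternal_allele(child: Genotype, mother: Genotype, father: Genotype) -> Optional[str]:
    """Return the child's maternally inherited allele if it can be resolved.

    Resolvable only when the child is heterozygous and exactly one way of
    splitting its alleles between the parents is consistent with their
    genotypes (e.g. mother AA, father GG, child AG -> maternal allele A).
    Returns None when the origin is ambiguous or Mendelian-inconsistent.
    """
    a, b = child
    if a == b:
        return None  # homozygous child: the two alleles are indistinguishable
    # Option 1: a came from the mother and b from the father; option 2: the reverse.
    option1 = a in mother and b in father
    option2 = b in mother and a in father
    if option1 and not option2:
        return a
    if option2 and not option1:
        return b
    return None
```

Once the maternal allele is known at a SNP, reads carrying each allele can be labelled maternal or paternal, and their methylation compared. A bias that tracks parent of origin across the pedigree points towards imprinting. A bias that tracks the allele, regardless of which parent transmitted it, suggests sequence-dependent ASM.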

Bisulfite sequencing (BS-seq) data

With the arrival of next-generation sequencing, we have for the first time a platform capable of high-throughput sequencing of bisulfite-treated DNA. A few tools have been documented for analysing these data for ASM.
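With bisulfite reads, ASM can be tested directly. Reads overlapping a heterozygous SNP are split by allele, and their methylation calls at a nearby CpG are compared. The sketch below does this with a two-sided Fisher's exact test written with `math.comb`. It is a generic illustration rather than the method of any tool listed here, and the read representation is an assumption. One caveat applies at C/T SNPs: a T on the bisulfite-converted strand could be either the T allele or an unmethylated C. Such SNPs therefore need to be genotyped from the opposite strand or excluded.

```python
from math import comb
from typing import List, Tuple


def allele_methylation_table(reads: List[Tuple[str, bool]], allele1: str,
                             allele2: str) -> Tuple[int, int, int, int]:
    """Count reads into a 2x2 table (a, b, c, d).

    Each read is (allele at the SNP, methylated at the CpG?).
    Rows: allele1, allele2. Columns: methylated, unmethylated.
    """
    a = sum(1 for al, m in reads if al == allele1 and m)
    b = sum(1 for al, m in reads if al == allele1 and not m)
    c = sum(1 for al, m in reads if al == allele2 and m)
    d = sum(1 for al, m in reads if al == allele2 and not m)
    return a, b, c, d


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so ties with the observed table are included.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))
```

For example, suppose 8 of 10 allele-A reads are methylated, compared with 1 of 6 allele-G reads. That gives the table (8, 2, 1, 5) and a two-sided p of about 0.035.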

Another early paper addressing ASM from BS data: Detection of allele-specific methylation through a generalized heterogeneous epigenome model (Peng 2012)

Methpipe (Song 2013)
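Methpipe's region-level ASM caller is amrfinder. The SMAP authors describe its model as suited to cells with equal allele frequencies. The sketch below is a simplified illustration of that kind of model. It is not amrfinder's actual model, parameters or statistic. It compares two explanations for the reads in a window of CpGs: a single methylation profile, or a 50:50 mixture of two profiles (epialleles) fitted by EM. The read representation is an assumption, and every read is assumed to cover every CpG in the window, which real data rarely satisfy.

```python
import math
from typing import List, Sequence

Read = Sequence[int]  # methylation calls across the window: 1 = methylated, 0 = not


def _clip(p: float, eps: float = 1e-6) -> float:
    # Keep probabilities away from 0 and 1 so log() stays finite.
    return min(max(p, eps), 1 - eps)


def _read_loglik(read: Read, profile: Sequence[float]) -> float:
    # Calls are treated as independent Bernoulli draws given the profile.
    return sum(math.log(p if m else 1 - p) for m, p in zip(read, profile))


def _log_half_mix(l0: float, l1: float) -> float:
    # log(0.5 * exp(l0) + 0.5 * exp(l1)), computed stably.
    top = max(l0, l1)
    return top + math.log(0.5 * math.exp(l0 - top) + 0.5 * math.exp(l1 - top))


def single_profile_loglik(reads: List[Read]) -> float:
    """Log-likelihood when all reads share one methylation profile."""
    n_cpg = len(reads[0])
    profile = [_clip(sum(r[j] for r in reads) / len(reads)) for j in range(n_cpg)]
    return sum(_read_loglik(r, profile) for r in reads)


def two_epiallele_loglik(reads: List[Read], iters: int = 50) -> float:
    """Log-likelihood of a 50:50 mixture of two profiles, fitted by EM."""
    n_cpg = len(reads[0])
    # Deterministic start: softened mean profiles of the more- and
    # less-methylated halves of the reads.
    ordered = sorted(reads, key=lambda r: sum(r) / len(r), reverse=True)
    half = max(1, len(ordered) // 2)
    groups = (ordered[:half], ordered[half:] or ordered[:half])
    prof = [[0.25 + 0.5 * sum(r[j] for r in g) / len(g) for j in range(n_cpg)]
            for g in groups]
    for _ in range(iters):
        # E-step: posterior probability that each read came from profile 0.
        resp = []
        for r in reads:
            l0, l1 = _read_loglik(r, prof[0]), _read_loglik(r, prof[1])
            top = max(l0, l1)
            w0, w1 = math.exp(l0 - top), math.exp(l1 - top)
            resp.append(w0 / (w0 + w1))
        # M-step: responsibility-weighted methylation level at each CpG.
        for k, weights in enumerate((resp, [1 - w for w in resp])):
            total = sum(weights) or 1e-12
            prof[k] = [_clip(sum(w * r[j] for w, r in zip(weights, reads)) / total)
                       for j in range(n_cpg)]
    return sum(_log_half_mix(_read_loglik(r, prof[0]), _read_loglik(r, prof[1]))
               for r in reads)
```

A large positive value of `two_epiallele_loglik(reads) - single_profile_loglik(reads)` favours two epialleles. The mixture has extra parameters, and this sketch does not penalise them, so the raw difference would need calibration before it could serve as a test (for example, against simulated data).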

One of the seminal papers:

SMAP (Gao 2015)

  • Another pipeline
  • This one takes 'raw' read data into account to determine ASM
  • BUT I think there might be issues with this method in 'normal' tissue/cell samples, since it seems to be aimed at cancer samples; they state that:

"Amrfinder is a popular ASM detect tool in which a statistical model is implemented to detect ASM [21]. However, the model applies only to the simple case of monoclonal cells with equal allele frequencies, which is not suitable for more complex cancer cases. In SMAP, heterozygous SNPs are used to determine alleles in two strands."

SNPsplit (Krueger 2016)

  • Looks like a really nice solution, but so far it has only been applied to mouse genomics
  • I believe it wouldn't be too hard to adapt this tool to a human setting