The genetics of Alzheimer’s disease; what has GWAS ever done for us?
Alzheimer’s disease (AD) is an incurable neurodegenerative disease and the most common form of dementia, which affects approximately 850,000 people in the United Kingdom (Janssen 1992) (Alzheimer’s Society 2014. Dementia 2014: An opportunity for change). AD typically manifests clinically as insidious cognitive decline with episodic memory loss and, neuropathologically, as gross atrophy of the temporal cortex. Intraneuronal neurofibrillary tangles of hyperphosphorylated tau protein and extracellular plaques of the amyloid-\(\beta\) peptide are the classic hallmarks of AD, and their presence post-mortem provides a definitive diagnosis.
Familial forms of AD classically present early (early-onset Alzheimer’s disease, or EOAD) and are rare. Familial EOAD is caused by highly penetrant, autosomal dominant mutations in genes of the ‘amyloid’ pathway (APP, PSEN1 and PSEN2). However, approximately 95 percent of AD cases are late-onset (late-onset Alzheimer’s disease, or LOAD). Typically presenting after the age of 65, LOAD is aetiologically highly complex, involving multiple genetic and environmental risk factors. Although LOAD does not show Mendelian inheritance, it has been estimated that upwards of 60 percent of LOAD liability is genetic (Van 2015).
The first methodological approach to pay dividends was linkage analysis, which identifies genetic loci that co-segregate with a disease phenotype among affected family members. In 1993 Corder et al. identified a haplotype of the apolipoprotein E gene (APOE) on chromosome 19 that remains the strongest genetic risk factor for LOAD to this day (Corder 1993). One and two copies of the \(\epsilon\)4 haplotype increase LOAD risk approximately fourfold and sixteenfold, respectively. However, despite this early success, it would be another sixteen years until the next genetic risk factor for LOAD was discovered. In hindsight, APOE was the low-hanging fruit: because of the atypically large effect it imparts on a complex phenotype, APOE was uniquely amenable to a family-based linkage approach in a modest sample size. A fundamental change in approach, and in technology, would be required to identify smaller genetic effects in unrelated samples.
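Linkage evidence is conventionally summarised as a LOD score: the log10 ratio of the likelihood of the family data at a recombination fraction \(\theta\) to the likelihood under no linkage (\(\theta = 0.5\)), with a LOD of 3 or more traditionally taken as significant evidence of linkage. The sketch below assumes the simplest possible case, phase-known and fully informative meioses, so that each meiosis is simply scored as recombinant or not; real pedigree analyses must also model unknown phase, penetrance and missing genotypes. The function names `lod_score` and `max_lod` and the example counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10[theta^R * (1 - theta)^(N - R) / 0.5^N],
    where R = recombinant meioses and N = total informative meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("require 0 <= recombinants <= meioses")
    log_null = meioses * math.log10(0.5)  # log-likelihood under free recombination
    if theta == 0.0:
        # Under complete linkage a single recombinant is impossible.
        return float("-inf") if recombinants else -log_null
    log_alt = (recombinants * math.log10(theta)
               + (meioses - recombinants) * math.log10(1.0 - theta))
    return log_alt - log_null


def max_lod(recombinants: int, meioses: int, steps: int = 500):
    """Maximise the LOD score over an evenly spaced grid of theta in [0, 0.5]."""
    grid = (0.5 * i / steps for i in range(steps + 1))
    return max((lod_score(recombinants, meioses, t), t) for t in grid)


# Hypothetical example: 20 informative meioses, 2 of them recombinant.
best_lod, best_theta = max_lod(2, 20)
print(f"max LOD = {best_lod:.2f} at theta = {best_theta:.2f}")
# prints: max LOD = 3.20 at theta = 0.10
```

The maximum falls at \(\theta = R/N\), the observed recombination fraction; a grid search is used here only to keep the sketch transparent.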
The completion of the Human Genome Project in 2003 ushered in a new era of genomics. Equipped with a reference genome, new initiatives could explore uncharted waters. The International HapMap Project (HapMap), launched in 2003, genotyped 1.6 million single nucleotide polymorphisms (SNPs) in 1,184 samples from 11 different ethnic populations, with the aim of understanding genetic variability between individuals [http://hapmap.ncbi.nlm.nih.gov] (Altshuler 2010). For the first time, it was possible to map the haplotype structure of the human genome and to calculate linkage disequilibrium (LD), the non-random association of alleles at neighbouring loci, between SNPs. Using this information, the genetic variability within a ‘gene of interest’ could now be captured with a smaller number of ‘tag-SNPs’, vastly reducing experimental cost and time.
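The two LD measures most often reported can both be derived from allele and haplotype frequencies: \(D = p_{AB} - p_A p_B\), its normalised form \(D'\) (D divided by its maximum possible value given the allele frequencies), and \(r^2 = D^2 / [p_A(1-p_A)p_B(1-p_B)]\). The sketch below computes these from phased 0/1 haplotypes and then picks tag-SNPs greedily, requiring every SNP to have \(r^2 \geq 0.8\) with at least one tag, a commonly used threshold. It is a simplified illustration under stated assumptions (phased input, pairwise tagging only, greedy rather than optimal selection), not the algorithm of any particular tagging tool; the function names and example haplotypes are hypothetical.

```python
def ld_stats(haplotypes, i, j):
    """Return (D, D', r^2) between SNPs i and j.

    haplotypes: list of phased haplotypes, each a sequence of 0/1 alleles.
    """
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n  # frequency of allele 1 at SNP i
    p_b = sum(h[j] for h in haplotypes) / n  # frequency of allele 1 at SNP j
    p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ab - p_a * p_b  # observed minus expected haplotype frequency
    var = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / var if var > 0 else 0.0
    # D_max depends on the sign of D and on the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime, r2


def greedy_tags(haplotypes, threshold=0.8):
    """Greedily choose tag SNPs so every SNP has r^2 >= threshold with some tag."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must lie in (0, 1]")
    m = len(haplotypes[0])
    # A SNP always tags itself; otherwise use the pairwise r^2.
    r2 = [[1.0 if a == b else ld_stats(haplotypes, a, b)[2] for b in range(m)]
          for a in range(m)]
    untagged = set(range(m))
    tags = []
    while untagged:
        # Pick the SNP that captures the most still-untagged SNPs.
        best = max(range(m), key=lambda s: sum(r2[s][t] >= threshold for t in untagged))
        tags.append(best)
        untagged -= {t for t in untagged if r2[best][t] >= threshold}
    return sorted(tags)


# Hypothetical phased haplotypes at four SNPs: SNPs 0-2 are in perfect LD,
# SNP 3 is independent of them.
haps = [
    (0, 0, 1, 0), (0, 0, 1, 1), (0, 0, 1, 0), (0, 0, 1, 1),
    (1, 1, 0, 0), (1, 1, 0, 1), (1, 1, 0, 0), (1, 1, 0, 1),
]
print(greedy_tags(haps))  # prints [0, 3]
```

Two tags suffice to capture all four SNPs here, which is the cost saving the tag-SNP approach relies on.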
October 2015 saw the completion of the 1000 Genomes Project (1KP) which, like HapMap, sought to catalogue human genetic variability, albeit at a much finer scale (Auton 2015). By combining low-coverage whole-genome (7.4x) and deep whole-exome (65.7x) sequencing in over 2,500 samples, the project identified over 88 million variant sites, including rare and structural variants. This has several important applications, including the design of arrays of rarer coding variants (‘exome chips’) and the provision of a reference set of haplotypes for genotype imputation, the statistical inference of untyped variants from a reference panel.
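Imputation exploits the LD captured in a reference panel: a study sample's haplotypes at the genotyped SNPs are matched against reference haplotypes, and alleles at untyped sites are inferred from the best matches. As a deliberately naive sketch, the code below takes the reference haplotypes with the fewest mismatches at typed SNPs and reports the fraction carrying allele 1 at the untyped SNP. Production tools such as IMPUTE2 and Minimac instead use hidden Markov models of haplotype copying that allow for recombination and genotyping error; the function name, panel and phased target haplotypes here are hypothetical.

```python
def impute_allele_probability(target, reference, typed, untyped):
    """Naive nearest-haplotype imputation at one untyped SNP.

    target:    dict mapping typed SNP index -> observed allele (0/1)
    reference: list of phased reference haplotypes (sequences of 0/1)
    typed:     indices of SNPs genotyped in the target
    untyped:   index of the SNP to impute
    Returns the fraction of best-matching reference haplotypes carrying allele 1.
    """
    # Count mismatches between the target and each reference haplotype at typed SNPs.
    mismatches = [sum(ref[s] != target[s] for s in typed) for ref in reference]
    fewest = min(mismatches)
    best = [ref for ref, m in zip(reference, mismatches) if m == fewest]
    return sum(ref[untyped] for ref in best) / len(best)


# Hypothetical reference panel of four haplotypes at four SNPs; SNP 2 is untyped.
panel = [(0, 0, 0, 0), (0, 1, 0, 1), (1, 1, 1, 1), (1, 0, 1, 0)]
typed_snps = [0, 1, 3]
hap1 = {0: 1, 1: 1, 3: 1}
hap2 = {0: 0, 1: 1, 3: 1}
# The diploid allele dosage (0-2) is the sum over the individual's two haplotypes.
dosage = sum(impute_allele_probability(h, panel, typed_snps, 2) for h in (hap1, hap2))
print(dosage)  # prints 1.0
```

Reporting a fractional dosage rather than a hard call is what allows downstream association tests to carry imputation uncertainty forward.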
Genome-wide association studies (GWAS) became a reality when commercial microarray providers (Affymetrix and Illumina) began to produce ‘SNP chips’ containing hundreds of thousands of non-redundant, informative polymorphisms selected using HapMap data (Figure 1). A seminal GWAS was published in 2007 by the Wellcome Trust Case Control Consortium (WTCCC) (Genome-wide associati...). Using 3,000 shared healthy controls and 14,000 cases across seven common human diseases, all genotyped on the Affymetrix GeneChip 500K array, the consortium identified 24 novel genetic associations with complex diseases. Realising that large sample sizes were key, research groups in other fields began to form larger consortia in order to pool their resources. The era of GWAS had arrived.
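At its simplest, each SNP in a case-control GWAS is tested for a difference in allele frequency between cases and controls, and only associations passing a genome-wide threshold, conventionally \(p < 5 \times 10^{-8}\) to allow for roughly one million independent common-variant tests, are declared significant. The sketch below implements the basic allelic chi-square test and odds ratio for a single SNP. It is an illustration under simplifying assumptions: published GWAS typically use genotype-based trend tests or logistic regression adjusting for covariates such as population structure. The function name and allele counts are hypothetical.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def allelic_test(case_alleles, control_alleles):
    """1-df allelic chi-square test and odds ratio for a biallelic SNP.

    case_alleles, control_alleles: (count of risk allele, count of other allele)
    """
    a, b = case_alleles
    c, d = control_alleles
    if 0 in (a, b, c, d):
        raise ValueError("zero cell: odds ratio undefined")
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table, without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p, odds_ratio


# Hypothetical counts: 2,000 cases and 2,000 controls (4,000 alleles each).
chi2, p, odds = allelic_test((1400, 2600), (1000, 3000))
print(f"chi2 = {chi2:.1f}, OR = {odds:.2f}, genome-wide significant: {p < GENOME_WIDE_ALPHA}")
# prints: chi2 = 95.2, OR = 1.62, genome-wide significant: True
```

Repeating this test across every SNP on the array is what makes the stringent threshold, and therefore large sample sizes, necessary.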
Early attempts to perform a GWAS of LOAD suffered from small sample sizes and were insufficiently powered to detect genetic risk factors other than the strong APOE association. Finally, in 2009 two European consortia, each with a case-control series of over 5,000 samples, reported three new LOAD loci: CLU, PICALM and CR1 (Harold 2009) (Lambert 2009). These were swiftly followed in 2010 by a fourth, BIN1, from a US-led effort (Seshadri 2010). Data pooling and meta-analysis between US and European consortia led to the discovery of a further five loci: ABCA7, EPHA1, the MS4A locus, CD2AP and CD33 (Hollingworth 2011) (Naj 2011). The latest tranche of eleven loci came in 2013 through international collaboration within the IGAP (International Genomics of Alzheimer’s Project) consortium: PTK2\(\beta\), SORL1, HLA-DRB5/1, SLC24A4, CASS4, CELF1, ZCWPW1, INPP5D, MEF2C, NME8 and FERMT2 (Lambert 2013). This was the largest LOAD GWAS to date (n = 74,046) and, thanks to genotype imputation with 1000 Genomes Project reference haplotypes, over 7 million genetic variants were tested.
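When consortia share summary statistics rather than raw genotypes, a common way to combine them is inverse-variance weighted fixed-effect meta-analysis: each study's effect estimate \(\hat\beta_i\) is weighted by \(w_i = 1/\mathrm{SE}_i^2\), giving \(\hat\beta = \sum_i w_i \hat\beta_i / \sum_i w_i\) with \(\mathrm{SE}(\hat\beta) = 1/\sqrt{\sum_i w_i}\). The sketch below is a minimal version of that calculation. It assumes the studies report effects (here log odds ratios) for the same allele and share one true effect with no heterogeneity, and it is not presented as the exact pipeline used by the consortia cited above. The function name and summary statistics are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.

    betas: per-study effect sizes (e.g. log odds ratios) for the same allele
    ses:   their standard errors
    Returns (pooled beta, pooled SE, z, two-sided p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need one standard error per effect size")
    weights = [1.0 / se ** 2 for se in ses]  # more precise studies get more weight
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, z, p


# Hypothetical summary statistics for one SNP from three consortia.
beta, se, z, p = fixed_effect_meta([0.12, 0.10, 0.14], [0.04, 0.03, 0.05])
print(f"OR = {math.exp(beta):.3f}, z = {z:.2f}, p = {p:.2e}")
```

None of the three hypothetical studies exceeds z = 3.4 on its own, yet the pooled z is above 5, which illustrates how pooling across consortia raised the power to detect modest effects.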
In the space of only five years GWAS has given the field twenty new genetic loci.
\begin{tabular}{l||l|l|l|c}
Study & Sample Size & Variants Tested & Genes Discovered & Reference \\
\hline
Harold et al. (2009) & \(>\)5,000 & 1 million & 2: CLU, PICALM & \\
Lambert et al. (2009) & \(>\)5,000 & 1 million & 2: CLU, CR1 & \\
Seshadri et al. (2010) & 1000 & 1 million & 1: BIN1 & \\
Naj et al. (2011) & 1000 & 1 million & ABCA7, MS4A6A/4E, EPHA1, CD2AP, CD33 & \\
Hollingworth et al. (2011) & 1000 & 1 million & ABCA7, MS4A6A/4E, EPHA1, CD2AP, CD33 & \\
Lambert et al. (2013) & 74,046 & \(>\)7 million (imputed) & 11: PTK2\(\beta\), SORL1, HLA-DRB5/1, SLC24A4, CASS4, CELF1, ZCWPW1, INPP5D, MEF2C, NME8, FERMT2 & 14 \\
\end{tabular}