Christopher Medway edited Introduction.tex over 8 years ago

Commit id: 788e3ae24a29217ede4b60e18b070730e30118e1

\usepackage{pdflscape}
\section{Introduction}
Alzheimer's disease (AD) is an incurable neurodegenerative disease and the most common form of dementia, affecting approximately 850,000 people in the United Kingdom in 2014 (Alzheimer's Society (2014). Dementia 2014: An opportunity for change). AD typically manifests as insidious cognitive decline with episodic memory loss and, neuropathologically, as gross cortical atrophy of the temporal cortex. Neuronal neurofibrillary tangles of hyperphosphorylated tau protein and extracellular plaques of the amyloid-$\beta$ peptide are the classic hallmarks of AD; they are required for definitive diagnosis and provide conclusive evidence of LOAD post-mortem. Familial forms of AD are rare; manifesting early (early-onset Alzheimer's disease or EOAD), they are the result of highly penetrant, autosomal dominant mutations within genes on the 'amyloid' pathway (\textit{APP}, \textit{PSEN1} and \textit{PSEN2}). However, approximately 95 percent of AD cases are of late onset (late-onset Alzheimer's disease or LOAD). Typically presenting after the fifth decade, LOAD is aetiologically highly complex, involving multiple genetic and environmental risk factors. Although LOAD is not a familial disease, it has been estimated that upwards of 60 percent of LOAD liability is genetic. The first methodological approach to pay dividends was linkage analysis, which identifies genetic loci segregating with the disease phenotype within family members. In 1993, Corder et al. identified a haplotype in the apolipoprotein-E gene (\textit{APOE}) on chromosome 19, which remains the strongest risk factor for LOAD to this day. One and two copies of the $\epsilon$4 haplotype increase LOAD risk approximately fourfold and sixteenfold respectively.
However, despite this early success, it would be more than fifteen years until the next genetic risk factor for LOAD was discovered. In hindsight, \textit{APOE} was the low-hanging fruit; owing to the atypically large effect it imparts on a complex phenotype, \textit{APOE} was uniquely amenable to a family-based linkage approach in a modest sample size. A fundamental change in approach, and in technology, would be required to identify smaller genetic effects in unrelated samples.

\section{The era of the genome-wide association study}
The completion of the Human Genome Project in 2003 ushered in a new era of genomics. The reference genome enabled new initiatives to explore uncharted waters. The International HapMap Project (HapMap) genotyped 1.6 million single nucleotide polymorphisms (SNPs) in 1184 individuals from 11 different ethnic populations [http://hapmap.ncbi.nlm.nih.gov] \cite{20811451}, with the aim of understanding genetic variability between individuals. For the first time it was possible to map the haplotype structure of the human genome and calculate linkage disequilibrium (LD) between SNPs. Using this information, the genetic variability within a 'gene of interest' could be captured with a smaller number of 'tag-SNPs', vastly reducing experimental cost and time. High-throughput genetics became a reality when commercial microarray providers (Affymetrix and Illumina) began to produce 'SNP chips' containing hundreds of thousands of non-redundant, informative SNPs based on HapMap data.
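
As an illustrative sketch of how LD underpins the tag-SNP idea (the notation below is introduced here for exposition and is not taken from the cited work), consider two biallelic SNPs with alleles $A/a$ and $B/b$. Let $p_A$ and $p_B$ be the population frequencies of alleles $A$ and $B$, and $p_{AB}$ the frequency of the haplotype carrying both. Two commonly used quantities are then
\begin{equation}
D = p_{AB} - p_A\,p_B, \qquad r^2 = \frac{D^2}{p_A\,(1-p_A)\,p_B\,(1-p_B)} .
\end{equation}
When the two SNPs are statistically independent, $D = 0$ and hence $r^2 = 0$; when one SNP perfectly predicts the other, $r^2 = 1$. A tag-SNP is typically chosen so that its $r^2$ with the untyped SNPs it represents exceeds some threshold; $r^2 \geq 0.8$ is a frequently used convention, although the exact cut-off is a design choice rather than a fixed rule.
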
October 2015 saw completion of the 1000 Genomes Project (1KP) which, like HapMap, catalogued human genetic variability, albeit on a much finer scale. The combination of whole-genome (7.4x) and deep whole-exome (65.7x) sequencing in over 2500 samples enabled a comprehensive catalogue of over 88 million variant sites to be discovered, including rare and structural changes. This has several important applications, including the design of arrays of rarer coding changes ('exome chips') and use as a database to exclude pathogenic variants. However, it is the ability to impute GWAS datasets with a larger set of variants using 1KP reference haplotypes that has been a game changer \cite{26432245}. The seminal GWAS was published in 2007 by the Wellcome Trust Case Control Consortium (WTCCC) \cite{17554300}.
With a series of 3,000 healthy controls and 14,000 combined cases across seven common human diseases, the consortium identified 24 novel genetic associations with diabetes (types I \& II), coronary artery disease, Crohn's disease, rheumatoid arthritis and bipolar disorder. In order to obtain a sufficiently large case-control series for GWAS, previously independent genetics groups began to form large consortia and use their combined resources to unearth genetic risk factors for complex human diseases. The era of the GWAS had arrived. According to the NHGRI-EBI GWAS catalogue, as of November 2015 there have been 2312 published GWAS \cite{24316577}.

\section{2009: The rebirth of Alzheimer's disease genetics}
Early attempts to perform a GWAS in late-onset AD suffered from small sample numbers. As a result, the early GWAS were insufficiently powered to detect any genetic risk factors other than the strong \textit{APOE} association. However, in 2009, each armed with a case-control cohort of greater than 5,000 samples, two European consortia reported three new LOAD risk genes: \textit{CLU}, \textit{PICALM} and \textit{CR1} \cite{19734902}\cite{19734903}. This was swiftly followed by a fourth gene, \textit{BIN1}, from a US-led effort in 2010 \cite{20460622}. Data pooling and meta-analysis between the US (ADGC) and European (GERAD) groups resulted in the discovery of a further five loci: \textit{ABCA7}, \textit{EPHA1}, the \textit{MS4A} locus, \textit{CD2AP} and \textit{CD33} \cite{21460840} \cite{21460841}. The final tranche of genes came in 2013 as a result of international collaboration under the IGAP (International Genomics of Alzheimer's Project) consortium: \textit{PTK2$\beta$}, \textit{SORL1}, \textit{HLA-DRB5/1}, \textit{SLC24A4}, \textit{CASS4}, \textit{CELF1}, \textit{ZCWPW1}, \textit{INPP5D}, \textit{MEF2C}, \textit{NME8} and \textit{FERMT2} \cite{24162737}.
This was the largest LOAD GWAS to date (n=74,046) and, owing to genotype imputation with 1000 Genomes Project reference haplotypes, tested over 7 million genetic variants genome-wide.