Authorea

Introduction

Most genetic association studies focus on common variants, but rare genetic variants can play a major role in complex traits \cite{Pritchard_2001,Schork_2009}. Rare susceptibility variants identified through sequencing have the potential to explain some of the ``missing heritability'' of complex traits \cite{Eichler_2010}. However, standard tests of association with single genetic variants are underpowered for rare variants unless sample sizes are very large \cite{Lee_2014}. This lack of power affects single-variant approaches in fine-mapping as well as in genome-wide association studies.

In this report, we are concerned with fine-mapping a genomic region that has been sequenced in cases and controls in order to identify disease-risk loci. A number of methods have been developed to evaluate disease association with single variants and with multiple variants in a genomic region. Besides single-variant methods, we consider three broad classes of methods for analysing sequence data: pooled-variant, joint-modelling and tree-based methods. Pooled-variant methods evaluate the cumulative effect of multiple genetic variants in a genomic region. Score statistics from marginal models of the association between the trait and each individual variant are collapsed into a single test statistic, either by combining the information for multiple variants into a single genetic score or by evaluating the distribution of the pooled score statistics of the individual variants \cite{Lee_2014}. Joint-modelling methods estimate the effects of multiple genetic variants simultaneously. These methods can assess whether a variant carries information about the trait beyond what is explained by the other variants. When trait-influencing variants are in low linkage disequilibrium, this approach may be more powerful than pooling test statistics for marginal associations across variants \cite{Cho_2010}. A local genealogical tree represents the ancestry of the sampled haplotypes at each locus in the genomic region being fine-mapped. Haplotypes carrying the same disease-risk alleles are expected to be related and therefore to cluster on the genealogical tree at a disease-risk locus. Tree-based methods assess whether trait values co-cluster with the ancestral tree of the haplotypes (e.g., \citeNP{Bardel_2005}). \citeNP{Mailund_2006} developed a method that reconstructs genealogies and scores them by the clustering of cases and controls.
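To make the two ways of pooling marginal score statistics concrete, the following is a minimal Python sketch; it is not the implementation used in this study or in any particular package. It assumes genotypes coded as minor-allele counts in an individuals-by-variants matrix `G`, case-control status `y` coded 1/0, and user-chosen variant weights `w` (e.g., uniform). It uses the simple score \(S_j = \sum_i G_{ij}(y_i - \bar{y})\) and permutation p-values in place of the asymptotic null distributions used by published burden and SKAT-type tests. All function names are hypothetical.

```python
import numpy as np


def score_statistics(G, y):
    """Marginal score statistic S_j = sum_i G_ij * (y_i - mean(y)) for each variant j.

    G: (n_individuals, n_variants) minor-allele counts (0/1/2).
    y: (n_individuals,) case-control status coded 1 (case) / 0 (control).
    """
    resid = y - y.mean()
    return G.T @ resid


def burden_stat(G, y, w):
    """Collapse variants into one weighted genetic score per individual.

    Equivalent to (sum_j w_j * S_j)^2, so scores of opposite sign cancel.
    """
    genetic_score = G @ w  # weighted allele count per individual
    return float((genetic_score @ (y - y.mean())) ** 2)


def skat_like_stat(G, y, w):
    """Pool the squared per-variant scores: Q = sum_j w_j^2 * S_j^2.

    Squaring before summing keeps risk and protective variants from cancelling.
    """
    S = score_statistics(G, y)
    return float(np.sum((w * S) ** 2))


def permutation_pvalue(stat_fn, G, y, w, n_perm=999, seed=0):
    """Permutation p-value: shuffle case-control labels to break any association."""
    rng = np.random.default_rng(seed)
    observed = stat_fn(G, y, w)
    exceed = sum(stat_fn(G, rng.permutation(y), w) >= observed for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)


# Toy check: variant 0 is carried only by cases, variant 1 only by controls.
G_toy = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
y_toy = np.array([1, 0, 0, 1])
w_toy = np.ones(2)
print(burden_stat(G_toy, y_toy, w_toy))     # 0.0: opposite-signed scores cancel
print(skat_like_stat(G_toy, y_toy, w_toy))  # 2.0: squared scores add
```

In the toy data the two variants have scores of \(+1\) and \(-1\), so the collapsed genetic score carries no signal while the pooled squared scores do. This is the usual argument for variance-component tests when risk and protective variants occur in the same region, and for burden tests when most variants act in the same direction.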

In practice, the true trees are unknown. However, cluster statistics computed on the true trees represent a best case for detecting association, because tree uncertainty is eliminated. Burkett et al. used known trees to assess the effectiveness of such a tree-based approach for detecting rare disease-risk variants in a candidate genomic region under various models of disease risk in a haploid population. They found that Mantel statistics computed on the known trees outperform popular methods for detecting rare variants associated with disease. Following Burkett et al., we use clustering tests based on true trees as benchmarks against which to compare popular association methods. Unlike Burkett et al., who focused on detecting disease-risk variants, we focus here on localizing the association signal within the candidate genomic region. Moreover, we use a diploid rather than a haploid disease model.
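A Mantel statistic measures agreement between two distance matrices defined on the same objects. The following minimal Python sketch shows one common form of it, assuming a haplotype-by-haplotype tree distance matrix taken from a known local genealogy (here, branch-length distance, i.e. twice the pairwise TMRCA) and a trait distance that is 1 when two haplotypes come from individuals of different case-control status and 0 otherwise. These distance choices, the Pearson form of the statistic and the function names are assumptions for illustration; they are not necessarily the choices made by Burkett et al.

```python
import numpy as np


def mantel_stat(D_tree, D_trait):
    """Pearson correlation between the upper-triangular (off-diagonal)
    entries of two symmetric distance matrices."""
    iu = np.triu_indices_from(D_tree, k=1)
    return float(np.corrcoef(D_tree[iu], D_trait[iu])[0, 1])


def mantel_test(D_tree, D_trait, n_perm=999, seed=0):
    """One-sided permutation test: are haplotypes that are close on the tree
    also close in trait (positive correlation between the two distances)?"""
    rng = np.random.default_rng(seed)
    observed = mantel_stat(D_tree, D_trait)
    n = D_trait.shape[0]
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Relabel haplotypes: permute rows and columns of the trait matrix together.
        if mantel_stat(D_tree, D_trait[np.ix_(perm, perm)]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


def status_distance(y):
    """Hypothetical trait distance: 1 if two haplotypes come from individuals
    with different case-control status, 0 otherwise."""
    y = np.asarray(y)
    return np.not_equal.outer(y, y).astype(float)


# Toy genealogy ((h0, h1), (h2, h3)): within-clade pairs coalesce at time 1,
# cross-clade pairs at time 3; tree distance = 2 * TMRCA.
D_tree = np.array([[0, 2, 6, 6],
                   [2, 0, 6, 6],
                   [6, 6, 0, 2],
                   [6, 6, 2, 0]], dtype=float)
y = [1, 1, 0, 0]  # h0, h1 from cases; h2, h3 from controls
stat, p = mantel_test(D_tree, status_distance(y))
print(round(stat, 6))  # 1.0: case and control haplotypes form separate clades
```

Because the toy tree splits cases and controls into two clades, the statistic is 1. With only four haplotypes, however, 8 of the 24 relabellings reproduce the same split, so the permutation p-value is roughly 1/3; informative p-values require realistic sample sizes. Under a diploid disease model, both haplotypes of an individual would be labelled with that individual's status.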

In this article, we compare the performance of selected rare-variant association methods for fine-mapping a disease locus. Our investigation focuses on localizing the association signal to the interval 950--1050 kbp within a 2 Mb candidate genomic region. As the basis for our study, we use variant data simulated from coalescent trees. Our work on localization extends that of Burkett et al., who investigated the ability to detect an association signal in the candidate region without regard to its localization. To illustrate ideas, we first work through a particular example dataset as a case study that gives insight into the selected association methods. We then perform a simulation study involving 200 sequencing datasets and score which association method best localizes the signal overall. Our results indicate the potential of ancestral tree-based approaches for localizing the association signal.