Most genetic association studies focus on common variants.
Rare susceptibility variants identified through sequencing have the potential to explain some of the ‘missing heritability’ of complex traits (Eichler 2010).
However, standard methods that test for association with single genetic variants are underpowered for rare variants unless sample sizes are very large (Lee 2014).
The lack of power of single-variant approaches holds in fine-mapping as well as genome-wide association studies.
In this report, we are concerned with fine-mapping a genomic region that has been sequenced in cases and controls to identify disease-risk loci.
A number of methods have been developed to evaluate disease association for both single variants and multiple variants in a genomic region.
Besides single-variant methods, we consider three broad classes of methods for analysing sequence data: pooled-variant, joint-modelling and tree-based methods.
Overview of the three classes of analysis methods (besides single-variant methods)
Pooled-variant methods evaluate the cumulative effects of multiple genetic variants in a genomic region. The score statistics from marginal models of the trait association with individual variants are collapsed into a single test statistic, either by combining the information for multiple variants into a single genetic score or by evaluating the distribution of the pooled score statistics of individual variants (Lee 2014).
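The two ways of collapsing marginal score statistics described above can be sketched as follows. This is a minimal illustration, not the implementation of any particular published test: the toy genotypes, the equal weights, the intercept-only null model and all function names are assumptions made for the example, and significance is assessed by permuting case-control labels rather than by an asymptotic distribution.

```python
import random

def score_statistics(genotypes, phenotypes):
    """Marginal score S_j = sum_i G_ij * (y_i - ybar) for each variant j,
    under an intercept-only null model (an assumption of this sketch)."""
    ybar = sum(phenotypes) / len(phenotypes)
    n_var = len(genotypes[0])
    return [sum(g[j] * (y - ybar) for g, y in zip(genotypes, phenotypes))
            for j in range(n_var)]

def burden_statistic(scores, weights):
    """Combine variants into one genetic score: (sum_j w_j S_j)^2."""
    return sum(w * s for w, s in zip(weights, scores)) ** 2

def variance_component_statistic(scores, weights):
    """Pool the squared weighted scores of individual variants: sum_j (w_j S_j)^2."""
    return sum((w * s) ** 2 for w, s in zip(weights, scores))

def permutation_p_value(statistic, genotypes, phenotypes, weights,
                        n_perm=999, seed=1):
    """Permutation p-value obtained by shuffling case-control labels."""
    rng = random.Random(seed)
    observed = statistic(score_statistics(genotypes, phenotypes), weights)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        permuted = statistic(score_statistics(genotypes, shuffled), weights)
        if permuted >= observed - 1e-12:
            hits += 1
    return (1 + hits) / (1 + n_perm)

# Hypothetical toy data: 4 cases, 4 controls, 3 rare variants (minor-allele counts).
phenotypes = [1, 1, 1, 1, 0, 0, 0, 0]
genotypes = [
    [1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1],   # cases
    [0, 0, 0], [0, 0, 0], [0, 0, 1], [0, 0, 0],   # controls
]
weights = [1.0, 1.0, 1.0]  # equal weights, chosen only for illustration

scores = score_statistics(genotypes, phenotypes)
q_burden = burden_statistic(scores, weights)
q_vc = variance_component_statistic(scores, weights)
p_burden = permutation_p_value(burden_statistic, genotypes, phenotypes, weights)
```

The burden form sums the scores before squaring, so variants with opposite directions of effect can cancel; the variance-component form squares each score first and is therefore insensitive to the direction of effect.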
Joint-modelling methods estimate the effects of multiple genetic variants simultaneously. These methods can assess whether a variant carries any further information about the trait beyond what is explained by the other variants. When trait-influencing variants are in low linkage disequilibrium, this approach may be more powerful than pooling test statistics for marginal associations across variants (Cho 2010).
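To illustrate how a joint model separates a variant's own contribution from information it shares with other variants, the sketch below fits a logistic regression to hypothetical carrier counts by Newton-Raphson. The counts, the binary carrier coding and the function names are assumptions made for the example. The counts are constructed so that variant B is correlated with variant A but has no effect of its own.

```python
import math

def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_logistic(groups, n_iter=25):
    """Newton-Raphson for logistic regression on grouped data.
    groups: list of (covariates, n_cases, n_controls).
    Returns [intercept, coefficient_1, ...]."""
    p = len(groups[0][0]) + 1
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, cases, controls in groups:
            xv = [1.0] + list(x)
            n = cases + controls
            eta = sum(b * v for b, v in zip(beta, xv))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for i in range(p):
                grad[i] += xv[i] * (cases - n * mu)
                for j in range(p):
                    hess[i][j] += xv[i] * xv[j] * n * mu * (1.0 - mu)
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Hypothetical counts: (carrier of A, carrier of B), cases, controls.
joint_groups = [((0, 0), 100, 100), ((1, 0), 20, 10),
                ((0, 1), 10, 10), ((1, 1), 60, 30)]
# The same individuals, collapsed over A, for a marginal analysis of B.
marginal_b_groups = [((0,), 120, 110), ((1,), 70, 40)]

joint = fit_logistic(joint_groups)            # [intercept, beta_A, beta_B]
marginal_b = fit_logistic(marginal_b_groups)  # [intercept, beta_B]
```

In this constructed example, the marginal log odds ratio for B is positive only because B carriers tend to carry A as well; in the joint fit the coefficient for B is zero and the coefficient for A is log 2.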
A local genealogical tree represents the ancestry of the sample of haplotypes at each locus in the genomic region being fine-mapped.
Haplotypes carrying the same disease-risk alleles are expected to be related and to cluster on the genealogical tree at a disease-risk locus.
Tree-based methods assess whether trait values co-cluster with the ancestral tree for the haplotypes (e.g., Bardel et al. 2005).
Mailund et al. (2006) developed a method to reconstruct genealogies and score them according to the clustering of cases and controls.
Brief review of the Burkett et al. study and its findings
In practice, the true trees are unknown. However, cluster statistics based on true trees represent a best case for detecting association because tree uncertainty is eliminated.
Burkett et al. use known trees to assess the effectiveness of such a tree-based approach for detection of rare, disease-risk variants in a candidate genomic region under various models of disease risk in a haploid population.
They found that Mantel statistics computed on the known trees outperform popular methods for detecting rare variants associated with disease.
Following Burkett et al., we use clustering tests based on true trees as benchmarks against which to compare the popular association methods.
However, unlike Burkett et al., who focus on detecting disease-risk variants, we focus here on localizing the association signal within the candidate genomic region. Moreover, we use a diploid rather than a haploid disease model.
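As a rough illustration of the kind of clustering test referred to above, the sketch below computes a Mantel-type statistic as the correlation between pairwise distances on a known tree and pairwise differences in case-control status. The exact statistic used by Burkett et al. is not reproduced here; the four-haplotype tree, the distance values, the 0/1 phenotype distance and the function names are all assumptions made for the example.

```python
import itertools
import math

def upper_triangle(matrix):
    """Entries above the diagonal, in row-major order."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def mantel_statistic(tree_dist, status):
    """Correlation between tree distances and phenotype distances,
    where the phenotype distance is 1 if two haplotypes differ in status."""
    n = len(status)
    pheno_dist = [[abs(status[i] - status[j]) for j in range(n)]
                  for i in range(n)]
    return pearson(upper_triangle(tree_dist), upper_triangle(pheno_dist))

def exact_permutation_p_value(tree_dist, status):
    """Proportion of all label permutations whose statistic is at least
    as large as the observed one (feasible only for tiny samples)."""
    observed = mantel_statistic(tree_dist, status)
    perms = list(itertools.permutations(status))
    hits = sum(1 for p in perms
               if mantel_statistic(tree_dist, p) >= observed - 1e-12)
    return hits / len(perms)

# Hypothetical known tree ((a, b), (c, d)): sister haplotypes are at
# distance 1, haplotypes in different clades at distance 4.
tree_dist = [
    [0, 1, 4, 4],
    [1, 0, 4, 4],
    [4, 4, 0, 1],
    [4, 4, 1, 0],
]
status = [1, 1, 0, 0]  # a and b come from cases, c and d from controls

r_mantel = mantel_statistic(tree_dist, status)
p_mantel = exact_permutation_p_value(tree_dist, status)
```

Here case status clusters perfectly with the tree, so the correlation is 1. With only four haplotypes, 8 of the 24 label permutations reproduce this clustering, so the exact permutation p-value is 1/3; realistic sample sizes would call for random rather than exhaustive permutation.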