Brief literature review

  • Most genetic association studies focus on common variants.

  • Yet rare genetic variants can play major roles in influencing complex traits (Pritchard 2001; Schork 2009).

  • Rare susceptibility variants identified through sequencing have the potential to explain some of the ‘missing heritability’ of complex traits (Eichler 2010).

  • However, standard methods that test for association with single genetic variants are underpowered for rare variants unless sample sizes are very large (Lee 2014).

  • The lack of power of single-variant approaches holds in fine-mapping as well as genome-wide association studies.

  • In this report, we are concerned with fine-mapping a genomic region that has been sequenced in cases and controls to identify disease-risk loci.

  • A number of methods have been developed to evaluate disease association for both single variants and multiple variants in a genomic region.

  • Besides single-variant methods, we consider three broad classes of methods for analysing sequence data: pooled-variant, joint-modelling and tree-based methods.

  • Overview of the three classes of analysis methods (besides single-variant methods)

    • Pooled-variant methods evaluate the cumulative effects of multiple genetic variants in a genomic region. The score statistics from marginal models of the trait association with individual variants are collapsed into a single test statistic, either by combining the information for multiple variants into a single genetic score or by evaluating the distribution of the pooled score statistics of individual variants (Lee 2014).

    • Joint-modelling methods model the effects of multiple genetic variants simultaneously. These methods can assess whether a variant carries any further information about the trait beyond what is explained by the other variants. When trait-influencing variants are in low linkage disequilibrium, this approach may be more powerful than pooling test statistics for marginal associations across variants (Cho 2010).

    • Tree-based methods.

      • A local genealogical tree represents the ancestry of the sample of haplotypes at each locus in the genomic region being fine-mapped.

      • Haplotypes carrying the same disease risk alleles are expected to be related and cluster on the genealogical tree at a disease risk locus.

      • Tree-based methods assess whether trait values co-cluster with the ancestral tree for the haplotypes (e.g., Bardel et al. 2005).

      • Mailund et al. (2006) developed a method to reconstruct genealogies and score them according to case-control clustering.

    • Brief review of the Burkett et al. study and its findings.

      • In practice, true trees are unknown. However, cluster statistics based on true trees represent a best case for detecting association because tree uncertainty is eliminated.

      • Burkett et al. use known trees to assess the effectiveness of such a tree-based approach for detection of rare, disease-risk variants in a candidate genomic region under various models of disease risk in a haploid population.

      • They found that Mantel statistics computed on the known trees outperform popular methods for detecting rare variants associated with disease.

      • Following Burkett et al., we use clustering tests based on true trees as benchmarks against which to compare the popular association methods.

      • However, unlike Burkett et al., who focus on detection of disease-risk variants, we focus here on localization of the association signal in the candidate genomic region. Moreover, we use a diploid disease model instead of a haploid disease model.
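To make the idea of a Mantel statistic on a known tree concrete, here is a minimal sketch of a generic Mantel permutation test in Python. It is not the implementation used by Burkett et al., whose exact statistic and distance definitions are not given in these notes. The function name `mantel_test`, the toy two-clade tree distances, and the use of absolute differences in case status as the trait distance are all illustrative assumptions, and labels are attached to haplotypes rather than diploid individuals to keep the example small.

```python
import numpy as np


def mantel_test(dist_a, dist_b, n_perm=999, seed=0):
    """Generic Mantel permutation test (hypothetical helper, not from the cited study).

    Correlates the off-diagonal entries of two symmetric distance matrices and
    assesses significance by jointly permuting the rows and columns of dist_b.
    """
    dist_a = np.asarray(dist_a, dtype=float)
    dist_b = np.asarray(dist_b, dtype=float)
    n = dist_a.shape[0]
    upper = np.triu_indices(n, k=1)  # count each unordered pair once
    rng = np.random.default_rng(seed)

    def corr(b):
        return np.corrcoef(dist_a[upper], b[upper])[0, 1]

    observed = corr(dist_b)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Relabel items: rows and columns are permuted together.
        if corr(dist_b[np.ix_(perm, perm)]) >= observed:
            exceed += 1
    # One-sided p-value; the observed arrangement is counted as one permutation.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


# Toy data (assumed for illustration): 8 haplotypes in two clades of a known tree.
clade = np.array([0, 0, 0, 0, 1, 1, 1, 1])
same_clade = clade[:, None] == clade[None, :]
tree_dist = np.where(same_clade, 1.0, 4.0)  # short distances within a clade
np.fill_diagonal(tree_dist, 0.0)

# Case status (1 = case) concentrated in the first clade.
status = np.array([1, 1, 1, 0, 0, 0, 0, 0])
trait_dist = np.abs(status[:, None] - status[None, :]).astype(float)

r, p = mantel_test(tree_dist, trait_dist, n_perm=999)
print(f"Mantel correlation: {r:.3f}, permutation p-value: {p:.3f}")
```

A positive correlation here means that haplotypes close together on the tree tend to share case status, which is the co-clustering that tree-based methods look for. For localization, one plausible use (again an assumption, not a description of any cited method) would be to compute such a statistic on the local tree at each position across the region and examine where it peaks.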