Jinko Graham edited untitled.tex  over 7 years ago



\bigskip We considered two tree-based methods: Blossoc (BLOck aSSOCiation, \cite{Mailund_2006}) and the Mantel test based on the ranks of coalescent events \cite{Burkett_2013}. Blossoc is a fast method for localizing risk variants by reconstructing genealogical trees at each SNV. It approximates a perfect phylogeny at each SNV, assuming an infinite-sites model of mutation, and scores each tree according to the non-random clustering of affected individuals. Genomic regions containing SNVs with high clustering scores are likely to harbour risk variants. Blossoc can be applied to both phased and unphased genotype data. However, applying it to unphased data with more than a few SNVs is impractical because of the computational burden of phasing. We therefore evaluated Blossoc on the phased haplotypes, scoring SNVs with the probability-scores criterion, as recommended by the user manual for small datasets (IS THIS IN THE USER MANUAL? IF SO CAN YOU CITE PLEASE).

We also examined the Mantel test based on the ranks of coalescent events \cite{Burkett_2013} to detect co-clustering of the disease trait and variants on genealogical trees. In practice, the true trees are unknown. However, clustering statistics computed on the true trees represent a best case, because tree uncertainty is eliminated. A previous simulation study by Burkett et al. established the optimality of these clustering tests for detecting disease association in candidate genomic regions. We therefore included the rank-based Mantel statistic on the true trees as a benchmark for comparison. This Mantel statistic upweights the short branches near the tips of the tree by assigning a length of one to every branch, including the relatively long branches close to the time to the most recent common ancestor.
We computed the phenotype-based distance matrix $d$ with entries $d_{ij}=1-s_{ij}$, where $s_{ij} = (y_i-\mu)(y_j-\mu)$ is the similarity score between haplotypes $i$ and $j$, $y_i$ is the phenotype of haplotype $i$ ($0$, control; $1$, case), and $\mu$ is the disease prevalence in the 1500 simulated individuals. We then used the Mantel statistic to compare $d$ with the tree-distance matrix computed for the tree with all branch lengths set to one.
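
To make the unit-branch-length convention concrete, the following Python sketch computes tip-to-tip distances on a rooted tree after every branch has been assigned length one, so that the distance between two haplotypes is simply the number of branches on the path joining them. It is an illustration under stated assumptions, not the implementation used in \cite{Burkett_2013}: the tree is assumed to be supplied as a list of (parent, child) edges, and the function name \texttt{unit\_branch\_distances} is hypothetical.

```python
from collections import deque


def unit_branch_distances(edges, tips):
    """Tip-to-tip distances on a tree after every branch is given length one.

    edges: (parent, child) node-label pairs describing a rooted tree
           (an assumed input format, chosen for illustration).
    tips:  haplotype (tip) labels; fixes the row/column order of the output.
    Returns a list-of-lists matrix whose (i, j) entry is the number of
    branches on the path between tips[i] and tips[j].
    """
    # Undirected adjacency list: path length does not depend on edge direction.
    adj = {}
    for parent, child in edges:
        adj.setdefault(parent, []).append(child)
        adj.setdefault(child, []).append(parent)

    def depths_from(source):
        # With unit branch lengths, breadth-first search depth equals path length.
        depth = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in depth:
                    depth[nbr] = depth[node] + 1
                    queue.append(nbr)
        return depth

    matrix = []
    for tip in tips:
        depth = depths_from(tip)
        matrix.append([depth[other] for other in tips])
    return matrix


if __name__ == "__main__":
    # Toy tree ((h1,h2)a,(h3,h4)b)root.
    edges = [("root", "a"), ("root", "b"),
             ("a", "h1"), ("a", "h2"), ("b", "h3"), ("b", "h4")]
    for row in unit_branch_distances(edges, ["h1", "h2", "h3", "h4"]):
        print(row)
```

On the toy tree, sister tips are two branches apart and tips in different clades are four apart, whatever the original branch lengths were; this is how the convention upweights the short tip branches relative to the long branches near the root.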
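
The distance matrix $d$ and a permutation-based Mantel test can be sketched in Python as follows. This is a minimal illustration under assumptions of our own: the Mantel statistic is taken to be the unnormalized cross-product $\sum_{i<j} d_{ij} t_{ij}$, where $t_{ij}$ denotes the unit-branch-length tree distance between haplotypes $i$ and $j$; significance is assessed by permuting phenotype labels across haplotypes; and $\mu$ defaults to the case proportion in the sample when no prevalence is supplied. The function names \texttt{phenotype\_distances}, \texttt{mantel\_statistic} and \texttt{mantel\_test} are hypothetical, and the sketch does not claim to reproduce the exact statistic or permutation scheme of \cite{Burkett_2013}.

```python
import random


def phenotype_distances(y, mu=None):
    """Phenotype-based distances d_ij = 1 - s_ij, with s_ij = (y_i - mu)(y_j - mu).

    y:  list of 0/1 phenotypes, one per haplotype (0 = control, 1 = case).
    mu: disease prevalence; ASSUMPTION: if omitted, the case proportion in y is used.
    """
    if mu is None:
        mu = sum(y) / len(y)
    return [[1 - (yi - mu) * (yj - mu) for yj in y] for yi in y]


def mantel_statistic(A, B):
    """Unnormalized Mantel cross-product over pairs i < j (an assumed form)."""
    n = len(A)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(i + 1, n))


def mantel_test(y, T, n_perm=999, mu=None, seed=None):
    """One-sided permutation test for co-clustering of phenotype and tree.

    T: tree-distance matrix in the same haplotype order as y.
    A large statistic means phenotype-discordant pairs sit far apart on the
    tree, i.e. cases co-cluster. Returns (observed statistic, p-value).
    """
    rng = random.Random(seed)
    # Fix mu once so that every permutation uses the same prevalence.
    if mu is None:
        mu = sum(y) / len(y)
    observed = mantel_statistic(phenotype_distances(y, mu), T)
    y_perm = list(y)
    n_at_least = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # permuting labels = permuting rows/columns of d jointly
        stat = mantel_statistic(phenotype_distances(y_perm, mu), T)
        # Small tolerance so floating-point ties with the observed value are counted.
        if stat >= observed - 1e-12:
            n_at_least += 1
    # Count the observed labelling as one of the permutations.
    p_value = (n_at_least + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    # Toy example: cases h1, h2 are sisters on the tree ((h1,h2),(h3,h4)),
    # with unit-branch-length tree distances T.
    T = [[0, 2, 4, 4], [2, 0, 4, 4], [4, 4, 0, 2], [4, 4, 2, 0]]
    y = [1, 1, 0, 0]
    observed, p = mantel_test(y, T, n_perm=999, seed=1)
    print(observed)  # 23.0
    print(p)
```

In the toy example the cases form a clade, and the observed statistic of 23 is the largest achievable over all relabellings; it is attained by two of the six possible case/control assignments, so the permutation p-value is near 1/3, which reflects the tiny sample rather than weak clustering.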