Charith Bhagya Karunarathna edited untitled.tex  over 7 years ago

Commit id: cafd8464ac85eb6859dd53f263b0a82c5c763993


\bigskip We considered two methods to assess clustering of disease status in genealogical trees: Blossoc (BLOck aSSOCiation, \cite{Mailund_2006}), which uses reconstructed trees, and a Mantel test, which uses the true trees. In practice, the true trees are unknown. However, cluster statistics based on the true trees represent a best case insofar as tree uncertainty is eliminated \cite{Burkett_2013}. We therefore include two versions of the Mantel test as benchmarks for comparison. In the first version, the phenotype is scored according to whether or not the haplotype comes from a case. We refer to this version as the {\em naive} Mantel test because all case haplotypes are treated the same, even those not carrying any risk variants. In the second version, the phenotype is scored according to whether or not the haplotype comes from a case and carries a risk variant. We refer to this version as the {\em informed} Mantel test because it takes into account whether or not a case haplotype carries a risk variant.

Blossoc aims to localize risk variants by reconstructing genealogical trees at each SNV. The method approximates a perfect phylogeny for each SNV, assuming an infinite-sites model of mutation, and scores each phylogeny according to the non-random clustering of affected individuals. The underlying idea is that genomic regions containing SNVs with high clustering scores are likely to harbour risk variants. Blossoc can be applied to both phased and unphased genotype data. However, the method is impractical to apply to unphased data with more than a few SNVs because of the computational burden associated with phasing. We therefore evaluated Blossoc with the phased haplotypes, using the probability-scores criterion for scoring SNVs, which is the recommended scoring scheme for small datasets \cite{Mailund_2006}.

The Mantel tests correlate pairwise distances in the known ancestry with pairwise distances in the phenotypes.
Following \citeNP{Burkett_2013}, we use pairwise distances calculated from the ranks of the coalescent events rather than the actual times on the tree. This re-scaling upweights the short branches at the tips of the tree by assigning a branch length of one to all branches, even the relatively long branches close to the most recent common ancestor. Pairwise distances between haplotypes on this re-scaled tree are then correlated with pairwise phenotypic distances. We defined the phenotypic distance matrix as $d_{ij}=1-s_{ij}$, where $s_{ij} = (y_i-\mu)(y_j-\mu)$ is the similarity score between haplotypes $i$ and $j$, $y_i$ is the binary phenotype (coded as $0$ or $1$) and $\mu$ is the disease prevalence in the 1500 simulated individuals. We then used the Mantel statistic to compare the phenotype-based distance matrix, $d$, with the re-scaled tree-distance matrix.
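
To make the phenotypic distances concrete, note that because $y_i$ is binary, $d_{ij}$ takes only three values:
\[
d_{ij} = \left\{
\begin{array}{ll}
1-(1-\mu)^2 & \mbox{if } y_i = y_j = 1,\\
1-\mu^2 & \mbox{if } y_i = y_j = 0,\\
1+\mu(1-\mu) & \mbox{if } y_i \neq y_j.
\end{array}
\right.
\]
For $\mu < 1/2$, pairs of haplotypes that both score $1$ are therefore the closest, and discordant pairs are the farthest apart. As a sketch only, and assuming the usual normalized form of the Mantel statistic (the exact form is not stated above), let $t_{ij}$ denote the re-scaled tree distance between haplotypes $i$ and $j$; the symbol $t_{ij}$ is introduced here for illustration. The statistic is then the Pearson correlation between the off-diagonal entries of the two matrices,
\[
r = \frac{\sum_{i<j} (d_{ij}-\bar{d})(t_{ij}-\bar{t})}
{\sqrt{\sum_{i<j} (d_{ij}-\bar{d})^2}\,\sqrt{\sum_{i<j} (t_{ij}-\bar{t})^2}},
\]
where $\bar{d}$ and $\bar{t}$ are the means of $d_{ij}$ and $t_{ij}$ over all pairs $i<j$. In the standard Mantel procedure, significance is assessed by recomputing $r$ after jointly permuting the rows and columns of one of the two matrices.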