Jinko Graham edited untitled.tex  over 7 years ago

Commit id: 940d520a87dc3585fb75420e78ede204ab464278


Blossoc aims to localize risk variants by reconstructing genealogical trees at each SNV. The method approximates a perfect phylogeny at each SNV, assuming an infinite-sites model of mutation, and scores each tree according to the non-random clustering of affected individuals. The underlying idea is that genomic regions containing SNVs with high clustering scores are likely to harbour risk variants. Blossoc can be applied to both phased and unphased genotype data. However, the method is impractical for unphased data with more than a few SNVs because of the computational burden of phasing. We therefore evaluated Blossoc with the phased haplotypes, using the probability-scores criterion for scoring SNVs, as recommended for small datasets (IS THIS IN THE USER MANUAL? CAN YOU CITE THE APPROPRIATE REFERENCE).

The Mantel tests correlate the pairwise distances on the known ancestry with those in phenotypes. Following \citeNP{Burkett_2013}, we use pairwise distances calculated from the rank of the coalescent event rather than the actual times on the tree. The test statistic upweights the short branches at the tips of the tree by assigning a branch length of one to all branches, even the relatively longer branches that are close to the time to the most recent common ancestor. Pairwise distances between haplotypes on this re-scaled tree are then correlated with pairwise phenotypic distances. We determined the phenotypic distance matrix, $d_{ij}=1-s_{ij}$, where $s_{ij} = (y_i-\mu)(y_j-\mu)$ is the similarity score between haplotypes $i$ and $j$, $y_i$ is the binary phenotype (coded as $0$ or $1$) of haplotype $i$, and $\mu$ is the disease prevalence in the 1500 simulated individuals. We then used the Mantel statistic to compare the phenotype-based distance matrix, $d$, with the re-scaled tree-distance matrix.
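To make the phenotype-based distance and the matrix comparison concrete, the following is a minimal Python sketch, not the implementation used in the study. It builds $d_{ij}=1-(y_i-\mu)(y_j-\mu)$ and compares it with a tree-distance matrix by permutation. Several things here are assumptions made for illustration: the Mantel statistic is taken to be the Pearson correlation of the off-diagonal entries (one common form), the p-value is one-sided, and the function names, the four-haplotype tree distances, and the value of $\mu$ are all hypothetical.

```python
import random
from itertools import combinations


def phenotype_distance(y, mu):
    """Return d with d[i][j] = 1 - (y[i] - mu) * (y[j] - mu)."""
    n = len(y)
    return [[1.0 - (y[i] - mu) * (y[j] - mu) for j in range(n)]
            for i in range(n)]


def mantel_statistic(a, b, order=None):
    """Pearson correlation between the upper-triangle entries of a and b.

    `order` relabels the rows and columns of b; it is used for permutations.
    Assumes neither matrix has constant off-diagonal entries.
    """
    n = len(a)
    if order is None:
        order = list(range(n))
    pairs = list(combinations(range(n), 2))
    xs = [a[i][j] for i, j in pairs]
    ys = [b[order[i]][order[j]] for i, j in pairs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((v - my) ** 2 for v in ys)
    return sxy / (sxx * syy) ** 0.5


def mantel_test(a, b, n_perm=999, seed=0):
    """Return (observed statistic, one-sided permutation p-value)."""
    rng = random.Random(seed)
    observed = mantel_statistic(a, b)
    n = len(a)
    count = 1  # the observed labelling counts as one permutation
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)  # permute haplotype labels of b jointly
        if mantel_statistic(a, b, order) >= observed:
            count += 1
    return observed, count / (n_perm + 1)


# Hypothetical toy data: four haplotypes on the tree ((0,1),(2,3)) with
# every branch given length one, so sisters are 2 apart and others 4 apart.
tree_dist = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]
y = [1, 1, 0, 0]   # binary phenotypes, coded 0 or 1
mu = 0.5           # illustrative prevalence, not the study value
d = phenotype_distance(y, mu)
r, p = mantel_test(tree_dist, d, n_perm=99, seed=1)
print(r, p)
```

Permuting the rows and columns of one matrix together keeps its internal structure intact while breaking its link to the other matrix, which is why the labels are shuffled rather than the individual entries.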