Charith Bhagya Karunarathna edited subsection_Several_popular_methods_begin__.tex  over 7 years ago

Commit id: 50cba06cbfdf55de15b359b49f9fcbc3f4a22d38


\item We determined the distance matrix, $d_{ij}=1-s_{ij}$, where $s_{ij} = (y_i-\mu)(y_j-\mu)$ is the similarity score between haplotypes $i$ and $j$, $y_i$ is the binary phenotype (coded as $0$ or $1$) and $\mu$ is the disease prevalence in the 1500 simulated individuals. We then used the Mantel statistic to compare the phenotype-based distance matrix, $d$, with the re-scaled tree-distance matrix.
\item Note that we defined a `phenotype' for each haplotype within an individual. Therefore, an individual has two phenotypes rather than one.
\item Blossoc aims to localize the risk variants by reconstructing genealogical trees at each SNV. This method approximates perfect phylogenies for each SNV, assuming an infinite-sites model of mutation, and scores them according to the non-random clustering of affected individuals. The underlying idea is that genomic regions containing SNVs with high clustering scores are likely to harbour risk variants. Blossoc can be used for both phased and unphased genotype data. However, the method is impractical to apply to unphased data with more than a few SNVs because of the computational burden associated with phasing. We therefore assumed the SNV data are phased, as might be done in advance with a fast-phasing algorithm such as fastPHASE \cite{Scheet_2006}, BEAGLE \cite{Browning_2011}, IMPUTE2 \cite{Howie_2009} or MACH \cite{Li_2010,Li_2009}, and evaluated Blossoc with phased haplotypes, using the probability-scores criterion, which is the recommended scoring scheme for small datasets \cite{Mailund_2006}.
\item True trees (MT-rank of the coalescent events, \cite{Burkett_2014}): Detect co-clustering of the disease trait and variants on genealogical trees.
\begin{itemize}
%[NOW CAN REMOVE: Expected number of time it takes for the final two of k lineages to coalesce is $ E(T_{2}) = 0.5 \times E(TMRCA) $. So, if we rank the coalescence events(i.e. intercoalescence times are 1 time unit), $ T_{2} $ becomes 1, as well as $T_{k}$ is one. So, this has the effect of upweighting the branch.]