Authorea

Charith Bhagya Karunarathna edited subsection_Several_popular_methods_begin__.tex over 7 years ago

Commit id: 50cba06cbfdf55de15b359b49f9fcbc3f4a22d38

\item We determined the distance matrix, $d_{ij}=1-s_{ij}$, where $s_{ij} = (y_i-\mu)(y_j-\mu)$ is the similarity score between haplotypes $i$ and $j$, $y_i$ is the binary phenotype (coded as $0$ or $1$) of haplotype $i$, and $\mu$ is the disease prevalence in the 1500 simulated individuals. We then used the Mantel statistic to compare the phenotype-based distance matrix, $d$, with the re-scaled tree-distance matrix.
\item Note that we defined a `phenotype' for each haplotype within an individual. Therefore, an individual has two phenotypes rather than one.
\item Blossoc \cite{Mailund_2006} aims to localize the risk variants by reconstructing genealogical trees at each SNV. This method approximates perfect phylogenies for each SNV, assuming an infinite-sites model of mutation, and scores them according to the non-random clustering of affected individuals. The underlying idea is that genomic regions containing SNVs with high clustering scores are likely to harbour risk variants. \citeNP{Mailund_2006} found Blossoc to be a fast and accurate method to localize {\bf common} disease-causing variants, which leaves open how well it works with rare variants. Blossoc can be used for both phased and unphased genotype data. However, the method is impractical to apply to unphased data with more than a few SNVs due to the computational burden associated with phasing. We therefore assumed the SNV data are phased, as might be done in advance with a fast-phasing algorithm such as fastPHASE \cite{Scheet_2006}, BEAGLE \cite{Browning_2011}, IMPUTE2 \cite{Howie_2009} or MACH \cite{Li_2010,Li_2009}, and evaluated Blossoc with the phased haplotypes, using the probability-scores criterion, which is the recommended scoring scheme for small datasets \cite{Mailund_2006}.
\item True trees (MT-rank of the coalescent events, \cite{Burkett_2014}): Detect co-clustering of the disease trait and variants on genealogical trees.
\begin{itemize}
%[NOW CAN REMOVE: Expected number of time it takes for the final two of k lineages to coalesce is $ E(T_{2}) = 0.5 \times E(TMRCA) $. So, if we rank the coalescence events (i.e. intercoalescence times are 1 time unit), $ T_{2} $ becomes 1, as well as $T_{k}$ is one. So, this has the effect of upweighting the branch.]