Charith Bhagya Karunarathna edited untitled.tex  about 8 years ago

Commit id: 2cf9631ed4bb6ccc7cfbdfd25876790acd0f43a0


\section{Outline}
\begin{enumerate}
  \item Introduction
  \begin{enumerate}
    \item Area of study
    \begin{itemize}
      \item Association studies with sequence data
      \item Sequence data include both common and rare variants
      \item Trees underlying the sequence data (where mutations occur on the tree)
    \end{itemize}
    \item Brief literature review:
    \begin{itemize}
      \item The rapid drop in sequencing cost and the availability of whole-genome sequencing are enabling rapid advances in genomic research.
      \item Sequencing has the potential to explain missing heritability through identification of trait-associated rare variants \cite{Eichler_2010}.
      \item Rare genetic variants can play major roles in influencing complex diseases \cite{Pritchard_2001,Schork_2009}.
      \item Standard methods to test for association with single genetic variants are underpowered for rare variants unless sample sizes are very large \cite{Li_2008}.
      \item Overview of three types of analysis methods (besides the single-variant approach):
      \begin{itemize}
        \item Pooled-variant methods combine information across multiple variant sites within a gene, which can enrich the association signal.
        \item Joint-modeling methods identify the joint effect of multiple genetic variants. They may be powerful because they use combined information across variants.
        \item Individuals carrying the same disease-predisposing variants are likely to have inherited them from the same ancestor, so cases will tend to cluster together in the underlying genealogy. Tree-based methods are therefore an alternative grouping approach based on relatedness.
      \end{itemize}
    \end{itemize}
    \item Purpose of the study
    \begin{itemize}
      \item Fine-map to localize (not just detect) the trait-influencing mutations (causal SNVs) using sequencing data in a 2 Mb gene region.
      \item Work through a particular example as a case study for insight into several popular methods for association mapping.
      \item Simulate 100 datasets and score which method localizes best overall.
    \end{itemize}
    \item Argue in the introduction that we've included true trees in the comparison, even though we won't know them in practice, because in principle this should give the best result.
  \end{enumerate}
  \item Methods
  \begin{enumerate}
    \item Data simulation
    \begin{enumerate}
      \item Simulating the population
      \begin{itemize}
        \item fastsimcoal2 \cite{Excoffier_2011,Excoffier_2013}
        \item Logistic regression model of disease status
      \end{itemize}
      \item Sampling case-control data
    \end{enumerate}
    \item Several popular methods
    \begin{itemize}
      \item Summary paragraph giving an overview of the different types of methods and the ideas motivating them.
    \end{itemize}
    \begin{enumerate}
      \item Single-variant approach
      \begin{itemize}
        \item Fisher's exact test
        \begin{itemize}
          \item Individual variant sites are tested for an association with the disease outcome.
          \item A $2 \times 3$ table is constructed to compare genotype frequencies at each variant site in cases and controls.
        \end{itemize}
      \end{itemize}
      \item Pooled-variant method
      \begin{itemize}
        \item VT \cite{Price_2010}: Variants with MAF below some threshold are more likely to be functional than variants with higher MAF.
        \begin{itemize}
          \item Suitable for effects in one direction.
          \item High power to detect the association between rare variants and disease trait (need ref).
        \end{itemize}
        \item C-alpha \cite{Neale_2011}: Tests the variance of the effect sizes across rare variants (no effect, increased risk, or decreased risk).
        \begin{itemize}
          \item Sensitive to risk and protective variants in the same gene.
          \item Powerful when the effects are in different directions.
        \end{itemize}
      \end{itemize}
      \item Joint-modeling method
      \begin{itemize}
        \item CAVIARBF \cite{Chen_2015}: Fine-mapping method using marginal test statistics and LD in a Bayesian framework; approximates the Bayesian multivariate regression in BIMBAM.
        \item Elastic-net \cite{Zou_2005}: A hybrid regularization and variable selection method that linearly combines the L1 and L2 penalties of the Lasso and Ridge methods in multivariate regression.
        \begin{itemize}
          \item Useful when the number of predictors is greater than the number of observations ($p \gg n$).
        \end{itemize}
      \end{itemize}
      \item Tree-based method
      \begin{itemize}
        \item Inferred trees (Blossoc \cite{Mailund_2006}): A fast, accurate method to localize the disease-causing variants.
        \begin{itemize}
          \item Approximates perfect phylogenies for each site, assuming the infinite-sites model of mutation, and scores them according to the non-random clustering of affected individuals.
          \item Relies on phased haplotype data.
        \end{itemize}
        \item True trees (MT-rank of the coalescent events \cite{Burkett_2013}): Detects the association between disease trait and genealogical trees.
        \begin{itemize}
          \item Upweights the short branches at the tips of the tree.
        \end{itemize}
      \end{itemize}
    \end{enumerate}
    \item Success in localization was declared if the strongest signal was in the risk region.
  \end{enumerate}
  \item Results
  \begin{enumerate}
    \item Example dataset
    \begin{enumerate}
      \item Summary for population and sample
      \item LD between rSNVs and others for population
      \item Single-variant statistics plot
      \item Compare pooled-variant statistics
      \item Joint-modeling statistics
      \item Between-class comparison
    \end{enumerate}
    \item 100 datasets
  \end{enumerate}
  \item Discussion
  \begin{enumerate}
    \item Review the purpose of the study
    \item Evaluate the localization results on the test data
    \begin{itemize}
      \item Tree statistics (both Blossoc and known trees) are the only methods that successfully localize the association signal in the test data.
      \item C-alpha shows a higher signal than the VT score, which is consistent with the findings of Burkett et al.
    \end{itemize}
    \item Evaluate the localization results from the simulation study
    \begin{itemize}
      \item Put something here
    \end{itemize}
    \item MAIN CONCLUSIONS FROM THE ABOVE EVALUATIONS? THAT TREE-BASED STATS ARE THE WAY TO GO?
    \item Limitations of the study
    \begin{itemize}
      \item Simple model of disease risk with additive effects and no covariates.
    \end{itemize}
  \end{enumerate}
\end{enumerate}
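
\noindent A minimal sketch of the logistic regression model of disease status listed under Data simulation, assuming additive effects and no covariates as noted under the limitations. The symbols are illustrative assumptions, not notation taken from the outline: $D_i$ is the disease status of individual $i$, $G_{ij}$ is the number of copies of the risk allele that individual $i$ carries at causal SNV $j$, $m$ is the number of causal SNVs, $\beta_0$ sets the baseline risk, and $\beta_j$ is the log odds ratio per copy at SNV $j$.
\[
  \log \frac{P(D_i = 1 \mid G_{i1}, \dots, G_{im})}{1 - P(D_i = 1 \mid G_{i1}, \dots, G_{im})}
  = \beta_0 + \sum_{j=1}^{m} \beta_j G_{ij}
\]
Under this form, each additional copy of a risk allele at SNV $j$ multiplies the odds of disease by $e^{\beta_j}$. Case-control sampling would then draw cases ($D_i = 1$) and controls ($D_i = 0$) from the simulated population.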