Graham McVicker edited Correcting for incorrect genotype calls.tex  almost 10 years ago

Commit id: afe0ff7616a1cd2e8ec48d5ae9b64b44c06e982b

\subsection{Correcting for incorrect genotype calls}

SNP genotypes that are incorrectly called as heterozygous are a major source of false positives. The individual actually carries only one allele, so every read overlapping such a SNP comes from that allele, which mimics extreme allelic imbalance. To account for this, we model allele-specific reads as drawn from a mixture of two beta-binomial components: a heterozygous component with weight $H_{ik}$ and a homozygous component with weight $1-H_{ik}$, where $H_{ik}$ is the probability that individual $i$ is heterozygous for SNP $k$. Reads from heterozygous individuals contain the reference allele with probability $p_{h}$. Reads from homozygous individuals still have a small probability of appearing to come from the other allele because of sequencing errors, which occur with probability $p_{\textrm{err}}$. The probability of observing $y_{ik}$ reads from the reference allele at SNP $k$ for individual $i$ then becomes:
\begin{eqnarray*}
& \Pr_{\mathrm{BB-mix}}\left(Y_{ik} = y_{ik} \left| p_{h}, n_{ik}, \Upsilon_i \right. \right) =