Graham McVicker edited Correcting for incorrect genotype calls.tex almost 10 years ago

Commit id: 0e503ca1b783cc5f3581e10d23d25394c3f14d68

+ \Pr_{\mathrm{BB}} \left(Y_{ik} = y_{ik} \left| 1-p_{\textrm{err}}, n_{ik}, \Upsilon_i \right. \right) \right] & \end{eqnarray*} We found that even SNPs with heterozygous probabilities of 1.0 are occasionally miscalled, so we cap heterozygous probabilities at a maximum value of 0.99. We then update this heterozygous probability using sequencing data obtained from the same individual. Sequencing data may consist of sequencing reads that are aggregated across multiple types of experiments performed on the same individual (e.g., RNA-seq and ChIP-seq reads). For a SNP with heterozygous probability $H_{ik} = \min(0.99, H_{ik}^{\textrm{obs}})$, we define the updated heterozygous probability, $\hat{H}_{ik}$, as: \[ \hat{H}_{ik} = \frac{H_{ik} \Pr_{\mathrm{Bin}} \left( D \left| p=0.5 \right. \right)} {H_{ik} \Pr_{\mathrm{Bin}} \left( D \left| p=0.5 \right. \right) + (1 - H_{ik}) \left[ \Pr_{\mathrm{Bin}} \left( D \left| p=p_{\textrm{err}} \right. \right) + \Pr_{\mathrm{Bin}} \left( D \left| p=1-p_{\textrm{err}} \right. \right) \right]} \] where $D$ represents the observed read count data and $p_{\textrm{err}}$ is the probability of a sequencing error.
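
As an illustration only, the update for the heterozygous probability can be sketched in a few lines of Python. This is not the authors' implementation. It assumes that $D$ is summarized by reference and alternate read counts at the SNP, that $p$ is the probability of observing a reference read, and that the error rate is 0.01 (a hypothetical default, not a value taken from the text). The names `binom_pmf` and `update_het_prob` are made up for this sketch.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def update_het_prob(h_obs, ref_count, alt_count, p_err=0.01, h_max=0.99):
    """Update a genotype-derived heterozygous probability using read counts.

    Assumptions (not stated in the source): D is the pair of reference and
    alternate read counts, p is the reference-read probability, and
    p_err=0.01 is a placeholder error rate.
    """
    # Cap the prior heterozygous probability, as described in the text.
    h = min(h_max, h_obs)
    n = ref_count + alt_count

    # Likelihood of the counts if the individual is truly heterozygous.
    lik_het = binom_pmf(ref_count, n, 0.5)

    # Summed likelihoods under the two homozygous genotypes, where reads
    # from the "wrong" allele arise only through sequencing errors.
    lik_hom = binom_pmf(ref_count, n, p_err) + binom_pmf(ref_count, n, 1 - p_err)

    return h * lik_het / (h * lik_het + (1 - h) * lik_hom)


if __name__ == "__main__":
    # Balanced counts support the heterozygous call.
    print(update_het_prob(1.0, 5, 5))
    # Ten reference reads and no alternate reads argue against it.
    print(update_het_prob(1.0, 10, 0))
```

Under these assumptions, a SNP called heterozygous with probability 1.0 but covered by ten reference reads and no alternate reads has its heterozygous probability reduced to roughly 0.10, whereas balanced counts leave it very close to 1.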