Authorea

Graham McVicker edited Modeling Allelic Imbalance.tex almost 10 years ago

Commit id: 89abdbcfc4b8db34e2f6a376f344fbc8bebda573

\subsection{Modeling allelic imbalance}
Allele-specific read counts are sometimes modeled using the binomial distribution \cite{XXXX}. However, we have found that allele-specific read counts are overdispersed relative to the binomial. We instead model them with a beta-binomial (BB) distribution and estimate a parameter $\Upsilon_i$ that captures the overdispersion for each individual $i$. The likelihood of the data is then given by
\[
\mathrm{L}\left( D_{h\bullet} \left| \alpha_h, \beta_h, \Upsilon_\bullet \right. \right) = \prod_i \prod_k \Pr_{\mathrm{BB}} \left( Y = y_{ik} \left| n_{ik}, p_h, \Upsilon_i \right. \right)
\]
where $y_{ik}$ is the number of allele-specific reads from the reference haplotype and $n_{ik}$ is the total number of allele-specific reads for individual $i$ at target SNP $k$. The expected fraction of allele-specific reads from the reference allele is $p_h = \frac{\alpha_h}{\alpha_h + \beta_h}$. The parameters $\alpha_h$ and $\beta_h$ are shared with the read depth component of the test, which is described above.
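
For concreteness, the beta-binomial probability in the likelihood can be written out explicitly once a parameterization is fixed. The following is only an illustrative sketch under an assumed parameterization, not necessarily the one used in our implementation: the shape parameters $a$ and $b$ are hypothetical symbols introduced here, and $\Upsilon_i$ is assumed to lie in $(0, 1)$ and to act as an intra-class correlation.
\[
\Pr_{\mathrm{BB}} \left( Y = y \left| n, p, \Upsilon \right. \right) = \binom{n}{y} \frac{\mathrm{B}\left( y + a,\; n - y + b \right)}{\mathrm{B}\left( a, b \right)}, \qquad a = p \, \frac{1 - \Upsilon}{\Upsilon}, \quad b = (1 - p) \, \frac{1 - \Upsilon}{\Upsilon},
\]
where $\mathrm{B}(\cdot, \cdot)$ is the beta function. Under this assumption the mean and variance of $Y$ are
\[
\mathrm{E}[Y] = n p, \qquad \mathrm{Var}[Y] = n p (1 - p) \left[ 1 + (n - 1) \Upsilon \right],
\]
so the variance exceeds the binomial variance $n p (1 - p)$ whenever $n > 1$, and the binomial model is recovered in the limit $\Upsilon \to 0$.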