
Graham McVicker edited Modeling Allelic Imbalance.tex almost 10 years ago

Commit id: 5e7ac1c630cfb65c1c8c44c47bb02bbc4a9ad702


Allele-specific read counts are sometimes modelled using the binomial distribution \cite{Reddy_2012}; however, we have found that allele-specific read counts are overdispersed. We instead model them with a beta-binomial (BB) distribution and estimate a parameter $\Upsilon_i$ that captures the overdispersion for each individual $i$. The likelihood of the data for test $h$, $D_h$, is then given by: \[ \textrm{L}\left( D_{h} \left| \alpha_h, \beta_h, \Upsilon_\bullet \right. \right) = \prod_i \prod_k \Pr_{\mathrm{BB}} \left( Y = y_{ik} \left| n_{ik}, p_h, \Upsilon_i \right. \right) \] where $y_{ik}$ is the number of allele-specific reads from the reference haplotype and $n_{ik}$ is the total number of allele-specific reads for individual $i$ at target SNP $k$. The expected fraction of allele-specific reads from the reference allele is $p_h = \frac{\alpha_h}{\alpha_h + \beta_h}$.
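
To make the role of the overdispersion parameter concrete, the beta-binomial probability can be sketched in its standard form. The shape parameters $a_i$ and $b_i$ and the concentration $\theta_i$ are symbols introduced here for illustration only, and the mapping from $\Upsilon_i$ to $\theta_i$ is an assumption rather than something specified above: \[ \Pr_{\mathrm{BB}} \left( Y = y \left| n, a_i, b_i \right. \right) = \binom{n}{y} \frac{\mathrm{B}\left( y + a_i, \, n - y + b_i \right)}{\mathrm{B}\left( a_i, b_i \right)}, \qquad a_i = p_h \theta_i, \quad b_i = (1 - p_h) \theta_i \] where $\mathrm{B}(\cdot, \cdot)$ is the beta function and $\theta_i$ is assumed to be a decreasing function of $\Upsilon_i$. Under this assumed parameterization the mean and variance are \[ \mathrm{E}[Y] = n p_h, \qquad \mathrm{Var}[Y] = n p_h (1 - p_h) \left( 1 + \frac{n - 1}{\theta_i + 1} \right) \] so the expected reference fraction is still $p_h$, while the variance exceeds the binomial variance $n p_h (1 - p_h)$ whenever $n > 1$ and approaches it as $\theta_i \rightarrow \infty$.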