Authorea

Estimating overdispersion parameters

To estimate the genome-wide overdispersion parameters \(\Omega_i\) and \(\Upsilon_i\), we use the same likelihood equations as in the CHT but assume that there are no genetic effects. For the read-depth part of the test, this means that \(\lambda_{hi}\) is equal to the expected counts, \(T^*_{ij}\); for the allele-specific part of the test, it means that \(p_h = 0.5\). Because the allele-specific and read-depth parts of the likelihood are independent and share no parameters, we can fit the two sets of overdispersion parameters separately.
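As a sketch of why the two fits can be separated, assume that the genome-wide null log-likelihood is the sum, over individuals \(i\), regions \(j\) and heterozygous SNPs \(k\), of the log of the terms in the equations below. It then splits into two sums with no parameter in common:

\[\ell\left(\Omega, \phi, \Upsilon\right) = \sum_i \sum_j \log \Pr_{\mathrm{BNB}} \left( X = x_{ij} \left| \lambda = T^*_{ij}, \Omega_i, \phi_j \right. \right) + \sum_i \sum_k \log \Pr_{\mathrm{BB-mix}} \left( Y = y_{ik} \left| n_{ik}, p = 0.5, \Upsilon_i \right. \right)\]

The first sum depends only on \(\Omega\) and \(\phi\), and the second only on \(\Upsilon\), so maximizing \(\ell\) jointly gives the same estimates as maximizing each sum on its own.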

Beta-Negative-Binomial

To find the maximum likelihood estimate of \(\Omega_i\), we need to sum the log likelihood across all regions. This presents a problem, because the region-specific parameter \(\phi_j\) must also be estimated for each region. We therefore estimate \(\phi_j\) and \(\Omega_i\) iteratively, first finding the maximum likelihood estimate of \(\phi_j\) for each region, with the \(\Omega_i\) held fixed, using the equation:

\[\textrm{L}\left(\phi_j \left| D \right. \right) = \prod_i \left[ \Pr_{\mathrm{BNB}} \left( X = x_{ij} \left| \lambda = T^*_{ij}, \Omega_i, \phi_j \right. \right) \right]\]

and then finding the maximum likelihood estimate of \(\Omega_i\) for each individual, with the \(\phi_j\) held fixed, using the equation:

\[\textrm{L}\left(\Omega_i \left| D \right. \right) = \prod_j \left[ \Pr_{\mathrm{BNB}} \left( X = x_{ij} \left| \lambda = T^*_{ij}, \Omega_i, \phi_j \right. \right) \right]\]

We repeat this iterative procedure until the improvement in likelihood becomes negligible.
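The alternating procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CHT implementation: the section does not give the parameterization of the beta-negative-binomial in terms of \(\lambda\), \(\Omega_i\) and \(\phi_j\), so `bnb_logpmf` uses a hypothetical one (\(r = 1/\phi_j\), \(\alpha = 1 + 1/\Omega_i\), \(\beta = \lambda\phi_j/\Omega_i\), chosen so that the mean equals \(\lambda\)). The function names, the search bounds, and the use of golden-section search on the log scale are all illustrative choices.

```python
import math


def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def bnb_logpmf(k, lam, omega, phi):
    """Log-pmf of a beta-negative-binomial with mean lam.

    HYPOTHETICAL parameterization (not taken from the CHT):
    r = 1/phi, alpha = 1 + 1/omega, beta = lam*phi/omega, so that the
    mean r*beta/(alpha - 1) equals lam.
    """
    r = 1.0 / phi
    alpha = 1.0 + 1.0 / omega
    beta = lam * phi / omega
    return (math.lgamma(r + k) - math.lgamma(k + 1) - math.lgamma(r)
            + log_beta(alpha + r, beta + k) - log_beta(alpha, beta))


def golden_max(f, lo, hi, tol=1e-5):
    """Golden-section search for the maximum of f on [lo, hi] (assumes f is unimodal)."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:          # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2


def fit_bnb_overdispersion(x, T, lo=1e-3, hi=10.0, tol=1e-6, max_iter=100):
    """Alternate between per-region phi_j and per-individual omega_i.

    x[i][j]: observed read count for individual i in region j.
    T[i][j]: expected count T*_ij, used as lambda under the no-genetic-effect null.
    """
    n_ind, n_reg = len(x), len(x[0])
    omega, phi = [1.0] * n_ind, [1.0] * n_reg

    def total_ll():
        return sum(bnb_logpmf(x[i][j], T[i][j], omega[i], phi[j])
                   for i in range(n_ind) for j in range(n_reg))

    def maximize(objective, current):
        # Search on the log scale; keep the old value if the search does worse.
        s = golden_max(lambda t: objective(math.exp(t)), math.log(lo), math.log(hi))
        cand = math.exp(s)
        return cand if objective(cand) >= objective(current) else current

    history = [total_ll()]
    for _ in range(max_iter):
        for j in range(n_reg):   # step 1: phi_j for each region, omega held fixed
            phi[j] = maximize(lambda p: sum(bnb_logpmf(x[i][j], T[i][j], omega[i], p)
                                             for i in range(n_ind)), phi[j])
        for i in range(n_ind):   # step 2: omega_i for each individual, phi held fixed
            omega[i] = maximize(lambda w: sum(bnb_logpmf(x[i][j], T[i][j], w, phi[j])
                                               for j in range(n_reg)), omega[i])
        history.append(total_ll())
        if history[-1] - history[-2] < tol:   # improvement is negligible
            break
    return omega, phi, history


if __name__ == "__main__":
    x = [[3, 5, 0, 12], [7, 2, 9, 4], [1, 8, 6, 3]]   # toy counts (individuals x regions)
    T = [[5.0] * 4 for _ in range(3)]                  # toy expected counts
    omega, phi, history = fit_bnb_overdispersion(x, T)
    print("omega:", omega)
    print("phi:", phi)
    print("log-likelihood trace:", history)
```

Each one-dimensional update only accepts a value that does not lower its own term of the log-likelihood, so the total log-likelihood never decreases from one sweep to the next; the loop stops once a full sweep improves it by less than `tol`.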

Beta-Binomial

We calculate the genome-wide likelihood of \(\Upsilon_i\) by taking the product of the likelihoods from all target-region SNPs that are heterozygous in individual \(i\). We again assume that there is no genetic effect, so \(p = 0.5\), and we find the maximum likelihood estimate of \(\Upsilon_i\) using the equation:

\[\textrm{L}\left(\Upsilon_i \left| D \right. \right) = \prod_k \Pr_{\mathrm{BB-mix}} \left( Y = y_{ik} \left| n_{ik}, p = 0.5, \Upsilon_i \right. \right)\]
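A corresponding sketch for \(\Upsilon_i\), again in Python and again hypothetical: the BB-mix distribution is not defined in this section, so the code substitutes a plain beta-binomial with \(\alpha = p/\Upsilon_i\) and \(\beta = (1-p)/\Upsilon_i\) and omits whatever mixture component BB-mix adds. The names `bb_logpmf` and `fit_upsilon`, the search bounds, and the golden-section search are illustrative choices.

```python
import math


def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def bb_logpmf(y, n, p, upsilon):
    """Log-pmf of y reads from one allele out of n under a beta-binomial.

    HYPOTHETICAL stand-in for BB-mix: alpha = p/upsilon, beta = (1-p)/upsilon,
    so larger upsilon means more overdispersion; no mixture component is modelled.
    """
    a, b = p / upsilon, (1.0 - p) / upsilon
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + log_beta(y + a, n - y + b) - log_beta(a, b)


def golden_max(f, lo, hi, tol=1e-6):
    """Golden-section search for the maximum of f on [lo, hi] (assumes f is unimodal)."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:          # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2


def fit_upsilon(het_counts, lo=1e-4, hi=10.0):
    """MLE of upsilon_i with p fixed at 0.5 (no genetic effect).

    het_counts: (y_ik, n_ik) pairs for target-region SNPs heterozygous in individual i.
    """
    def loglik(log_u):
        u = math.exp(log_u)
        return sum(bb_logpmf(y, n, 0.5, u) for y, n in het_counts)

    # Searching over log(upsilon) keeps the parameter positive.
    return math.exp(golden_max(loglik, math.log(lo), math.log(hi)))


if __name__ == "__main__":
    het_counts = [(4, 10), (6, 10), (5, 10), (9, 10), (1, 10), (5, 10)]  # toy data
    print("upsilon_i estimate:", fit_upsilon(het_counts))
```

Only one parameter is estimated per individual here, so a single one-dimensional search suffices and no iteration between parameter sets is needed.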