Estimating overdispersion parameters

To estimate the genome-wide overdispersion parameters \(\Omega_i\) and \(\Upsilon_i\), we use the same likelihood equations as in the CHT, but assume that there are no genetic effects. This means that for the read depth part of the test, \(\lambda_{hi}\) is equal to the expected counts, \(T^*_{ij}\), and for the allele-specific part of the test, \(p_h\) is equal to \(0.5\). Since the allele-specific and read depth parts of the likelihood equation are independent, we can fit the two overdispersion parameters separately.


To find the maximum likelihood estimate of \(\Omega_i\) we need to sum the log likelihood across all regions. This presents a problem, as \(\phi_j\) must also be estimated for each region. We therefore estimate the two sets of parameters iteratively, first finding a maximum likelihood estimate of \(\phi_j\) for each region, with the current \(\Omega_i\) held fixed, using the equation:

\[\textrm{L}\left(\phi_j \left| D \right. \right) = \prod_i \left[ \Pr_{\mathrm{BNB}} \left( X = x_{ij} \left| \lambda = T^*_{ij}, \Omega_i, \phi_j \right. \right) \right]\]

and then finding a maximum likelihood estimate for \(\Omega_i\) for each individual using the equation:

\[\textrm{L}\left(\Omega_i \left| D \right. \right) = \prod_j \left[ \Pr_{\mathrm{BNB}} \left( X = x_{ij} \left| \lambda = T^*_{ij}, \Omega_i, \phi_j \right. \right) \right]\]

We repeat this iterative procedure until the improvement in likelihood becomes negligible.
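
The alternating updates can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the CHT: the function `nb_logpmf` is a hypothetical stand-in for the BNB log-probability (a negative binomial whose dispersion is \(\Omega_i + \phi_j\)), and the names `argmax_1d`, `fit_alternating`, the search bounds, and the starting values are all chosen here for illustration only.

```python
import math

def nb_logpmf(x, lam, omega, phi):
    """Stand-in for the BNB log-probability (an assumption, not the CHT form):
    negative binomial with mean lam and variance lam + (omega + phi) * lam**2."""
    r = 1.0 / (omega + phi)
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(r / (r + lam)) + x * math.log(lam / (r + lam)))

def argmax_1d(f, lo=1e-4, hi=10.0, tol=1e-6):
    """Golden-section search for the maximiser of f on [lo, hi].
    Assumes f has a single peak on the interval."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:          # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2

def total_loglik(x, T, omega, phi):
    """Log likelihood summed over all individuals i and regions j."""
    return sum(nb_logpmf(x[i][j], T[i][j], omega[i], phi[j])
               for i in range(len(omega)) for j in range(len(phi)))

def fit_alternating(x, T, max_iter=50, tol=1e-6):
    """x[i][j]: observed count, T[i][j]: expected count (must be > 0)."""
    n_ind, n_reg = len(x), len(x[0])
    omega, phi = [0.1] * n_ind, [0.1] * n_reg   # arbitrary starting values
    history = [total_loglik(x, T, omega, phi)]
    for _ in range(max_iter):
        # Step 1: update phi_j for each region, holding every omega_i fixed.
        for j in range(n_reg):
            def f(p, j=j):
                return sum(nb_logpmf(x[i][j], T[i][j], omega[i], p)
                           for i in range(n_ind))
            cand = argmax_1d(f)
            if f(cand) > f(phi[j]):   # accept only if the likelihood improves
                phi[j] = cand
        # Step 2: update omega_i for each individual, holding every phi_j fixed.
        for i in range(n_ind):
            def h(o, i=i):
                return sum(nb_logpmf(x[i][j], T[i][j], o, phi[j])
                           for j in range(n_reg))
            cand = argmax_1d(h)
            if h(cand) > h(omega[i]):
                omega[i] = cand
        history.append(total_loglik(x, T, omega, phi))
        if history[-1] - history[-2] < tol:   # improvement is negligible
            break
    return omega, phi, history
```

Calling `fit_alternating(x, T)` with two equally shaped nested lists returns the fitted values and the log likelihood after each pass. Because each update is accepted only when it raises the likelihood, the recorded values never decrease, which makes the stopping rule well defined. With this stand-in model only the sums \(\Omega_i + \phi_j\) are identifiable, so the sketch illustrates the iteration scheme rather than the parameter values themselves.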


We calculate the genome-wide likelihood of \(\Upsilon_i\) by taking the product of likelihoods from all target region SNPs that are heterozygous in individual \(i\). We again assume there is no genetic effect, so \(p = 0.5\), and we use the following equation to find the maximum likelihood estimate of \(\Upsilon_i\):

\[\textrm{L}\left(\Upsilon_i \left| D \right. \right) = \prod_k \Pr_{\mathrm{BB-mix}} \left( Y = y_{ik} \left| n_{ik}, p = 0.5, \Upsilon_i \right. \right)\]
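
As a rough illustration of this one-dimensional fit, the sketch below maximises the summed log likelihood over a grid of candidate values. It rests on two assumptions that are not taken from the text: it uses a plain beta-binomial in place of the BB-mix distribution (so the mixture component is omitted), and it parameterises the dispersion as \(\alpha = \beta = 0.5\,(1/\Upsilon_i - 1)\), which keeps the mean at \(p = 0.5\). The function names are hypothetical.

```python
import math

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bb_logpmf(y, n, upsilon, p=0.5):
    """Plain beta-binomial log-probability (stand-in for BB-mix).
    Assumed parameterisation: alpha = p * (1/upsilon - 1),
    beta = (1 - p) * (1/upsilon - 1), with 0 < upsilon < 1."""
    scale = 1.0 / upsilon - 1.0
    alpha, beta = p * scale, (1.0 - p) * scale
    log_choose = (math.lgamma(n + 1) - math.lgamma(y + 1)
                  - math.lgamma(n - y + 1))
    return log_choose + log_beta(y + alpha, n - y + beta) - log_beta(alpha, beta)

def fit_upsilon(sites):
    """sites: list of (y, n) pairs for the heterozygous SNPs of one individual,
    where y is the count for one allele and n the total allele-specific count.
    Returns the grid value of upsilon with the highest summed log likelihood."""
    grid = [k / 1000 for k in range(1, 1000)]   # 0.001, 0.002, ..., 0.999
    def loglik(u):
        return sum(bb_logpmf(y, n, u) for y, n in sites)
    return max(grid, key=loglik)
```

For example, `fit_upsilon([(2, 20), (18, 20), (10, 20)])` returns the grid point that best explains those three sites. A grid search is used here only because it is easy to verify; a bounded one-dimensional optimiser would serve the same purpose with fewer likelihood evaluations.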