Graham McVicker edited Modeling read depths.tex  almost 10 years ago

Commit id: 969661b0416a72ac13dc067501b7612be6796e58


\subsection{Modeling the read depths}
The number of reads mapping to a target region is often modeled using a Poisson distribution \cite{XXXXX}. However, the Poisson assumption that the variance equals the mean is violated because read counts from target regions are overdispersed. Part of this overdispersion can be accommodated by modeling the data with a negative binomial distribution that has a variance parameter, $\eta_h$, for each test, $h$ \cite{Anders2010}. However, the negative binomial distribution assumes that the mean and variance have a quadratic relationship that is consistent across individuals. We have found that sequencing data violate this assumption, which causes poor calibration of the tests, particularly when sample sizes are small. The CHT therefore includes an additional overdispersion parameter for each individual, $\Phi_i$, which is fit across the genome. With this additional dispersion parameter, the data are modeled with a beta-negative-binomial (BNB) distribution. The expected number of counts for test $h$ and individual $i$, $\lambda_{hi}$, is calculated from $\alpha_h$, $\beta_h$, and the genotype of individual $i$ at test SNP $m$. This expectation is scaled by the total number of mapped reads for individual $i$, $T_i$.
\[  \lambda_{hi}= \left\{