Graham McVicker edited Modeling read depths.tex
almost 10 years ago
Commit id: 163e41be92958453f0cf747df4714772a594eebb
\subsection{Modeling the read depths}
The number of reads mapping to a target region is often modelled using a Poisson distribution \cite{Marioni_2008}. However, the Poisson assumption that the variance equals the mean is violated because read counts from target regions are overdispersed. Part of this overdispersion can be accommodated by modelling the data with a negative binomial distribution with a variance parameter, $\eta_h$, for each test, $h$ \cite{Anders_2010}. However, the negative binomial distribution assumes that the mean and variance have a quadratic relationship that is consistent across individuals. We have found that this assumption is violated by sequencing data and causes poor calibration of the tests, particularly when sample sizes are small. The CHT therefore includes an additional overdispersion parameter for each individual, $\Phi_i$, which is fit across the genome. After adding this additional dispersion parameter, the data are modelled with a beta-negative-binomial (BNB) distribution. The expected number of counts, $\lambda_{hi}$, is calculated from $\alpha_h$, $\beta_h$, and the genotype, $G_{im}$, for individual $i$ at test SNP $m$. The estimate is scaled by the total number of mapped reads for individual $i$, $T_i$.
\[
\lambda_{hi} = \left\{
\begin{array}{ll}
2 \alpha_h T_i & \textrm{if } G_{im} = 0 \textrm{ (homozygous allele 1)} \\
\\
\left( \alpha_h + \beta_h \right) T_i & \textrm{if } G_{im} = 1 \textrm{ (heterozygous)} \\
\\
2 \beta_h T_i & \textrm{if } G_{im} = 2 \textrm{ (homozygous allele 2)}
\end{array} \right.
\]
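As a worked illustration with purely hypothetical values (chosen for round numbers, not taken from any data set), suppose $\alpha_h = 1 \times 10^{-6}$, $\beta_h = 2 \times 10^{-6}$, and $T_i = 10^{7}$ mapped reads. The three genotype classes would then give
\[
\begin{array}{ll}
2 \alpha_h T_i = 2 \times 10^{-6} \times 10^{7} = 20 & \textrm{(homozygous allele 1)} \\
\left( \alpha_h + \beta_h \right) T_i = 3 \times 10^{-6} \times 10^{7} = 30 & \textrm{(heterozygous)} \\
2 \beta_h T_i = 4 \times 10^{-6} \times 10^{7} = 40 & \textrm{(homozygous allele 2)}
\end{array}
\]
so the heterozygote expectation lies midway between the two homozygote expectations, and doubling $T_i$ would double all three values.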
The likelihood of the data is then given by the equation
\[
\textrm{L}\left( D_h \left| \alpha_h, \beta_h, \Phi_\bullet, \eta_j \right. \right) = \prod_i \Pr_{\mathrm{BNB}} \left( X = x_{ij} \left| \lambda_{hi}, \Phi_i, \eta_j \right. \right)
\]
where $x_{ij}$ is the number of reads for individual $i$ in target region $j$.
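For numerical work, a product of many small probabilities is usually evaluated on the log scale. Taking logarithms of the likelihood above gives the equivalent form below, which is a restatement rather than an additional modelling assumption:
\[
\log \textrm{L} = \sum_i \log \Pr_{\mathrm{BNB}} \left( X = x_{ij} \left| \lambda_{hi}, \Phi_i, \eta_j \right. \right)
\]
Because the logarithm is monotonic, this sum is maximised by the same parameter values as the product.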
We detail estimation of the genome-wide dispersion parameter for each individual, $\Phi_i$, below.