Wen Jenny Shi edited section_subsectionPr.tex  over 9 years ago

Commit id: ece32fa6ce2cc4d384bcee5a1cf6f817422eae6c


\section{Supplementary materials}  \subsection{Proof of binary splitting}  In this subsection we consider the simplest case. Suppose there are only two genome positions to be clustered, $Y=[Y_1,\; Y_2]$. If they share the same probability parameter, then the likelihood that the two share the same parameter approaches one as the numbers of observations at the two sites, $m_1$ and $m_2$, grow large.  

Stirling's formula provides the following approximation:  $\log\Gamma(z)\approx\frac{1}{2}\log(2\pi)-\frac{1}{2}\log z+z\log z-z.$  Therefore,  \begin{eqnarray*}  &&LR\\ 

Under the null hypothesis, $Y_1$ and $Y_2$ follow the same distribution, i.e., they share the same probability parameter. Denote the common probability parameter by $P=(p^1,\cdots,p^5)$. Then the normal approximation of the multinomial random variables is  $y_i^j\approx m_ip^j+\sqrt{m_i}z_i^j+ \Op(\sqrt{m_i}), \;\textit{for }i=1,2;\;j=1,\cdots,5,$  where the $z_i^j$ are standard normal random variables and $\sumj z_i^j=0$ for $i=1,2$. 
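
As a rough illustration of how this approximation enters a logarithm, the following sketch drops the remainder term and treats $z_i^j$ as fixed; both simplifications are assumptions made here for illustration only. Factoring out $m_ip^j$ and applying the Taylor expansion $\log(1+x)=x-x^2/2+O(x^3)$ with $x=z_i^j/(\sqrt{m_i}p^j)$ gives
\begin{eqnarray*}
\log\left(m_ip^j+\sqrt{m_i}z_i^j\right)&=&\log(m_ip^j)+\log\left(1+\frac{z_i^j}{\sqrt{m_i}p^j}\right)\\
&=&\log(m_ip^j)+\frac{z_i^j}{\sqrt{m_i}p^j}-\frac{(z_i^j)^2}{2m_i(p^j)^2}+O\left(m_i^{-3/2}\right).
\end{eqnarray*}
The sum-to-zero constraint on the $z_i^j$ keeps the leading terms consistent with the multinomial total: since $\sum_{j=1}^{5}p^j=1$, we have $\sum_{j=1}^{5}\left(m_ip^j+\sqrt{m_i}z_i^j\right)=m_i$.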

Note that, in general, by L'H\^opital's rule, as $m_i\rightarrow \infty$,  $\sqrt{m_i}\log\left(1+\frac{\sqrt{m_i}z_i^j+1/25}{m_ip^j} \right)=\frac{\log\left(1+\frac{\sqrt{m_i}z_i^j+1/25}{m_ip^j} \right)}{1/\sqrt{m_i}}\longrightarrow\frac{z_i^j}{p^j}, \;\textit{for }i=1,2;\;j=1,\cdots,5.$  Under the assumption that $m_1$ and $m_2$ increase at the same rate, let $m_1=m$ and $m_2=cm$ for some $c>0$. Then as $m\rightarrow \infty$,