\section{Methods Description} \label{sec:methods}

\subsection{GMM} \label{sec:methods-gmm}

The \href{http://en.wikipedia.org/wiki/Expectation\%E2\%80\%93maximization_algorithm#mediaviewer/File:EM_Clustering_of_Old_Faithful_data.gif}{GMM (Gaussian mixture model)} (Wikipedia) method maximizes the likelihood of the data set using the EM (expectation-maximization) algorithm.

Notation:

$n =$ number of observations

$\theta = (\mu_{1}, \mu_{2}, \sigma_{1}, \sigma_{2}, p)$

$\textbf{z} = (z_1, \ldots, z_n)$ is the vector of categorical (latent) labels, $z_i \in \{1, 2\}$

$\textbf{x} = (x_1, \ldots, x_n)$ is the vector of observations, where $(x_i \mid z_i = 1) \sim N(\mu_{1}, \sigma^2_{1})$ and $(x_i \mid z_i = 2) \sim N(\mu_{2}, \sigma^2_{2})$

The expectation-maximization method alternates between an expectation step (E-step) and a maximization step (M-step).

\textit{E-step}

$P(z_i = 1) = p$, $P(z_i = 2) = 1 - p$

Complete-data likelihood: $L(\theta; \textbf{x}, \textbf{z}) = P(\textbf{x}, \textbf{z} \mid \theta) = \prod\limits_{i=1}^n P(Z_i = z_i)\, f(x_i \mid \mu_{z_i}, \sigma^2_{z_i})$

$Q(\theta \mid \theta^{(t)}) = E_{\textbf{z} \mid \textbf{x}, \theta^{(t)}}\left[\log L(\theta; \textbf{x}, \textbf{z})\right]$

The membership (responsibility) weights are

$T^{(t)}_{j,i} = P(Z_i = j \mid X_i = x_i, \theta^{(t)}) = \frac{p^{(t)}_{j}\, f(x_i \mid \mu^{(t)}_{j}, \sigma^{2(t)}_{j})}{p^{(t)} f(x_i \mid \mu^{(t)}_{1}, \sigma^{2(t)}_{1}) + (1 - p^{(t)}) f(x_i \mid \mu^{(t)}_{2}, \sigma^{2(t)}_{2})}$

where $p^{(t)}_{1} = p^{(t)}$ and $p^{(t)}_{2} = 1 - p^{(t)}$. Then

$Q(\theta \mid \theta^{(t)}) = E_{\textbf{z} \mid \textbf{x}, \theta^{(t)}}\left[\log L(\theta; \textbf{x}, \textbf{z})\right] = E\left[\log \prod\limits_{i=1}^n L(\theta; x_{i}, z_{i})\right] = \sum\limits_{i=1}^n E\left[\log L(\theta; x_{i}, z_{i})\right] = \sum\limits_{i=1}^n \sum\limits_{j=1}^2 T^{(t)}_{j,i}\left[\log p_{j} - \frac{1}{2}\log(2\pi) - \frac{1}{2}\log\sigma^{2}_{j} - \frac{(x_{i} - \mu_{j})^2}{2\sigma^{2}_{j}}\right]$

\textit{M-step}

$\theta^{(t+1)} = \arg \max\limits_{\theta} Q(\theta \mid \theta^{(t)})$

$\hat{p}^{(t+1)} = \frac{1}{n} \sum\limits_{i=1}^n T^{(t)}_{1,i}$

$\mu^{(t+1)}_{1} = \frac{\sum\limits_{i=1}^n T^{(t)}_{1,i} x_i}{\sum\limits_{i=1}^n T^{(t)}_{1,i}}$

$\sigma^{2(t+1)}_{1} = \frac{\sum\limits_{i=1}^n T^{(t)}_{1,i} (x_i - \mu^{(t+1)}_{1})^2}{\sum\limits_{i=1}^n T^{(t)}_{1,i}}$

The updates for $\mu_{2}$ and $\sigma^{2}_{2}$ are analogous, with $T^{(t)}_{2,i}$ in place of $T^{(t)}_{1,i}$.

The fitting procedure is as follows (see the sketch after this section):

\begin{enumerate}
\item Assume that $\textbf{x} \sim N(\mu, \sigma^2)$ (unimodal) and calculate $\mu$ and $\sigma^2$.
\item Assume that $\textbf{x}$ follows the two-component (bimodal) mixture with parameters $(\mu_1, \mu_2, \sigma^2_1, \sigma^2_2, p)$. Initial guess: $\mu_1 = \mu - \sigma$, $\mu_2 = \mu + \sigma$, $\sigma^2_1 = \sigma^2_2 = \sigma^2$, $p = 0.5$. Continue the EM iterations $t$ until $|\log L^{(t+1)} - \log L^{(t)}| < 10^{-3}$.
\end{enumerate}

The conclusion about the data is based on three quantities:

\begin{enumerate}
\item $-2 \ln \lambda = 2[\ln L_{\mathrm{bimodal}} - \ln L_{\mathrm{unimodal}}] \sim \chi^2$ (the larger $-2 \ln \lambda$ is, the more confident we are that the distribution is bimodal).
\item $D = \frac{|\mu_1 - \mu_2|}{\left[(\sigma^2_1 + \sigma^2_2)/2\right]^{0.5}}$ ($D > 2$ is necessary for a bimodal distribution but not sufficient).
\item Kurtosis $< 0$ is necessary for a bimodal distribution but not sufficient.
\end{enumerate}
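The following is a minimal Python sketch of the procedure described above (unimodal fit, EM iterations for the two-component mixture, and the $-2\ln\lambda$ and $D$ diagnostics). It is not the code used in this work; the function name \texttt{fit\_two\_component\_gmm}, the tolerance argument, and the use of \texttt{scipy.stats.norm} are illustrative assumptions.

\begin{verbatim}
import numpy as np
from scipy.stats import norm

def fit_two_component_gmm(x, tol=1e-3, max_iter=1000):
    """Illustrative EM fit of a two-component 1-D Gaussian mixture,
    initialized from the unimodal fit as described in the text."""
    x = np.asarray(x, dtype=float)

    # Step 1: unimodal fit x ~ N(mu, sigma^2)
    mu, sigma2 = x.mean(), x.var()
    loglik_uni = norm.logpdf(x, mu, np.sqrt(sigma2)).sum()

    # Step 2: initial guess for the bimodal fit
    sigma = np.sqrt(sigma2)
    mu1, mu2 = mu - sigma, mu + sigma
    s1, s2 = sigma2, sigma2
    p = 0.5

    loglik_old = -np.inf
    for _ in range(max_iter):
        # E-step: responsibilities T_{1,i}, T_{2,i}
        f1 = norm.pdf(x, mu1, np.sqrt(s1))
        f2 = norm.pdf(x, mu2, np.sqrt(s2))
        t1 = p * f1 / (p * f1 + (1 - p) * f2)
        t2 = 1.0 - t1

        # M-step: closed-form updates for p, means, variances
        p = t1.mean()
        mu1 = (t1 * x).sum() / t1.sum()
        mu2 = (t2 * x).sum() / t2.sum()
        s1 = (t1 * (x - mu1) ** 2).sum() / t1.sum()
        s2 = (t2 * (x - mu2) ** 2).sum() / t2.sum()

        # Convergence test: |log L^(t+1) - log L^(t)| < tol
        loglik = np.log(p * norm.pdf(x, mu1, np.sqrt(s1))
                        + (1 - p) * norm.pdf(x, mu2, np.sqrt(s2))).sum()
        if abs(loglik - loglik_old) < tol:
            break
        loglik_old = loglik

    # Diagnostics from the text
    minus2lnlambda = 2.0 * (loglik - loglik_uni)
    D = abs(mu1 - mu2) / np.sqrt((s1 + s2) / 2.0)
    return dict(mu1=mu1, mu2=mu2, s1=s1, s2=s2, p=p,
                minus2lnlambda=minus2lnlambda, D=D)

# Example: two well-separated modes
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 1, 500)])
print(fit_two_component_gmm(x))
\end{verbatim}

The kurtosis criterion can be checked separately, for example with \texttt{scipy.stats.kurtosis}, which returns excess kurtosis by default.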