% Hui Li renamed Methods Description.tex to GMM.tex (commit 8a90890603f7fe9cfdcc415fad5550db420eaf94)
\section{Methods Description} \label{sec:methods}
\subsection{GMM} \label{sec:methods-gmm}
The \href{http://en.wikipedia.org/wiki/Expectation\%E2\%80\%93maximization_algorithm#mediaviewer/File:EM_Clustering_of_Old_Faithful_data.gif}{GMM (Gaussian mixture modeling)} (Wikipedia)
method maximizes the likelihood of the data set using the EM (expectation-maximization) algorithm.
$n = $ number of observations
$\theta = (\mu_{1}, \mu_{2}, \sigma_{1}, \sigma_{2}, p)$ = vector of model parameters
$\textbf{z} = (z_1, \ldots, z_n)$ = latent categorical vector, $z_i \in \{1, 2\}$
$\textbf{x} = (x_1, \ldots, x_n)$ = vector of observations, where $(x_i \mid z_i = 1) \sim N(\mu_{1}, \sigma^2_{1})$ and $(x_i \mid z_i = 2) \sim N(\mu_{2}, \sigma^2_{2})$
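As a concrete illustration, data from this two-component model can be simulated as follows. This is a minimal sketch; the parameter values and variable names are hypothetical, not taken from the text:

```python
import numpy as np

# Hypothetical parameter values (for illustration only)
rng = np.random.default_rng(0)
n = 1000
p, mu1, mu2, s1, s2 = 0.4, -2.0, 3.0, 1.0, 1.5

# z_i is categorical: component 1 with probability p, component 2 otherwise
z = np.where(rng.random(n) < p, 1, 2)
# x_i | z_i = j  ~  N(mu_j, sigma_j^2)
x = np.where(z == 1, rng.normal(mu1, s1, n), rng.normal(mu2, s2, n))
```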
The expectation-maximization (EM) method consists of an expectation step (E-step) and a maximization step (M-step).
\textit{E-step}
$P(z_i = 1) = p$, $P(z_i = 2) = 1 - p$
Complete-data likelihood: $L(\theta; \textbf{x}, \textbf{z}) = P(\textbf{x}, \textbf{z} \mid \theta) = \prod\limits_{i=1}^n P(Z_i = z_i)\, f(x_i \mid \mu_{z_i}, \sigma^2_{z_i})$
$Q(\theta \mid \theta^{(t)}) = E_{\textbf{z} \mid \textbf{x}, \theta^{(t)}}\left[\log L(\theta; \textbf{x}, \textbf{z})\right]$
The responsibilities $T^{(t)}_{j,i}$ follow from Bayes' rule, with $p^{(t)}_{1} = p^{(t)}$ and $p^{(t)}_{2} = 1 - p^{(t)}$:
$T^{(t)}_{j,i} = P(Z_i = j \mid X_i = x_i, \theta^{(t)}) = \frac{p^{(t)}_{j}\, f(x_i \mid \mu^{(t)}_{j}, \sigma^{2(t)}_{j})}{p^{(t)} f(x_i \mid \mu^{(t)}_{1}, \sigma^{2(t)}_{1}) + (1 - p^{(t)}) f(x_i \mid \mu^{(t)}_{2}, \sigma^{2(t)}_{2})}$
$Q(\theta \mid \theta^{(t)}) = E\left[\log \prod\limits_{i=1}^n L(\theta; x_{i}, z_{i})\right] = \sum\limits_{i=1}^n E\left[\log L(\theta; x_{i}, z_{i})\right] = \sum\limits_{i=1}^n \sum\limits_{j=1}^2 T^{(t)}_{j,i}\left[\log p_{j} - \frac{1}{2}\log(2\pi) - \frac{1}{2}\log\sigma^{2}_{j} - \frac{(x_{i}-\mu_{j})^2}{2\sigma^{2}_{j}}\right]$
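The E-step, i.e. computing the responsibilities $T^{(t)}_{j,i}$, can be sketched in Python. The function name `e_step` and the `normal_pdf` helper are illustrative choices, not part of the method description:

```python
import numpy as np

def normal_pdf(x, mu, var):
    """Density of N(mu, var) evaluated elementwise at x."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def e_step(x, p, mu1, mu2, var1, var2):
    """Responsibilities T[j,i] = P(Z_i = j | X_i = x_i, theta^(t))."""
    w1 = p * normal_pdf(x, mu1, var1)
    w2 = (1 - p) * normal_pdf(x, mu2, var2)
    T1 = w1 / (w1 + w2)  # shared denominator = mixture density at x_i
    return T1, 1 - T1
```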
\textit{M-step}
$\theta^{(t+1)} = \arg \max\limits_{\theta} Q(\theta \mid \theta^{(t)})$
$\hat{p}^{(t+1)} = \frac{1}{n} \sum\limits_{i=1}^n T^{(t)}_{1,i}$
$\mu^{(t+1)}_{1} = \frac{\sum\limits_{i=1}^n T^{(t)}_{1,i}x_i}{\sum\limits_{i=1}^n T^{(t)}_{1,i}}$
$\sigma^{2(t+1)}_{1} = \frac{\sum\limits_{i=1}^n T^{(t)}_{1,i}(x_i-\mu^{(t+1)}_{1})^2}{\sum\limits_{i=1}^n T^{(t)}_{1,i}}$
The updates for $\mu^{(t+1)}_{2}$ and $\sigma^{2(t+1)}_{2}$ are analogous, with $T^{(t)}_{2,i}$ in place of $T^{(t)}_{1,i}$.
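The closed-form M-step updates above can be sketched as follows (the function name `m_step` is an illustrative choice; `T1`, `T2` are the responsibility vectors for the two components):

```python
import numpy as np

def m_step(x, T1, T2):
    """Weighted MLEs that maximize Q(theta | theta^(t)) in closed form."""
    n = len(x)
    p = T1.sum() / n                               # mixing weight
    mu1 = (T1 * x).sum() / T1.sum()                # weighted means
    mu2 = (T2 * x).sum() / T2.sum()
    var1 = (T1 * (x - mu1) ** 2).sum() / T1.sum()  # weighted variances
    var2 = (T2 * (x - mu2) ** 2).sum() / T2.sum()
    return p, mu1, mu2, var1, var2
```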
The fitting procedure is as follows:
1. Assume that $\textbf{x} \sim N(\mu, \sigma^2)$ (unimodal). Calculate $\mu$ and $\sigma^2$.
2. Assume that $\textbf{x}$ follows the two-component Gaussian mixture with parameters $(\mu_1, \mu_2, \sigma^2_1, \sigma^2_2, p)$ (bimodal).
Initial guess: $\mu_1 = \mu - \sigma$, $\mu_2 = \mu + \sigma$, $\sigma^2_1 = \sigma^2_2 = \sigma^2$, $p = 0.5$.
Continue iterations $t$ until $|\log L^{(t+1)} - \log L^{(t)}| < 10^{-3}$.
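The full procedure, including the unimodal initialization and the $10^{-3}$ log-likelihood stopping rule, might look like the sketch below. The function name and the `max_iter` safeguard are assumptions not stated in the text:

```python
import numpy as np

def normal_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def fit_gmm2(x, tol=1e-3, max_iter=500):
    # Step 1: unimodal fit gives mu and sigma^2
    mu, var = x.mean(), x.var()
    s = np.sqrt(var)
    # Step 2: initial guess for the bimodal model
    p, mu1, mu2, var1, var2 = 0.5, mu - s, mu + s, var, var
    prev_ll = -np.inf
    for _ in range(max_iter):  # max_iter is an assumed safeguard
        w1 = p * normal_pdf(x, mu1, var1)
        w2 = (1 - p) * normal_pdf(x, mu2, var2)
        ll = np.log(w1 + w2).sum()           # observed-data log-likelihood
        if abs(ll - prev_ll) < tol:          # |log L^(t+1) - log L^(t)| < 1e-3
            break
        prev_ll = ll
        T1 = w1 / (w1 + w2)                  # E-step: responsibilities
        T2 = 1 - T1
        p = T1.mean()                        # M-step: closed-form updates
        mu1 = (T1 * x).sum() / T1.sum()
        mu2 = (T2 * x).sum() / T2.sum()
        var1 = (T1 * (x - mu1) ** 2).sum() / T1.sum()
        var2 = (T2 * (x - mu2) ** 2).sum() / T2.sum()
    return p, mu1, mu2, var1, var2, ll
```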
The conclusion about the data is based on three statistics:
1. $-2 \ln \lambda = 2[\ln L_{\mathrm{bimodal}} - \ln L_{\mathrm{unimodal}}] \sim \chi^2$ (the larger $-2 \ln \lambda$ is, the stronger the evidence for a bimodal distribution)
2. $D = \frac{|\mu_1 - \mu_2|}{\left((\sigma^2_1 + \sigma^2_2)/2\right)^{0.5}}$ ($D > 2$ is necessary for a bimodal distribution but not sufficient)
3. $\mathrm{kurtosis} < 0$ is necessary for a bimodal distribution but not sufficient
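The three statistics can be computed as in the following sketch (the function takes the two fitted log-likelihoods as inputs; the name `bimodality_stats` is illustrative):

```python
import numpy as np

def bimodality_stats(x, ll_bimodal, ll_unimodal, mu1, mu2, var1, var2):
    # 1. Likelihood-ratio statistic: -2 ln lambda
    lr = 2 * (ll_bimodal - ll_unimodal)
    # 2. Separation D (D > 2 necessary but not sufficient for bimodality)
    D = abs(mu1 - mu2) / np.sqrt((var1 + var2) / 2)
    # 3. Excess kurtosis (< 0 necessary but not sufficient for bimodality)
    z = (x - x.mean()) / x.std()
    kurt = (z ** 4).mean() - 3
    return lr, D, kurt
```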