# Methods Description

\label{sec:methods}

## GMM

\label{sec:methods-gmm}

The GMM (Gaussian mixture modeling) method maximizes the likelihood of the data set using the EM (expectation-maximization) algorithm.

1. Assume that the data has a unimodal distribution: $$\textbf{x} \sim N(\mu, \sigma^2)$$. Estimate $$\mu$$ and $$\sigma^2$$.

2. Assume that the data has a bimodal distribution, i.e. a two-component Gaussian mixture: $$x_i \sim p\,N(\mu_1, \sigma^2_1) + (1-p)\,N(\mu_2, \sigma^2_2)$$

Initial guess: $$\mu_1 = \mu - \sigma, \mu_2 = \mu + \sigma, \sigma^2_1 = \sigma^2_2 = \sigma^2, p = 0.5$$
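The moment-based initial guess above can be sketched in pure Python (the function name and the dictionary layout are our own illustrative choices, not part of the method's specification):

```python
import math

def init_params(x):
    """Initial guess for the two-component fit from the unimodal moments:
    mu1 = mu - sigma, mu2 = mu + sigma, equal variances, p = 0.5."""
    n = len(x)
    mu = sum(x) / n
    var = sum((xi - mu) ** 2 for xi in x) / n  # MLE variance of the unimodal fit
    sigma = math.sqrt(var)
    return {"mu1": mu - sigma, "mu2": mu + sigma,
            "var1": var, "var2": var, "p": 0.5}
```

Placing the two starting means one standard deviation on each side of the overall mean gives EM a symmetric, non-degenerate starting point.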

$$n =$$ number of observations

$$\theta = (\mu_{1},\mu_{2}, \sigma_{1}, \sigma_{2}, p)$$; $$\textbf{z}=(z_1,\ldots,z_n)$$ is a latent categorical vector with $$z_i \in \{1,2\}$$

$$\textbf{x}=(x_1,\ldots,x_n)$$ are the observations, with $$(x_i \mid z_i=1) \sim N(\mu_{1}, \sigma^2_{1})$$ and $$(x_i \mid z_i=2) \sim N(\mu_{2}, \sigma^2_{2})$$

E-step. $$P(z_i = 1) = p$$, $$P(z_i = 2) = 1-p$$

Complete-data likelihood: $$L(\mathbf{\theta};\textbf{x},\textbf{z})=P(\textbf{x},\textbf{z}|\mathbf{\theta})=\prod\limits_{i=1}^n P(Z_i=z_i)f(x_i|\mu_{z_i}, \sigma^2_{z_i})$$

$$Q(\mathbf{\theta}|\mathbf{\theta^{(t)}})=E_{\textbf{z}|\textbf{x},\mathbf{\theta^{(t)}}}(\log L(\mathbf{\theta};\textbf{x};\textbf{z}))$$

$$T^{(t)}_{j,i}=P(Z_i=j|X_i=x_i,\theta^{(t)})=\frac{p^{(t)}_{j}f(x_i|\mu^{(t)}_{j}, \sigma^{2(t)}_{j})}{p^{(t)} f(x_i|\mu^{(t)}_{1}, \sigma^{2(t)}_{1})+(1-p^{(t)})f(x_i|\mu^{(t)}_{2}, \sigma^{2(t)}_{2})}$$, where $$p^{(t)}_{1}=p^{(t)}$$ and $$p^{(t)}_{2}=1-p^{(t)}$$

$$Q(\mathbf{\theta}|\mathbf{\theta^{(t)}})=E_{\textbf{z}|\textbf{x},\mathbf{\theta^{(t)}}}[\log L(\mathbf{\theta};\textbf{x},\textbf{z})] = \sum\limits_{i=1}^n E[\log L(\mathbf{\theta};x_{i},z_{i})] =$$

$$= \sum\limits_{i=1}^n \sum\limits_{j=1}^2 T^{(t)}_{j,i}\left[\log P(z_{j}) -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log\sigma^{2}_{j} - \frac{(x_{i}-\mu_{j})^2}{2\sigma^{2}_{j}}\right]$$
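The E-step computes the responsibilities $$T^{(t)}_{j,i}$$ directly from the formula above. A minimal sketch in pure Python (the helper names `normal_pdf` and `e_step`, and the parameter dictionary, are illustrative assumptions):

```python
import math

def normal_pdf(x, mu, var):
    """Density f(x | mu, sigma^2) of the normal distribution."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def e_step(x, theta):
    """Responsibilities T[j][i] = P(Z_i = j | X_i = x_i, theta)."""
    p = theta["p"]
    T1 = []
    for xi in x:
        a = p * normal_pdf(xi, theta["mu1"], theta["var1"])
        b = (1 - p) * normal_pdf(xi, theta["mu2"], theta["var2"])
        T1.append(a / (a + b))  # posterior weight of component 1
    T2 = [1.0 - t for t in T1]  # the two responsibilities sum to 1
    return T1, T2
```

Because each $$T^{(t)}_{1,i} + T^{(t)}_{2,i} = 1$$, only the first component's responsibilities need to be computed explicitly.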

M-step. $$\theta^{(t+1)} = \arg \max_{\theta} Q(\theta|\theta^{(t)})$$

$$\hat{p}^{(t+1)} = \frac{1}{n} \sum\limits_{i=1}^n T^{(t)}_{1,i}$$, $$\mu^{(t+1)}_{1} = \frac{\sum\limits_{i=1}^n T^{(t)}_{1,i}x_i}{\sum\limits_{i=1}^n T^{(t)}_{1,i}}$$, $$\sigma^{2(t+1)}_{1} = \frac{\sum\limits_{i=1}^n T^{(t)}_{1,i}(x_i-\mu^{(t+1)}_{1})^2}{\sum\limits_{i=1}^n T^{(t)}_{1,i}}$$ (the updates for the second component are analogous, with $$T^{(t)}_{2,i}$$).

Continue iterating over $$t$$ until $$|\log L^{(t+1)} - \log L^{(t)}| < 10^{-3}$$.
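Putting the pieces together, the whole iteration (moment-based initialization, alternating E- and M-steps, and the stopping rule above) might be sketched as follows; the function and key names are illustrative, not a prescribed interface:

```python
import math

def pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def log_lik(x, th):
    """Observed-data log-likelihood of the two-component mixture."""
    return sum(math.log(th["p"] * pdf(xi, th["mu1"], th["var1"])
                        + (1 - th["p"]) * pdf(xi, th["mu2"], th["var2"]))
               for xi in x)

def fit_two_gaussians(x, tol=1e-3, max_iter=500):
    # Initial guess: mu +/- sigma, equal variances, p = 0.5
    n = len(x)
    mu = sum(x) / n
    var = sum((xi - mu) ** 2 for xi in x) / n
    s = math.sqrt(var)
    th = {"mu1": mu - s, "mu2": mu + s, "var1": var, "var2": var, "p": 0.5}
    ll = log_lik(x, th)
    for _ in range(max_iter):
        # E-step: responsibilities for each component
        T1 = []
        for xi in x:
            a = th["p"] * pdf(xi, th["mu1"], th["var1"])
            b = (1 - th["p"]) * pdf(xi, th["mu2"], th["var2"])
            T1.append(a / (a + b))
        T2 = [1.0 - t for t in T1]
        # M-step: closed-form weighted updates
        for j, T in (("1", T1), ("2", T2)):
            w = sum(T)
            mu_j = sum(t * xi for t, xi in zip(T, x)) / w
            th["mu" + j] = mu_j
            th["var" + j] = sum(t * (xi - mu_j) ** 2 for t, xi in zip(T, x)) / w
        th["p"] = sum(T1) / n
        # Stopping rule: |log L^(t+1) - log L^(t)| < tol
        new_ll = log_lik(x, th)
        if abs(new_ll - ll) < tol:
            break
        ll = new_ll
    return th, new_ll
```

The `tol=1e-3` default matches the $$10^{-3}$$ threshold in the text; `max_iter` is a practical safeguard we add so that the loop always terminates.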

The conclusion about the data is based on 3 tests, with $$H_{0}$$: the distribution is unimodal, and $$H_{1}$$: the distribution is bimodal:

1. LRT (likelihood ratio test): $$-2 \ln \lambda = 2[\ln L_{bimodal} - \ln L_{unimodal}] \sim \chi^2$$. The LRT is the main test of the three for deciding on bimodality: the larger $$-2 \ln \lambda$$ is, the stronger the evidence that the distribution is bimodal.
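The LRT statistic only needs the two maximized log-likelihoods. A sketch, assuming the bimodal log-likelihood comes from the EM fit and computing the unimodal one at its closed-form MLE (function names are ours):

```python
import math

def unimodal_loglik(x):
    """Log-likelihood of the single-Gaussian (H0) fit at its MLE."""
    n = len(x)
    mu = sum(x) / n
    var = sum((xi - mu) ** 2 for xi in x) / n
    return sum(-0.5 * math.log(2 * math.pi * var)
               - (xi - mu) ** 2 / (2 * var) for xi in x)

def lrt_statistic(loglik_bimodal, x):
    """-2 ln(lambda) = 2 [ln L_bimodal - ln L_unimodal]."""
    return 2.0 * (loglik_bimodal - unimodal_loglik(x))
```

Since the unimodal model is nested in the mixture, the statistic is non-negative (up to EM convergence error), and larger values favor $$H_1$$.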

2. Bandwidth test: $$D = \frac {|\mu_1 - \mu_2|}{((\sigma^2_1+\sigma^2_2)/2)^{0.5}}$$. $$D > 2$$ (where $$D$$ is the distance between the peaks) is necessary for a clear separation of the two peaks.
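This separation statistic is a one-liner on the fitted mixture parameters (the function name is our own label for it):

```python
import math

def separation_d(mu1, mu2, var1, var2):
    """D = |mu1 - mu2| / sqrt((var1 + var2) / 2); D > 2 is needed
    for the two peaks to be clearly separated."""
    return abs(mu1 - mu2) / math.sqrt((var1 + var2) / 2.0)
```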

3. Kurtosis test: the (excess) kurtosis should be negative for a bimodal distribution.
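A sketch of the sample excess kurtosis, assuming the Fisher convention (subtracting 3 so that a normal distribution scores 0 and values below 0 point toward bimodality):

```python
def excess_kurtosis(x):
    """Sample excess kurtosis m4 / m2^2 - 3, where m2 and m4 are
    the second and fourth central moments."""
    n = len(x)
    mu = sum(x) / n
    m2 = sum((xi - mu) ** 2 for xi in x) / n
    m4 = sum((xi - mu) ** 4 for xi in x) / n
    return m4 / m2 ** 2 - 3.0
```

A symmetric two-point mass (the extreme bimodal case) attains the minimum value of $$-2$$, illustrating why strongly bimodal data tends to score negative.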

In some hard cases $$D$$ and the kurtosis fail to detect bimodality, which is why our main test is the LRT. For example, in the next two plots the distributions are bimodal, yet in the first plot $$D < 2$$ (the two peaks are hard to distinguish), and in the second plot the kurtosis is positive, which corresponds to a unimodal distribution (this happens because the distribution is skewed):