authorea.com/12691

Mode Test By GMM and Excess Mass Methods

\label{sec:methods}

\label{sec:methods-gmm}

GMM (Gaussian mixture modeling) method maximizes the likelihood of the data set using EM (expectation-maximization) method.

1. Assume that data has unimodal distribution: \(\textbf{x} \sim N(\mu, \sigma^2)\). Calculate \(\mu\) and \(\sigma^2\)

2. Assume that data has bimodal distribution: \(\textbf{x} \sim N(\mu_1, \mu_2, \sigma^2_1, \sigma^2_2, p)\)

Initial guess: \(\mu_1 = \mu - \sigma, \mu_2 = \mu + \sigma, \sigma^2_1 = \sigma^2_2 = \sigma^2, p = 0.5\)

\(n = \) number of observations

\(\theta = (\mu_{1},\mu_{2}, \sigma_{1}, \sigma_{2}, p)\) \(\textbf{z}=(z_1,..., z_n)\) categorical vector, \(z_i={1,2}\)

\(\textbf{x}=(x_1,...,x_n)\) observations, (\(x_i|z_i=1) \sim N(\mu_{1}, \sigma^2_{1})\), (\(x_i|z_i=2) \sim N(\mu_{2}, \sigma^2_{2})\)

*E-step* \(P(z_1) = p, P(z_2) = 1-p \)

Marginal likelihood: \(L(\mathbf{\theta};\textbf{x};\textbf{z})\)=\(P(\textbf{x},\textbf{z}|\mathbf{\theta})\)=\(\prod\limits_{i=1}^n P(Z_i=z_i)f(x_i|\mu_{j}, \sigma^2_{j})\)

\(Q(\mathbf{\theta}|\mathbf{\theta^{(t)}})=E_{\textbf{z}|\textbf{x},\mathbf{\theta^{(t)}}}(\log L(\mathbf{\theta};\textbf{x};\textbf{z}))\)

\(T^{(t)}_{j,i}=P(Z_i=j|X_i=x_i,\theta^{(t)})=\frac{P(z_{j})f(x_i|\mu^{(t)}_{j}, \sigma^{2(t)}_{j})}{p^{(t)} f(x_i|\mu^{(t)}_{1}, \sigma^{2(t)}_{1})+(1-p^{(t)})f(x_i|\mu^{(t)}_{2}, \sigma^{2(t)}_{2})}\)

\(Q(\mathbf{\theta}|\mathbf{\theta^{(t)}})=E_{\textbf{z}|\textbf{x},\mathbf{\theta^{(t)}}}(\log L(\mathbf{\theta};\textbf{x};\textbf{z})) = \sum\limits_{i=1}^n E[( \log L(\mathbf{\theta};x_{i};z_{i})] =\)

\(= \sum\limits_{i=1}^n \sum\limits_{j=1}^2 T^{(t)}_{j,i}[\log P(z_{j}) -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log\sigma^{2}_{j} - \frac{(x_{i}-\mu_{j})^2}{2\sigma^{2}_{j}}]\)

*M-step* \(\theta^{(t+1)} = \arg \max Q(\theta|\theta^{(t)})\)

\(\hat{p}^{(t+1)} = \frac{1}{n} \sum\limits_{i=1}^n T^{(t)}_{1,i}\), \(\mu^{(t+1)}_{1} = \frac{\sum\limits_{i=1}^n T^{(t)}_{1,i}x_i}{\sum\limits_{i=1}^n T^{(t)}_{1,i}}\), \(\sigma^{2(t+1)}_{1} = \frac{\sum\limits_{i=1}^n T^{(t)}_{1,i}(x_i-\mu^{(t+1)}_{1})^2}{\sum\limits_{i=1}^n T^{(t)}_{1,i}}\)

Continue iterations t until \(|\log L^{(t+1)} - \log L^{(t)}| < 10^{-3}\)

Conclusion about data is made based on 3 tests. \(H_{0}\) distribution is unimodal, \(H_{1}\) distribution is bimodal:

1. LRT (Likelihood ratio test) \(-2 \ln \lambda = 2[\ln L_{bimodal} - \ln L_{unimodal}] \sim \chi^2\) (LRT is the main test among all 3 tests for making conclusion about bimodality of data. The bigger \(-2 \ln \lambda\) is, the more we are convinced that distribution is bimodal).

2. (Bandwidth test) \(D = \frac {|\mu_1 - \mu_2|}{(\sigma^2_1+\sigma^2_2)/2)^{0.5}}\) (\(D (distance) > 2\) is necessary for a clear separation of 2 peaks).

3. (Kurtosis test) \(kurtosis < 0\) should be negative for a bimodal distribution.

In some hard cases \(D\) and \(kurtosis\) fail to detect bimodality. That is why our main test is LRT. For example on the next 2 plots distributions are bimodal, however on 1 plot D<2 (it is hard to distinguish 2 peaks) and on the 2 plot kurtosis is positive and that corresponds to unimodal distribution (it happens because distribution is biased):

## Share on Social Media