Mode Test By GMM and Excess Mass Methods

    Methods Description

    \label{sec:methods}

    GMM

    \label{sec:methods-gmm}

    The GMM (Gaussian mixture modeling) method maximizes the likelihood of the data set using the EM (expectation-maximization) algorithm.

    1. Assume that the data have a unimodal distribution: \(\textbf{x} \sim N(\mu, \sigma^2)\). Estimate \(\mu\) and \(\sigma^2\).

    2. Assume that the data have a bimodal distribution, i.e. a two-component Gaussian mixture with parameters \((\mu_1, \mu_2, \sigma^2_1, \sigma^2_2, p)\).

    Initial guess: \(\mu_1 = \mu - \sigma, \mu_2 = \mu + \sigma, \sigma^2_1 = \sigma^2_2 = \sigma^2, p = 0.5\)

    \(n = \) number of observations

    \(\theta = (\mu_{1},\mu_{2}, \sigma^2_{1}, \sigma^2_{2}, p)\); \(\textbf{z}=(z_1,..., z_n)\) is a latent categorical vector with \(z_i \in \{1,2\}\)

    \(\textbf{x}=(x_1,...,x_n)\) are the observations, with \((x_i \mid z_i=1) \sim N(\mu_{1}, \sigma^2_{1})\) and \((x_i \mid z_i=2) \sim N(\mu_{2}, \sigma^2_{2})\)
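
    As a minimal sketch (ours, not part of the draft; the function name and return layout are illustrative), step 1 and the initial guess for step 2 can be written as:

```python
# Minimal sketch: fit a single Gaussian (step 1) and build the initial guess
# theta^(0) for the two-component model (step 2), as described in the text.
import numpy as np

def initial_guess(x):
    """Return theta^(0) = (mu1, mu2, sigma2_1, sigma2_2, p) for a 1-D sample x."""
    mu, sigma2 = np.mean(x), np.var(x)
    sigma = np.sqrt(sigma2)
    # mu1 = mu - sigma, mu2 = mu + sigma, equal variances, p = 0.5
    return mu - sigma, mu + sigma, sigma2, sigma2, 0.5
```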

    E-step: \(P(z_i = 1) = p\), \(P(z_i = 2) = 1-p\)

    Complete-data likelihood: \(L(\mathbf{\theta};\textbf{x},\textbf{z}) = P(\textbf{x},\textbf{z}\mid\mathbf{\theta}) = \prod\limits_{i=1}^n P(Z_i=z_i)\,f(x_i\mid\mu_{z_i}, \sigma^2_{z_i})\)

    \(Q(\mathbf{\theta}|\mathbf{\theta^{(t)}})=E_{\textbf{z}|\textbf{x},\mathbf{\theta^{(t)}}}(\log L(\mathbf{\theta};\textbf{x};\textbf{z}))\)

    \(T^{(t)}_{j,i}=P(Z_i=j\mid X_i=x_i,\theta^{(t)})=\frac{P^{(t)}(Z_i=j)\,f(x_i\mid\mu^{(t)}_{j}, \sigma^{2(t)}_{j})}{p^{(t)} f(x_i\mid\mu^{(t)}_{1}, \sigma^{2(t)}_{1})+(1-p^{(t)})f(x_i\mid\mu^{(t)}_{2}, \sigma^{2(t)}_{2})}\)

    \(Q(\mathbf{\theta}|\mathbf{\theta^{(t)}})=E_{\textbf{z}|\textbf{x},\mathbf{\theta^{(t)}}}(\log L(\mathbf{\theta};\textbf{x};\textbf{z})) = \sum\limits_{i=1}^n E[\log L(\mathbf{\theta};x_{i};z_{i})] =\)

    \(= \sum\limits_{i=1}^n \sum\limits_{j=1}^2 T^{(t)}_{j,i}[\log P(z_{j}) -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log\sigma^{2}_{j} - \frac{(x_{i}-\mu_{j})^2}{2\sigma^{2}_{j}}]\)
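
    A hedged sketch of the E-step (the function name and the layout of \(\theta\) as a tuple are our assumptions): it computes the responsibilities \(T^{(t)}_{j,i}\) from the current parameters.

```python
# Sketch of the E-step: responsibilities T_{j,i} = P(Z_i = j | x_i, theta^(t)).
import numpy as np
from scipy.stats import norm

def e_step(x, theta):
    mu1, mu2, s2_1, s2_2, p = theta
    # Weighted component densities p * f(x | mu1, s2_1) and (1 - p) * f(x | mu2, s2_2)
    w1 = p * norm.pdf(x, loc=mu1, scale=np.sqrt(s2_1))
    w2 = (1.0 - p) * norm.pdf(x, loc=mu2, scale=np.sqrt(s2_2))
    total = w1 + w2
    return w1 / total, w2 / total   # T_{1,i}, T_{2,i}
```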

    M-step \(\theta^{(t+1)} = \arg \max Q(\theta|\theta^{(t)})\)

    \(\hat{p}^{(t+1)} = \frac{1}{n} \sum\limits_{i=1}^n T^{(t)}_{1,i}\), \(\mu^{(t+1)}_{1} = \frac{\sum\limits_{i=1}^n T^{(t)}_{1,i}x_i}{\sum\limits_{i=1}^n T^{(t)}_{1,i}}\), \(\sigma^{2(t+1)}_{1} = \frac{\sum\limits_{i=1}^n T^{(t)}_{1,i}(x_i-\mu^{(t+1)}_{1})^2}{\sum\limits_{i=1}^n T^{(t)}_{1,i}}\), and analogously for \(\mu^{(t+1)}_{2}\) and \(\sigma^{2(t+1)}_{2}\) with \(T^{(t)}_{2,i} = 1 - T^{(t)}_{1,i}\)
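
    A matching sketch of the M-step (again illustrative, not the draft's code), including the symmetric updates for the second component:

```python
# Sketch of the M-step: closed-form updates given the responsibilities T1, T2.
import numpy as np

def m_step(x, T1, T2):
    p = np.mean(T1)                                    # p^(t+1)
    mu1 = np.sum(T1 * x) / np.sum(T1)                  # mu_1^(t+1)
    mu2 = np.sum(T2 * x) / np.sum(T2)                  # mu_2^(t+1)
    s2_1 = np.sum(T1 * (x - mu1) ** 2) / np.sum(T1)    # sigma^2_1^(t+1)
    s2_2 = np.sum(T2 * (x - mu2) ** 2) / np.sum(T2)    # sigma^2_2^(t+1)
    return mu1, mu2, s2_1, s2_2, p
```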

    Iterate over \(t\) until \(|\log L^{(t+1)} - \log L^{(t)}| < 10^{-3}\).
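
    The same fit-to-convergence loop is also available off the shelf; as a cross-check (not the draft's implementation), scikit-learn's GaussianMixture runs EM with a comparable tolerance, although its tol applies to the change in the per-sample average log-likelihood rather than the total:

```python
# Cross-check sketch with scikit-learn's EM (placeholder data; replace x with the real sample).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
z = rng.random(1000) < 0.4
x = np.where(z, rng.normal(-0.9, 1.0, 1000), rng.normal(1.0, 1.0, 1000))

gm = GaussianMixture(n_components=2, tol=1e-3, max_iter=1000).fit(x.reshape(-1, 1))
print("weights:", gm.weights_)
print("means:", gm.means_.ravel(), "variances:", gm.covariances_.ravel())
print("total log-likelihood:", gm.score(x.reshape(-1, 1)) * len(x))
```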

    The conclusion about the data is based on three tests (a short computational sketch follows the list); \(H_{0}\): the distribution is unimodal, \(H_{1}\): the distribution is bimodal:

    1. LRT (likelihood ratio test): \(-2 \ln \lambda = 2[\ln L_{bimodal} - \ln L_{unimodal}] \sim \chi^2\). The LRT is the main test of the three for deciding whether the data are bimodal: the larger \(-2 \ln \lambda\) is, the stronger the evidence that the distribution is bimodal.

    2. Bandwidth (separation) test: \(D = \frac {|\mu_1 - \mu_2|}{((\sigma^2_1+\sigma^2_2)/2)^{0.5}}\); a distance \(D > 2\) is necessary for a clear separation of the two peaks.

    3. Kurtosis test: the (excess) kurtosis should be negative (\(\mathrm{kurtosis} < 0\)) for a bimodal distribution.
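
    A short sketch of how the three statistics could be computed (our illustration, using scikit-learn fits and SciPy's excess kurtosis; the \(\chi^2\) calibration of the LRT is not shown):

```python
# Sketch of the three test statistics: LRT, separation D, and excess kurtosis.
import numpy as np
from scipy.stats import kurtosis
from sklearn.mixture import GaussianMixture

def mode_tests(x):
    X = x.reshape(-1, 1)
    gm1 = GaussianMixture(n_components=1).fit(X)
    gm2 = GaussianMixture(n_components=2, tol=1e-3).fit(X)
    # 1. LRT statistic: -2 ln(lambda) = 2 [ln L_bimodal - ln L_unimodal]
    lrt = 2.0 * (gm2.score(X) - gm1.score(X)) * len(x)
    # 2. D = |mu1 - mu2| / sqrt((sigma2_1 + sigma2_2) / 2), from the bimodal fit
    mu1, mu2 = gm2.means_.ravel()
    s2_1, s2_2 = gm2.covariances_.ravel()
    D = abs(mu1 - mu2) / np.sqrt((s2_1 + s2_2) / 2.0)
    # 3. Sample excess kurtosis (negative values point towards bimodality)
    k = kurtosis(x, fisher=True)
    return lrt, D, k
```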

    In some hard cases \(D\) and the kurtosis fail to detect bimodality, which is why the LRT is our main test. For example, in the next two plots the distributions are bimodal; nevertheless, in the first plot \(D < 2\) (the two peaks are hard to distinguish), and in the second plot the kurtosis is positive, which taken alone would suggest a unimodal distribution (this happens because the distribution is skewed):

    First plot: \(\mu_1 = -0.9, \mu_2 = 1, \sigma^2_1 = \sigma^2_2 = 1, p = 0.4\)

    D = 1.9 (< 2), kurtosis = -0.395 (< 0)
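
    For reference, the quoted \(D\) and kurtosis can be reproduced analytically from the mixture parameters (our derivation, using standard moment formulas for a two-component Gaussian mixture):

```python
# Verify the example's D = 1.9 and excess kurtosis ~ -0.395 from the true parameters.
import numpy as np

mu1, mu2, s2_1, s2_2, p = -0.9, 1.0, 1.0, 1.0, 0.4

D = abs(mu1 - mu2) / np.sqrt((s2_1 + s2_2) / 2.0)               # 1.9

mean = p * mu1 + (1 - p) * mu2
d1, d2 = mu1 - mean, mu2 - mean
var = p * (s2_1 + d1 ** 2) + (1 - p) * (s2_2 + d2 ** 2)
# Fourth central moment of each component about the mixture mean: d^4 + 6 d^2 s2 + 3 s2^2
m4 = (p * (d1 ** 4 + 6 * d1 ** 2 * s2_1 + 3 * s2_1 ** 2)
      + (1 - p) * (d2 ** 4 + 6 * d2 ** 2 * s2_2 + 3 * s2_2 ** 2))
print(D, m4 / var ** 2 - 3.0)                                   # 1.9, about -0.395
```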