METHODS DESCRIPTION

GMM

The GMM (Gaussian mixture modeling) method maximizes the likelihood of the data set using the EM (expectation-maximization) algorithm.

1. Assume that the data have a unimodal distribution: X ∼ N(μ, σ²). Calculate μ and σ².
2. Assume that the data have a bimodal distribution: X ∼ N(μ₁, μ₂, σ₁², σ₂², p), i.e. a mixture of two normal components with mixing proportion p. Initial guess: μ₁ = μ − σ, μ₂ = μ + σ, σ₁² = σ₂² = σ², p = 0.5.

Notation:

n = number of observations

θ = (μ₁, μ₂, σ₁, σ₂, p)

Z = (z₁, ..., zn) categorical vector, zi = 1, 2

X = (x₁, ..., xn) observations, (xi|zi = 1) ∼ N(μ₁, σ₁²), (xi|zi = 2) ∼ N(μ₂, σ₂²)

_E-step_

P(z₁) = p, P(z₂) = 1 − p, where P(zj) is shorthand for P(Zi = j).

Complete-data likelihood:

$L(\theta; X; Z) = P(X, Z \mid \theta) = \prod_{i=1}^{n} P(Z_i = z_i)\, f(x_i \mid \mu_{z_i}, \sigma^2_{z_i})$

Posterior probability that observation i belongs to component j, given the current estimate θ(t):

$T^{(t)}_{j,i} = P(Z_i = j \mid X_i = x_i, \theta^{(t)}) = \frac{P^{(t)}(z_j)\, f(x_i \mid \mu^{(t)}_{j}, \sigma^{2(t)}_{j})}{p^{(t)} f(x_i \mid \mu^{(t)}_{1}, \sigma^{2(t)}_{1}) + (1 - p^{(t)}) f(x_i \mid \mu^{(t)}_{2}, \sigma^{2(t)}_{2})}$

Expected complete-data log-likelihood:

$Q(\theta \mid \theta^{(t)}) = E_{Z \mid X, \theta^{(t)}}(\log L(\theta; X; Z)) = \sum_{i=1}^{n} E[\log L(\theta; x_i; z_i)] =$

$= \sum_{i=1}^{n} \sum_{j=1}^{2} T^{(t)}_{j,i} \left[\log P(z_j) - \frac{1}{2}\log(2\pi) - \frac{1}{2}\log\sigma^{2}_{j} - \frac{(x_i - \mu_j)^2}{2\sigma^{2}_{j}}\right]$

_M-step_

$\theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)})$

$p^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} T^{(t)}_{1,i}$, $\mu^{(t+1)}_{1} = \frac{\sum_{i=1}^{n} T^{(t)}_{1,i} x_i}{\sum_{i=1}^{n} T^{(t)}_{1,i}}$, $\sigma^{2(t+1)}_{1} = \frac{\sum_{i=1}^{n} T^{(t)}_{1,i} (x_i - \mu^{(t+1)}_{1})^2}{\sum_{i=1}^{n} T^{(t)}_{1,i}}$

The updates for μ₂ and σ₂² are analogous, with $T^{(t)}_{2,i}$ in place of $T^{(t)}_{1,i}$.

Continue the iterations over t until $|\log L^{(t+1)} - \log L^{(t)}| < 10^{-3}$.

The conclusion about the data is based on 3 tests, with H₀: the distribution is unimodal, and H₁: the distribution is bimodal.

1. LRT (likelihood ratio test): $-2\ln\lambda = 2[\ln L_{bimodal} - \ln L_{unimodal}] \sim \chi^2$. LRT is the main test among the 3 for deciding whether the data are bimodal. The larger −2lnλ is, the stronger the evidence that the distribution is bimodal.
2. Bandwidth test: $D = \frac{|\mu_1 - \mu_2|}{((\sigma^2_1 + \sigma^2_2)/2)^{0.5}}$. D (distance) > 2 is necessary for a clear separation of the 2 peaks.
3. Kurtosis test: the kurtosis should be negative (kurtosis < 0) for a bimodal distribution. Since it can be negative, the kurtosis meant here is the excess kurtosis, which equals 0 for a normal distribution.

In some hard cases D and kurtosis fail to detect bimodality.
That is why our main test is LRT. For example, both distributions in the next 2 plots are bimodal. However, in the first plot D < 2 (the 2 peaks are hard to distinguish), and in the second plot the kurtosis is positive, which would normally indicate a unimodal distribution (this happens because the distribution is skewed):
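The fitting procedure and the 3 tests described in this section can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used by the authors: the function names (`fit_unimodal`, `fit_bimodal`, `bimodality_tests`) are hypothetical, the stopping rule is assumed to use the observed-data log-likelihood of the mixture, kurtosis is taken as excess kurtosis, and there are no safeguards against degenerate fits (for example a component variance collapsing to zero).

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of N(mu, var) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_unimodal(xs):
    """Step 1: maximum-likelihood mu and sigma^2, plus the log-likelihood."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    loglik = sum(math.log(normal_pdf(x, mu, var)) for x in xs)
    return mu, var, loglik


def mixture_loglik(xs, mu1, mu2, var1, var2, p):
    """Observed-data log-likelihood of the two-component mixture."""
    return sum(
        math.log(p * normal_pdf(x, mu1, var1) + (1 - p) * normal_pdf(x, mu2, var2))
        for x in xs
    )


def fit_bimodal(xs, tol=1e-3, max_iter=1000):
    """Step 2: EM iterations started from the initial guess given in the text."""
    n = len(xs)
    mu, var, _ = fit_unimodal(xs)
    sd = math.sqrt(var)
    mu1, mu2, var1, var2, p = mu - sd, mu + sd, var, var, 0.5
    loglik = mixture_loglik(xs, mu1, mu2, var1, var2, p)
    for _ in range(max_iter):
        # E-step: T_{1,i} = P(Z_i = 1 | x_i, theta^(t)); T_{2,i} = 1 - T_{1,i}.
        t1 = []
        for x in xs:
            a = p * normal_pdf(x, mu1, var1)
            b = (1 - p) * normal_pdf(x, mu2, var2)
            t1.append(a / (a + b))
        t2 = [1 - t for t in t1]
        # M-step: closed-form updates of p, mu_j and sigma_j^2.
        s1, s2 = sum(t1), sum(t2)
        p = s1 / n
        mu1 = sum(t * x for t, x in zip(t1, xs)) / s1
        mu2 = sum(t * x for t, x in zip(t2, xs)) / s2
        var1 = sum(t * (x - mu1) ** 2 for t, x in zip(t1, xs)) / s1
        var2 = sum(t * (x - mu2) ** 2 for t, x in zip(t2, xs)) / s2
        # Stop once the log-likelihood changes by less than tol.
        new_loglik = mixture_loglik(xs, mu1, mu2, var1, var2, p)
        converged = abs(new_loglik - loglik) < tol
        loglik = new_loglik
        if converged:
            break
    return mu1, mu2, var1, var2, p, loglik


def bimodality_tests(xs):
    """Compute the LRT statistic, the distance D and the excess kurtosis."""
    n = len(xs)
    mu, var, ll_uni = fit_unimodal(xs)
    mu1, mu2, var1, var2, p, ll_bi = fit_bimodal(xs)
    m4 = sum((x - mu) ** 4 for x in xs) / n  # fourth central moment
    return {
        "lrt": 2 * (ll_bi - ll_uni),
        "D": abs(mu1 - mu2) / math.sqrt((var1 + var2) / 2),
        "kurtosis": m4 / var ** 2 - 3,
        "mu1": mu1,
        "mu2": mu2,
        "p": p,
    }


# Synthetic example (made-up data): two well-separated normal components.
random.seed(0)
sample = [random.gauss(0, 1) for _ in range(300)] + [random.gauss(6, 1) for _ in range(300)]
result = bimodality_tests(sample)
print(result)
```

On a clearly bimodal sample like this synthetic one, all 3 criteria are expected to agree. The sketch only reports the LRT statistic and does not turn it into a p-value, because the text does not state the degrees of freedom of the reference χ² distribution.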