\subsection{Bootstrap} \label{sec:methods-bootstrap}

The GMM introduced in Section~\ref{sec:methods-gmm} constrains the best-fit parameters for a given number of modes, but these parameters alone do not quantify how strongly a bimodal distribution is preferred over a unimodal one. We therefore use the bootstrap method (Efron 1979) to test the hypothesis of bimodality. The basic idea of this method is to generate simulated samples from the original data and to repeat the estimation; statistics such as the probability of a given number of modes and the errors of the parameters can then be determined by comparing the re-estimated results with the original ones. In this project we apply two kinds of bootstrap, namely the parametric and the non-parametric bootstrap.

We use the parametric bootstrap to estimate the probability of the unimodal distribution, i.e.\ the $p$-values of $-2 \ln \lambda$, $D$ and the kurtosis $k$. In this case, each test sample is drawn from the unimodal Gaussian distribution $N(\mu, \sigma^2)$ fitted to the original data by the GMM. Keeping the sample size fixed, we repeat the bootstrap a large number of times (1000 by default, but 100 when the sample size exceeds 500). We then count the number of repeats in which each re-estimated statistic is at least as extreme as the corresponding original one,
\begin{eqnarray}
-2 \ln \lambda_{\mathrm{sim}} &>& -2 \ln \lambda_{\mathrm{obs}}, \nonumber \\
D_{\mathrm{sim}} &>& D_{\mathrm{obs}}, \nonumber \\
k_{\mathrm{sim}} &<& k_{\mathrm{obs}},
\end{eqnarray}
respectively. The $p$-values are then given by these counts divided by the total number of repeats.

We use the non-parametric bootstrap to estimate the errors of the parameters. In this case, each test sample is drawn from the original data with replacement. The sample size is again unchanged, and 100 re-estimations are performed. The errors of all the parameters defined in Section~\ref{sec:methods-gmm} are then given by the standard deviations of the 100 re-estimated results.
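The two bootstrap procedures above can be sketched as follows. This is a minimal illustration, not the actual analysis code: the helper names are hypothetical, and the kurtosis is used as the example test statistic (the same counting scheme applies to $-2\ln\lambda$ and $D$ once a GMM fit is available).

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def kurtosis(x):
    """Excess kurtosis; bimodal samples tend toward negative values."""
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3.0

def parametric_bootstrap_pvalue(data, stat=kurtosis, n_boot=1000):
    """p-value of `stat` under the unimodal null N(mu, sigma^2).

    Draws samples of the same size from the fitted unimodal Gaussian
    and counts the fraction whose statistic is at least as extreme
    (here: smaller, since lower kurtosis means more bimodal) as the
    observed one."""
    mu, sigma = data.mean(), data.std()
    observed = stat(data)
    count = 0
    for _ in range(n_boot):
        sim = rng.normal(mu, sigma, size=data.size)
        if stat(sim) <= observed:  # simulated at least as extreme
            count += 1
    return count / n_boot

def nonparametric_bootstrap_error(data, estimator=np.mean, n_boot=100):
    """Standard deviation of an estimator over resamples drawn from
    the original data with replacement, keeping the sample size."""
    estimates = [estimator(rng.choice(data, size=data.size, replace=True))
                 for _ in range(n_boot)]
    return np.std(estimates)

# A clearly bimodal sample should yield a small kurtosis p-value.
bimodal = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])
p_kurt = parametric_bootstrap_pvalue(bimodal)
err_mu = nonparametric_bootstrap_error(bimodal)
\end{verbatim}

In a real pipeline the estimator passed to the non-parametric bootstrap would be the full GMM fit, and its standard deviation over the 100 resamples gives the quoted parameter errors.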