Olga edited Background.tex over 9 years ago

Commit id: 9152220a86b41ea509a5f1e7943e4b7aa4d933e8

\section{Introduction} \label{sec:intro}
In statistics, a mixture model is a probabilistic model that represents an overall population as a combination of several subpopulations. For an observed data set, we often fit a mixture distribution with several modes (e.g. a bimodal distribution) instead of identifying the subpopulation to which each individual observation belongs. Although there are algorithms (e.g. expectation maximization) that can fit the observed data well once a mixture model is given (e.g. a Gaussian mixture model), one must first evaluate whether the mixture model itself is appropriate. Our project is to test whether a distribution is bimodal rather than unimodal using two different methods, i.e. the Gaussian mixture method (GMM) and the excess-mass method (EMM).

This so-called ``mode test'' is an important starting point for serious scientific study. Astronomers prefer to classify objects, such as galaxies, into groups first, and then study them by comparing and contrasting certain properties among the different groups. For example, research shows that the color distribution of galaxies may follow a bimodal distribution, i.e. galaxies could be divided into two populations (red and blue) according to their observed color. The color of a galaxy carries information about its star formation. A typical assumption behind this color classification is that the red galaxies are associated with an old, rounded population of galaxies with little star formation, while the blue ones are mostly young galaxies with ongoing star formation. A mode test is therefore needed to confirm this assumption and to make subsequent studies based on this classification meaningful.
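
To make the notion of a mixture model concrete, the simplest candidate for a bimodal distribution is a two-component Gaussian mixture. The symbols $w$, $\mu_k$ and $\sigma_k$ below are notation assumed here for illustration only; they are not taken from the text above.
\begin{equation}
p(x) = w\,\mathcal{N}(x \mid \mu_1, \sigma_1^2) + (1 - w)\,\mathcal{N}(x \mid \mu_2, \sigma_2^2), \qquad 0 \le w \le 1,
\end{equation}
where
\begin{equation}
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right).
\end{equation}
Such a mixture is not necessarily bimodal: when the two means are close relative to the component widths, the density has a single mode. This is one reason a dedicated mode test is useful in addition to fitting the mixture.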