Authorea

Melanie edited Analysis Plan.tex about 8 years ago

Commit id: e8a31e8e4ad48756b6da8b958ea3118fb2d32d1d

A consequence of the wide redshift range of the galaxies in GZH/GZH2 is that the level of classification bias in the vote fractions is not uniform. On average, more distant galaxies have smaller, dimmer, and less well-resolved cut-out images, which makes finer morphological features more difficult to identify. In GZH this bias was corrected for the first question by quantifying the drop in the $p_{features}$ vote fraction as a function of surface brightness and redshift in a set of simulated images. The set was constructed from 288 SDSS galaxies that were artificially redshifted from $z=0.3$ to $z=1.0$ in increments of $\Delta z =0.1$. The simulated high-redshift images, along with the original galaxy images, were classified in Galaxy Zoo using the same interface and decision tree as GZH. From these data, the change in the vote fraction $p_{features}$ as a function of redshift and surface brightness was measured, and from that a correction term was applied to the galaxies in the GZH sample.

GZH2 will also use FERENGI images to correct for redshift bias, but the process will incorporate several improvements over the GZH method. First, the SDSS galaxies included in the FERENGI sample will be chosen to better overlap the HST galaxies in their surface brightness and redshift distributions. Because corrections to the vote fractions were calculated in discrete bins of surface brightness and redshift, it was necessary to have a large number of FERENGI images in \emph{each} bin. In GZH, the distributions of surface brightness and redshift of the FERENGI images were offset from the space occupied by the HST galaxies (see Figure~\ref{fig:eyeofsauron}). As a result, corrections could not be applied to the vote fractions of the many HST galaxies that occupied regions of surface brightness-redshift space not covered by the FERENGI data. Because of this limitation, 25\% of GZH could not be corrected for redshift bias.
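
As an illustration only, the binned correction described above can be sketched as follows. The symbols $\Delta p$, $\mu$ (surface brightness), $z_0$ (the redshift of the low-redshift reference image), $N_{bin}$, and $N_{min}$ are notation assumed here, and the additive form is an assumption made for clarity rather than the exact GZH prescription. For a bin in surface brightness and redshift containing $N_{bin}$ FERENGI images, the mean drop in the vote fraction could be written
\begin{equation}
\Delta p(\mu, z) = \frac{1}{N_{bin}} \sum_{i \in \mathrm{bin}} \left[ p_{features,i}(z_0) - p_{features,i}(z) \right],
\end{equation}
and a debiased vote fraction for an HST galaxy falling in the same bin would then be
\begin{equation}
p_{features}^{debiased} = \min\left[ 1,\; p_{features}^{observed} + \Delta p(\mu, z) \right] \quad \mathrm{for} \quad N_{bin} \geq N_{min}.
\end{equation}
In this sketch no correction is defined for bins with $N_{bin} < N_{min}$, which is why HST galaxies lying outside the region of surface brightness-redshift space sampled by the FERENGI images remain uncorrected.
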
The new FERENGI sample will be selected to maximize the overlap in this space, so that the largest possible number of galaxies in GZH2 can be corrected. The FERENGI method for debiasing vote fractions was also limited in GZH by the structure of the decision tree. As described above, the GZH question tree only required a minimum number of users to classify each galaxy; it did not require a minimum number of users to answer each question. This problem is doubly significant for the FERENGI galaxies, since the vote fractions for both the low-redshift \emph{and} the high-redshift image of a galaxy must be statistically significant; in other words, for each data point analyzed in FERENGI, enough users must have answered a given question for each of the two images. This strict requirement is not met by most pairs of galaxy images in FERENGI for higher-tier questions. Consequently, there were not enough FERENGI images in each surface brightness-redshift bin to measure a relationship between the vote fractions at high and low redshift, and only the vote fractions for the first question in GZH could be corrected for redshift bias. In GZH2, the FERENGI sample will be classified using the new decision tree method, which will ensure that enough data are obtained for every question in the tree; as a result, vote fractions for all morphological features will be corrected.
\item in-browser fitting tool
\end{enumerate}
\item Description of final catalog, how to be used by public