Heather Campbell edited sectionClassificatio.tex about 10 years ago

Commit id: d502dc51a434aa2ad7ab5daa282793a486a1cced

\section{Classification} \label{class}
Gaia is predicted to detect 44 million transits per day, which corresponds to $\sim$150--800 GByte/day of data. Within this huge volume of data we expect 100s--1000s of potentially interesting astrophysical triggers per day (real variables or moving objects). This rate precludes visual classification of the data stream, so automated methods that are fast, repeatable and tuneable are essential. The Gaia alerts classification pipeline uses random forest classification. The random forest will use all the information available, and its features will include: light curve photometry (gradient, amplitude, historic rms, magnitude, signal-to-noise ratio, transit rms), low-resolution spectra (flux versus wavelength, colours, spectral shape coefficients, spectral type), auxiliary information (neighbour star, shape parameters, motion parameters, coordinates, crowding, calibration offset, correlations, quality-control parameters) and crossmatch environment (near known star magnitudes, near known star colours, near known variable class, near galaxy, near galaxy redshift and circumnuclear).

To build up a sufficient sample of classification labels to train the random forest classifier (e.g. \cite{Ofek_Cenko_Butler_et_al__2012}) we aim to obtain $\sim$500 homogeneous high-quality spectra in the first year of the mission, spread across each broad class of transient phenomena (active galactic nuclei, core collapse SN, TDE, SN, Novae, CV and variable stars). The light curve classification uses the flux gradient of the transient object: Gaia observations separated by the 106.5 min cadence are used to indicate the type of object. The low-resolution (BP/RP) spectra provide far more information to aid classification \cite{Belokurov_in_prep_2014} and provide a robust classification for most objects, at $>$19\,mag, once the classifier is fully trained on representative data.
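As an illustration of the random forest idea only, the sketch below trains a very small forest on invented numbers. It is not the Gaia alerts pipeline: the three features (flux gradient, amplitude, colour), the two labels, every training value and all function names are assumptions made for this example. Each tree is grown on a bootstrap resample of the training set and considers a random subset of features at each split; the forest classifies by majority vote.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, rng, max_depth=4):
    """Grow one tree; each split sees a random subset of sqrt(n) features."""
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))
    split = None
    if max_depth > 0 and len(set(labels)) > 1:
        split = best_split(rows, labels, rng.sample(range(n_features), k))
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], rng, max_depth - 1),
            build_tree([r for r, _ in right], [y for _, y in right], rng, max_depth - 1))

def predict_tree(node, row):
    """Walk from the root to a leaf (leaves are label strings)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees trees, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest

def predict(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

# Invented training set: [flux gradient (mag/day), amplitude (mag), colour (mag)].
rows = [
    [0.01, 0.10, 1.2], [0.02, 0.20, 1.5], [0.03, 0.30, 1.8],
    [0.04, 0.40, 1.4], [0.05, 0.25, 1.6],
    [0.20, 1.0, 0.1], [0.30, 1.5, 0.3], [0.40, 2.0, 0.6],
    [0.50, 2.5, 0.2], [0.35, 1.8, 0.4],
]
labels = ["VarStar"] * 5 + ["SN"] * 5

forest = train_forest(rows, labels)
print(predict(forest, [0.6, 3.0, 0.05]))   # fast, large-amplitude, blue source
print(predict(forest, [0.005, 0.05, 2.0])) # slow, small-amplitude, red source
```

A production classifier would be trained on the spectroscopically labelled sample described above, with the full feature list rather than three toy quantities.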
In addition, the transient object will be cross-matched with archival catalogues, for example the Sloan Digital Sky Survey (SDSS), the Two Micron All Sky Survey (2MASS), HST and the Visible and Infrared Survey Telescope for Astronomy (VISTA). This will help remove known variable star contaminants and provide environmental information for the transient events, e.g. whether a host galaxy is associated with the source and, if so, its type and magnitude.
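The cross-match step can be sketched as a nearest-neighbour search on the sky. The example below is a minimal illustration under stated assumptions: the 1 arcsec match radius, the dictionary layout, the catalogue entries and the function names are all hypothetical, and the brute-force loop stands in for the spatial indexing a real pipeline would need at survey scale.

```python
import math

def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0

def crossmatch(alert, catalogue, radius_arcsec=1.0):
    """Return the nearest catalogue entry within radius_arcsec, or None."""
    best, best_sep = None, radius_arcsec
    for entry in catalogue:
        sep = angular_separation_arcsec(alert["ra"], alert["dec"],
                                        entry["ra"], entry["dec"])
        if sep <= best_sep:
            best, best_sep = entry, sep
    return best

# Invented archival entries (positions in degrees).
catalogue = [
    {"name": "known variable", "ra": 150.0000, "dec": 2.0000},
    {"name": "galaxy", "ra": 150.0100, "dec": 2.0000},
]

alert = {"ra": 150.0001, "dec": 2.0001}
match = crossmatch(alert, catalogue)
print(match["name"] if match else "no counterpart")
```

An alert that matches a known variable star can then be flagged as a likely contaminant, while a match to a galaxy supplies the host information (type, magnitude) used as classifier features.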