
Eric Linder edited Challenge Structure.tex almost 11 years ago

Commit id: 3b9b0952bb7a4aa3f45b989f107e5b70482cf0a4


As in other fields of astronomy (cite STEP, GREAT08, GREAT10 etc.), the initial challenges TDC0 and TDC1 are relatively idealized. After the successful outcome of this first challenge, we expect to increase the complexity of the simulations so as to stimulate gradual improvements in the algorithms over the remainder of this decade. {\bf [EL: Yet some teams have tested on real data, and others could. This tests consistency, not accuracy, and is not blind, but should we mention this?]}

\subsection{Instructions for participation, timeline, and ranking criteria}
\label{ssec:instruction}

\item $0.5<\chi^2/(fN)<2$
\item $P<0.15$
\item $A<0.15$
\end{enumerate}
{\bf [EL: Why the lower bound on $\chi^2$? If Good Team fits extremely accurately, but adds an extra ``systematic'' error to account for uncertainties, why penalize? This actually happens with our DRW fits, where we sometimes get errors of 0.04 days but never believe this accuracy and might inflate it to 0.4 days. This should be fine, especially given my note below about only counting in $f$ those systems with apparent precision within 5\%.]}
A failure rate of 50\% is something like the borderline of acceptability for LSST, and so can be used to define the robustness threshold. The TDC0 lenses will be selected to span the range of possible time delays, rather than being sampled from the OM10 distribution, so we expect a higher rate of catastrophic failure at this stage than in TDC1: 50\% is a minimal bar to clear.
{\bf [EL: See my previous remarks about not wanting $f=1$ but rather that $f$ should take the value of the fraction of systems that could legitimately be fit given season coverage. One should penalize $f$ greater than this value. Also, Alireza and I use ratings (gold, silver, brass) to indicate a degree of confidence; this is useful since systems will need spectroscopic followup and we shouldn't waste telescope time on brass systems. So a low $f$ is not automatically bad. One could allow Good Teams to submit one entry for their gold+silver systems, say, and one entry for all their systems, and not penalize the former due to low $f$ as long as $fN>100$ when $N\ge1000$, say, if that's what we think is realistic for followup.]}
The factor of two in reduced chi-squared corresponds approximately to fits that are two-sigma away from being acceptable when $N=8$: such fits likely have problems with the time delay estimates, or the estimation of their uncertainties, or both. {\bf [EL: I didn't follow this.
If fits are $2\sigma$ away then each contributes $\chi^2=4$, not 2.]}
Requiring precision and accuracy of better than 15\% is a further minimal bar to clear; in \S~\ref{structure} we will describe the targets for TDC1.
{\bf [EL: We actually care much more about the ``apparently precise'' systems than about all the systems. For time delays of 1--5 days, it will be almost impossible with LSST cadence to get 5\% precision. The cosmological leverage will then all come from long time delays of 30--100 days. So maybe we should specifically redefine $f$ as the fraction of systems fit to apparent precision $\sigma_i/\tilde{\Delta t}_i<0.05$ (note the Good Team measures both numerator and denominator so it stays blind). In this case $f$ will generally be much less than 1, but should roughly represent the fraction of systems with time delays between 30 days and 120 days (or the season length). If you want, you could put in some trick systems where the time delay takes it to the next season, i.e.\ greater than 240 days.]}
Repeat submissions will be accepted as teams iterate their analyses on the lower rungs of TDC0. The final rung will remain blinded until after the nominal deadline of 1 November 2013, when initial qualifiers for TDC1 will be announced and the TDC1 data released. Late submissions will be accepted, but the teams will then have less time to carry out TDC1.
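
One possible reading of the factor of two in the reduced chi-squared threshold above is sketched here. It assumes Gaussian errors, so that $\chi^2$ follows a chi-squared distribution with $\nu$ degrees of freedom; the symbol $\nu$ is introduced only for this illustration and stands for the number of fitted systems, $fN$. Under that assumption,
\begin{equation}
\left\langle \frac{\chi^2}{\nu} \right\rangle = 1, \qquad
\sigma\!\left(\frac{\chi^2}{\nu}\right) = \sqrt{\frac{2}{\nu}} = 0.5 \quad \mbox{for } \nu=8,
\end{equation}
so the upper limit $\chi^2/\nu=2$ lies two standard deviations of the reduced chi-squared above its expected value, while the lower limit of $0.5$ lies one standard deviation below it. In this reading, the two-sigma statement refers to the ensemble statistic rather than to each individual fit.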

\begin{equation}
C=\left|\frac{\chi^2}{fN}-1\right|\frac{AP}{f}.
\end{equation}
{\bf [EL: Not sure where this combination comes from. A standard statistical measure is the mean squared error or risk: $R=\sum_i\sqrt{\sigma_i^2+(\tilde{\Delta t}_i-\Delta t_i)^2}=\sum_i \sigma_i\sqrt{1+\chi^2_i}$. Again we could apply this only for the ``apparently precise'' fraction $f$.]}
The results will not be revealed until the end of the challenge in order to maintain blindness. The deadline for TDC1 is 1 May 2014, i.e.\ six months after TDC0. Multiple submissions are accepted from each team in order to allow for correction of bugs, and for different algorithms. However, only the most recent submission for each algorithm will be considered, in order to avoid favoring teams with multiple submissions. Late submissions will be accepted and included in the final publication if received in time, but will be flagged as such.
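
The second equality in the risk $R$ quoted in the comment above can be checked term by term. The sketch below assumes that $\chi_i$ denotes the per-system normalized residual (a definition not given in this section), with $\tilde{\Delta t}_i$ the estimated delay, $\Delta t_i$ the true delay, and $\sigma_i$ the quoted uncertainty:
\begin{equation}
\chi_i \equiv \frac{\tilde{\Delta t}_i-\Delta t_i}{\sigma_i}
\quad\Longrightarrow\quad
\sqrt{\sigma_i^2+(\tilde{\Delta t}_i-\Delta t_i)^2}
=\sqrt{\sigma_i^2\left(1+\chi_i^2\right)}
=\sigma_i\sqrt{1+\chi_i^2}.
\end{equation}
For example, a system with $\sigma_i=1$ day and a residual of $2$ days has $\chi_i^2=4$ and contributes $\sqrt{5}\approx2.24$ days to $R$.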