
\subsection{Steps of the Challenge}
\label{ssec:steps}

The initial challenge consists of two steps, hereafter time-delay challenge 0 and 1 (TDC0 and TDC1). Each time-delay challenge is organized as a ladder with a number of simulated light curves at each rung. The rungs are intended to represent increasing levels of difficulty and realism within each challenge. The simulated light curves were created by the ``evil team'' (GD, CDF, PJM, TT). All the details about the light curves, including input parameters, noise properties, etc., will be revealed to the teams participating in the challenge (hereafter ``good teams'') only after the closing of the challenge. One good team (Linder and Hojjati) beta-tested the first rung of TDC0 as a courtesy, but they were not made aware of any of the input parameters of the simulated light curves except for the input time delays.

TDC0 consists of a small number of simulated light curves with fairly basic properties in terms of noise, sampling, and cadence. It is intended to serve as a validation tool before embarking on TDC1. The evil team expects that state-of-the-art algorithms should be able to process TDC0 with minimal computing time and recover the input time delays within the estimated uncertainties. TDC0 also provides a means to perform basic debugging and to test input and output formats for the challenge. Good teams are required to successfully meet the TDC0 criteria before embarking on TDC1. The outcome of TDC0 will be a pass/fail response granting access to TDC1.

TDC1 is the actual challenge. It consists of thousands of sets of simulated light curves, also arranged in rungs of increasing difficulty and realism. The large data volume is chosen to simulate the demands of an LSST-like experiment, but also to enable the detection of biases in the algorithms at the sub-percent level.
The evil team expects that processing the TDC1 dataset will be challenging for current algorithms in terms of computing resources. TDC1 thus represents a test not only of the accuracy of the algorithms but also of their efficiency. Incomplete submissions will be accepted, although the number of processed light curves is one of the metrics by which algorithms are evaluated, as described below. The details of the challenges are known only to the members of the evil team. As in other fields of astronomy (cite STEP, GREAT08, GREAT10 etc), the initial challenges TDC0 and TDC1 are relatively idealized. After the successful outcome of this first challenge, we expect in the future to increase the complexity of the simulations so as to stimulate gradual improvements in the algorithms over the remainder of this decade.

\subsection{Instructions for participation, timeline, and ranking criteria}
\label{ssec:instruction}

To pass TDC0, a submission must satisfy the following criteria:
\begin{enumerate}
\item $f>0.5$
\item $0.5<\chi^2/fN<2$
\item $P<0.15$
\item $A<0.15$
\end{enumerate}
A failure rate of 50\% is roughly the borderline of acceptability for LSST, and so can be used to define the robustness threshold. The TDC0 lenses will be selected to span the range of possible time delays, rather than being sampled from the OM10 distribution, and so we expect a higher rate of catastrophic failure at this stage than in TDC1: 50\% is a minimal bar to clear. The factor of two in reduced chi-squared corresponds approximately to fits that are two sigma away from being acceptable when $N=8$: such fits likely have problems with the time-delay estimates, or the estimation of their uncertainties, or both. Requiring precision and accuracy better than 15\% is a further minimal bar to clear; in \S~\ref{structure} we will describe the targets for TDC1. Repeat submissions will be accepted as teams iterate their analyses on the lower rungs of TDC0. The final rung will remain blinded until after the nominal deadline of 1 November 2013, when initial qualifiers for TDC1 will be announced and the TDC1 data released. Late submissions will be accepted, but those teams will then have less time to carry out TDC1.

\subsubsection{TDC1}

Good teams that successfully pass TDC0 will be given access to the full TDC1 dataset. As in TDC0, the good teams will estimate time delays and uncertainties and provide their answers to the evil team via a suitable web interface (to be found at the challenge website). The evil team will compute the metrics described above. There is no unique way to define a single summary metric that balances all four requirements. We choose to define one for the sake of completeness, although it should be used only as a rough estimate of the overall quality of the algorithms.
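As an illustration, the four metrics could be computed along the following lines. This is a sketch only: the precise definitions of $P$ (mean claimed fractional precision) and $A$ (mean fractional bias) are assumed from their descriptions elsewhere in the paper, and the function and variable names are ours, not part of the challenge interface.

```python
import numpy as np

def tdc_metrics(dt_est, dt_err, dt_true, n_total):
    """Illustrative computation of the TDC summary statistics.

    dt_est, dt_err : submitted time-delay estimates and their 1-sigma
                     uncertainties (only for the light curves attempted)
    dt_true        : the corresponding true (input) time delays
    n_total        : total number of light curves in the rung
    """
    dt_est = np.asarray(dt_est, dtype=float)
    dt_err = np.asarray(dt_err, dtype=float)
    dt_true = np.asarray(dt_true, dtype=float)

    fN = dt_est.size                  # number of delays actually submitted
    f = fN / n_total                  # success fraction (criterion: f > 0.5)
    chi2 = np.sum(((dt_est - dt_true) / dt_err) ** 2)
    chi2_red = chi2 / fN              # criterion: 0.5 < chi2/fN < 2
    P = np.mean(dt_err / np.abs(dt_true))      # mean fractional precision
    A = np.mean((dt_est - dt_true) / dt_true)  # mean fractional bias
    return f, chi2_red, P, A
```

For example, a submission of four delays out of eight, each recovered to within its quoted uncertainty, would yield $f=0.5$ and a reduced chi-squared of order unity.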
The combined metric is defined as
\begin{equation}
C=\left|\frac{\chi^2}{fN}-1\right|\frac{AP}{f}.
\end{equation}
The smallest $C$ wins. The results will not be revealed until the end of the challenge in order to maintain blindness. The deadline for TDC1 is 1 May 2014, i.e.\ six months after TDC0. Multiple submissions are accepted from each team, in order to allow for the correction of bugs and for different algorithms. However, only the most recent submission for each algorithm will be considered, in order to avoid favoring teams with multiple submissions. Late submissions will be accepted and included in the final publication if received in time, but will be flagged as such.

\subsubsection{Publication of the results}

Initially, this first paper will only be posted on the arXiv as a means to open the challenge. After the deadline, the full details of TDC0 and TDC1 will be revealed by adding an appendix to this paper. At the same time, the results of TDC1 will be described in the second paper of this series, including as co-authors all the members of the good teams who participated in the challenge. The two papers will be submitted concurrently so as to allow the referee to evaluate the entire process.

\subsection{Overall goals and broad criteria for success}

The overall goal of TDC0 and TDC1 is to carry out a blind test of current state-of-the-art time-delay estimation algorithms in order to quantify the available accuracy. Criteria for success depend on the time horizon. At present, time-delay cosmology is limited by the number of lenses with measured light curves and by the modeling uncertainties, which are of order 5\% per system. Furthermore, distance measurements currently reach accuracies in the range of 3\%.
Therefore, any method that can provide time delays with realistic uncertainties ($\chi^2<1.5fN$) for the majority ($f>0.5$) of light curves, with accuracy $A$ and precision $P$ better than 3\%, can be considered a viable method. In the longer run, with LSST in mind, a desirable goal is to maintain $P<3\%$ but to improve the accuracy to $A<0.2\%$, in order for the cosmological parameter estimates not to be limited by time-delay measurement systematics. For $N=1000$, the 2-sigma goodness-of-fit requirement becomes $\chi^2<1.09fN$, while keeping $f>0.5$. Testing for such extreme accuracy requires a large sample of lenses: TDC1 will contain several thousand simulated systems to enable such tests.
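The goodness-of-fit thresholds quoted above ($\chi^2/fN<2$ for $N=8$, $\chi^2<1.09fN$ for $N=1000$) follow from the normal approximation to the $\chi^2$ distribution, which has mean $N$ and variance $2N$. A quick check (the function name is ours, chosen for illustration):

```python
import math

def chi2_two_sigma(n):
    """2-sigma upper limit on reduced chi-squared for n degrees of freedom,
    using the normal approximation: chi2 has mean n and variance 2n."""
    return (n + 2.0 * math.sqrt(2.0 * n)) / n

print(round(chi2_two_sigma(8), 2))     # 2.0
print(round(chi2_two_sigma(1000), 2))  # 1.09
```

Both quoted thresholds are thus recovered from the same $N+2\sqrt{2N}$ rule.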