\subsection{Instructions for participation, timeline, and ranking criteria}
\label{ssec:instruction}

Instructions for how to access the simulated light curves in the time delay challenge are given at this website \url{http://darkenergysciencecollaboration.github.io/SLTimeDelayChallenge/}. In short, participation in the challenge requires the following steps.

\subsubsection{TDC0}

Every prospective good team is invited to download the TDC0 pairs of light curves and analyze them. Upon completion of the analysis, they will submit their time delay estimates, together with their estimated 68\% uncertainties, to the challenge organisers for analysis. The simulation team will calculate a minimum of four standard metrics given this set of estimated time delays $\tilde{\Delta t}$ and uncertainties $\sigma$.

The first metric is the efficiency, quantified as the fraction of light curves $f$ for which an estimate is obtained. Of course, this is not a sufficient requirement for success, as the estimate should also be accurate and have correct uncertainties. There might be cases when the data are ambiguous (for example, when the time delay falls into the season gaps); for those, some methods will indicate failure while others will estimate very large uncertainties. Therefore we need to introduce a second metric to evaluate how realistic the error estimates are. This is achieved with the goodness of fit of the estimates, quantified by the standard $\chi^2$,
\begin{equation}
\chi^2=\sum_i \left(\frac{\tilde{\Delta t}_i - \Delta t_i}{\sigma_i}\right)^2.
\end{equation}
The third metric is the precision of the estimator, quantified by the average relative uncertainty,
\begin{equation}
P=\frac{1}{fN}\sum_i \left(\frac{\sigma_i}{|\Delta t_i|}\right).
\end{equation}
The fourth is the accuracy of the estimator, quantified by the average fractional residual,
\begin{equation}
A=\frac{1}{fN} \sum_i \left|\frac{\tilde{\Delta t}_i - \Delta t_i}{\Delta t_i}\right|.
\end{equation}
The final metric of our minimal set is the fraction of systems for which a cosmologically useful estimate is obtained. This fraction will depend not just on the algorithms but also on the actual time delays and the quality of the simulated data. The quantity $g$ is defined as the fraction of objects that satisfy the condition $\sigma_i/\tilde{\Delta t}_i<0.05$.

The initial function of these metrics is to define a minimal performance threshold that must be passed, in order to guarantee meaningful results in TDC1.
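For concreteness, the short Python sketch below shows one way these metrics could be computed from a set of submitted estimates. It is only an illustration under our own assumptions (NumPy arrays, failed estimates encoded as NaN, and an absolute value in the $g$ condition to guard against negative delays); it is not the challenge's official scoring code.
\begin{verbatim}
import numpy as np

def tdc0_metrics(dt_true, dt_est, sigma):
    """Illustrative (unofficial) computation of the TDC0 metrics.

    dt_true : true time delays (days), length N
    dt_est  : estimated delays, np.nan where no estimate was submitted
    sigma   : estimated 68% uncertainties, np.nan where no estimate was submitted
    """
    N = len(dt_true)
    ok = np.isfinite(dt_est) & np.isfinite(sigma)        # successful estimates
    f = ok.sum() / N                                     # efficiency
    chi2 = np.sum(((dt_est[ok] - dt_true[ok]) / sigma[ok]) ** 2)            # goodness of fit
    P = np.sum(sigma[ok] / np.abs(dt_true[ok])) / (f * N)                   # precision
    A = np.sum(np.abs((dt_est[ok] - dt_true[ok]) / dt_true[ok])) / (f * N)  # accuracy
    # "useful" fraction: apparent precision better than 5 per cent
    g = np.sum(sigma[ok] / np.abs(dt_est[ok]) < 0.05) / N
    return f, chi2, P, A, g
\end{verbatim}
Note that $fN$ in the denominators is simply the number of successful estimates, so $P$ and $A$ are averages over the measured systems only.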

To pass TDC0, an analysis team's results must satisfy the following criteria.

{\bf [EL: Why the lower bound on $\chi^2$? If a Good Team fits extremely accurately, but puts an extra ``systematic'' error in to account for uncertainties, why penalize? This actually happens with our DRW fits, where we sometimes get errors of 0.04 days but we never believe this accuracy and might inflate it to 0.4 days. This should be fine, especially seeing my note below about only counting in $f$ those systems with apparent precision within 5\%.]} [{\bf TT: I think that the lower bound on $\chi^2$ is needed because overestimating errors is not good either. If we think errors are too large we might overlook some valuable system.}]

A failure rate of 70\% is something like the borderline of acceptability for LSST (given the total number of lenses expected), and so can be used to define the efficiency threshold. The TDC0 lenses will be selected to span the range of possible time delays, rather than being sampled from the OM10 distribution, and so we expect a higher rate of catastrophic failure at this stage than in TDC1: a 30\% success rate is a minimal bar to clear. {\bf [EL: see my previous remarks about not wanting $f=1$, but rather that $f$ should take the value of the fraction of systems that could legitimately be fit given season coverage. One should penalize $f$ greater than this value. Also, Alireza and I use ratings (gold, silver, brass) to indicate a degree of confidence; this is useful since systems will need spectroscopic followup and we shouldn't waste telescope time on brass systems. So a low $f$ is not automatically bad. One could allow Good Teams to submit one entry for their gold+silver systems, say, and one entry for all their systems, and not penalize the former due to low $f$ as long as $fN>100$ when $N\ge1000$, say, if that's what we think is realistic for followup.]} [{\bf TT: that's a good point, and a matter of philosophy to some extent. In the scenario you describe one could imagine that failure means a very large uncertainty, so that your brass systems would have very large uncertainties and not be used. I am fine lowering the threshold, considering that some systems might indeed not be measurable if there are too many gaps. So I lowered it to $f>0.3$.}]

The factor-of-two half-ranges in reduced $\chi^2$ include approximately 95\% of the $\chi^2$ probability distribution when $N=8$: fits outside this range likely have problems with the time delay estimates, or with the estimation of their uncertainties, or both. {\bf [EL: I didn't follow this. If fits are $2\sigma$ away then each contributes $\chi^2=4$, not 2.] TT: it's 2-$\sigma$ on the distribution of $\chi^2$ given $N=8$ degrees of freedom. I hope this version is clearer. PJM: where does the $N=8$ come from? Aren't we interested in much larger $N_{\rm data}$ than 8? And isn't $N_{\rm dof}=0$ for every lens, since the ``data'' is the estimated delay, and the parameter is the actual delay? I think we are doing something intuitively sensible here but maybe not explaining it well...} Requiring precision and accuracy of better than 15\% is a further minimal bar to clear; in \S~\ref{structure} we will describe the targets for TDC1.
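As a cross-check of the quoted probability content, the interval can be evaluated directly; the following SciPy snippet is our own illustration (not part of the challenge code), assuming a $\chi^2$ distribution with $N=8$ degrees of freedom as in the discussion above.
\begin{verbatim}
from scipy.stats import chi2

# Probability content of the reduced-chi^2 range [0.5, 2] for N = 8
# degrees of freedom, as assumed in the text above.
N = 8
lo = chi2.cdf(0.5 * N, df=N)
hi = chi2.cdf(2.0 * N, df=N)
print("P(chi2/N < 2)       =", round(hi, 3))
print("P(0.5 < chi2/N < 2) =", round(hi - lo, 3))
\end{verbatim}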
Repeat submissions will be accepted as teams iterate their analyses on the lower rungs of TDC0. The final rung will remain blinded until after the nominal deadline of 14 January 2013, when initial qualifiers for TDC1 will be announced and the TDC1 data released. Late submissions will be accepted, but the teams will then have less time to carry out TDC1.

\subsubsection{TDC1}