Real-Time Detection as a Solution to the Big Data Challenge

The technical requirements of our VLA FRB survey are extreme by astronomical standards. However, planned observatories like the SKA and LSST are moving more science to high data rates. Lessons learned from our project will therefore have increasing relevance to astronomers working to solve the “needle in a haystack” problem.

Extreme data rate science is limited by the demands of data production, distribution, processing, and curation. Our solution (sneakernet plus three compute clusters) is manageable for a limited (and compelling!) science case, but it is not sustainable in the long term. We believe a sustainable solution will use real-time transient detection. Bringing computational support closer to telescopes ameliorates the distribution problem and lets us discard data we know is uninteresting, a technique known as “data triage”.

Figure \ref{realtime} summarizes the concept of data triage in transient detection. In some applications, measuring all information about a transient candidate can be substantially more expensive than simply detecting it$^1$. The difference between the two can be critical for extreme data rate applications. Once a transient candidate is detected, the data associated with it can be saved for more detailed analysis. Data triage is routinely employed in the particle physics community, where a well-defined theory predicts the interactions of a particle with the detector. Such a detailed theory is critical for defining what the absence of a detection means.
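The detect-then-save pattern described above can be sketched as follows. This is a minimal illustration, not our pipeline: the image stream, SNR threshold, and cutout size are hypothetical, and the "detection" is simply a peak-to-noise ratio computed on each image.

```python
import numpy as np

def triage(images, snr_threshold=7.0, cutout=32):
    """Cheap detection pass: keep only the data around significant peaks.

    images: iterable of 2-D NumPy arrays with (approximately) zero-mean noise.
    Returns a list of (index, snr, cutout_array) for detected candidates;
    all other data is discarded immediately rather than stored.
    """
    saved = []
    for i, img in enumerate(images):
        sigma = img.std()
        if sigma == 0:
            continue
        snr = img.max() / sigma  # cheap detection statistic
        if snr >= snr_threshold:
            # Save a small cutout around the peak for later, more
            # expensive characterization (localization, fitting, etc.).
            y, x = np.unravel_index(img.argmax(), img.shape)
            half = cutout // 2
            sub = img[max(0, y - half):y + half, max(0, x - half):x + half]
            saved.append((i, snr, sub.copy()))
    return saved
```

The point of the sketch is the asymmetry: the per-image cost is a handful of array operations, while the stored output is a small fraction of the input, so detailed measurement can run offline on the candidates alone.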


  1. In the case of the VLA FRB project, most of our images contain zero-mean, Gaussian-distributed pixel values, so one can imagine a number of simple statistical tests of Gaussianity to determine whether a candidate transient is present.
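One such test can be sketched with SciPy's D'Agostino-Pearson normality test; this is an illustrative example, not the project's actual statistic, and the significance level is arbitrary.

```python
import numpy as np
from scipy import stats

def is_pure_noise(image, alpha=1e-3):
    """Return True if pixel values are consistent with Gaussian noise.

    Applies the D'Agostino-Pearson K^2 normality test to the flattened
    image. A bright transient adds outlying pixels that skew the
    distribution and drive the p-value down, failing the test.
    """
    _, pvalue = stats.normaltest(image.ravel())
    return pvalue > alpha
```

An image that fails the test is not automatically a transient, but an image that passes it can be discarded cheaply, which is exactly the triage decision described above.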