Anomaly Detection in Massive Radio Interferometry Data Streams


The study of millisecond radio transients is important for a number of fundamental problems in astrophysics, including characterizing the intergalactic medium, discovering exoplanets, and understanding the lifecycle of neutron stars. These transients are rare and unpredictable, requiring extensive blind surveys for a chance to detect a single event. However, even a single detection can have a huge science payoff, since it can help us understand exotic states of matter or illuminate distant corners of the universe.

Recent technological advances in radio astronomy, particularly the use of large arrays of antennas known as interferometers, enable data collection at time resolutions sufficient to study these phenomena with exquisite sensitivity, resolution, and flexibility. This power comes at the cost of handling data streams of 1 TB hour\(^{-1}\), far faster than transportation and archiving infrastructure can support. Next-generation radio telescopes will increase this data flow and the associated computing requirements by orders of magnitude. Evolutionary changes to data analysis will not save radio astronomers from this data deluge. A revolutionary approach is needed to do science with massive data streams. I am interested in developing the concepts of real-time anomaly detection and data triage as solutions to this big data challenge.

Image of a millisecond radio transient found in a blind survey with the VLA. I describe this observation in more detail in the article at \label{rratimg}


My concept for the study of radio transients and high data rate interferometry has been developed through an iterative approach over the past five years. The science, algorithms, and hardware we use today have evolved based on real-world experience.

I began this effort by leading the construction of the first instrument for millisecond imaging with an interferometer (Law et al. 2011, Astrophysical Journal, 742, 12). This instrument was installed at the Allen Telescope Array, a radio interferometer in northern California, where we used it to observe known millisecond radio transients. Standard radio astronomy software packages were not designed for millisecond timescale data, so I built a new data analysis system in Python.

With our first terabyte of data on disk, we began to think seriously about algorithms for efficiently searching for radio transients. Traditional data analysis systems required human interaction at every stage. This approach was not feasible when analyzing many millions of images. Our solution was a novel statistical test that automatically found transient candidates; the number of candidate events was small enough for a person to inspect manually (Law et al. 2012, Astrophysical Journal, 749, 7). This algorithm is now being tested at new, powerful radio interferometers under construction around the world.
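The general idea of automated triage can be illustrated with a minimal sketch. This is not the statistical test from the cited paper, which is not described here; it is a generic stand-in that flags an image when its brightest pixel stands far above a robust noise estimate. The function name `find_candidates`, the threshold value, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def find_candidates(images, threshold_sigma=7.0):
    """Flag images whose peak pixel is a strong outlier (hypothetical helper).

    Returns the indices of flagged images and the peak signal-to-noise
    ratio of every image.
    """
    images = np.asarray(images, dtype=float)
    flat = images.reshape(images.shape[0], -1)
    median = np.median(flat, axis=1)
    # Median absolute deviation, scaled to match a Gaussian standard deviation.
    mad = np.median(np.abs(flat - median[:, None]), axis=1)
    sigma = 1.4826 * mad
    snr = (flat.max(axis=1) - median) / sigma
    return np.flatnonzero(snr > threshold_sigma), snr

# Synthetic stand-in for a stream of millisecond images: pure Gaussian noise
# with one bright point source injected into a single image.
rng = np.random.default_rng(0)
images = rng.normal(size=(1000, 32, 32))
images[123, 10, 20] += 20.0

candidates, snr = find_candidates(images)
print(f"{len(candidates)} of {len(images)} images flagged for inspection")
```

The design point is that the threshold sets the trade-off: a higher value leaves fewer candidates for a person to inspect, at the cost of missing fainter events.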

Based on that success, I began collaborating with the National Radio Astronomy Observatory to develop the world’s most powerful radio interferometer, the Very Large Array (VLA), for millisecond imaging. After a 3-month residency project, our team unveiled its first fruits: the first blind detection of a millisecond radio transient (see Figure \ref{rratimg}; Law et al. 2012, Astrophysical Journal, 760, 6). This transient was a rare kind of neutron star that pulses sporadically and has traditionally been studied with large, single-dish radio telescopes. By using an interferometer, we precisely localized the neutron star and could search for counterparts in optical surveys. The lack of an optical counterpart gave us insight into how the neutron star formed.

We have continued to develop the VLA for millisecond imaging and now routinely use it to observe at data rates of 300 MB s\(^{-1}\), or roughly 1 TB hour\(^{-1}\). My software now incorporates new algorithms for high-throughput radio transient searches and is run on compute clusters. I am leading a collaboration to search tens of terabytes of data using clusters at the VLA, at Los Alamos National Laboratory, and at the National Energy Research Scientific Computing Center (NERSC).
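The correspondence between the two quoted data rates can be checked with a short calculation. The sketch below assumes decimal units (1 TB = 10^6 MB); the function name is hypothetical.

```python
SECONDS_PER_HOUR = 3600
MB_PER_TB = 1_000_000  # decimal units assumed: 1 TB = 10^6 MB

def mb_per_s_to_tb_per_hour(rate_mb_s: float) -> float:
    """Convert a sustained data rate from MB/s to TB/hour."""
    return rate_mb_s * SECONDS_PER_HOUR / MB_PER_TB

rate_tb_hour = mb_per_s_to_tb_per_hour(300)
print(f"{rate_tb_hour:.2f} TB/hour")  # 1.08 TB/hour
```

At 300 MB s\(^{-1}\), an hour of observing therefore produces slightly more than 1 TB.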