Casey Law edited untitled.tex  about 10 years ago

Commit id: 5ef645f44e0f38dda06ae8e2e665535948d74c95


Achieving the localization precision of a radio interferometer requires processing a torrential data stream. In my work with the Very Large Array (VLA), I have commissioned an observing mode that produces data at a rate of 1 TB hour$^{-1}$. I have written an extensive parallelized software system to search VLA data for transients. My collaborators and I have observed for 100 hours, producing 100 TB of data, in the search for fast radio transients of various types. This new observing mode is pushing the VLA beyond its intended use and finding new, compelling science at those limits.

The challenges of this effort are increasingly common in the sciences. My current efforts are focused on improving the parallelization and robustness of my radio transient search, issues shared with many in high-performance computing.

More broadly, I am interested in developing the concept of \emph{real-time anomaly detection} for massive data streams. In the study of radio transients, real-time detection would allow us to throttle the data stream by saving data only for the brief moments of interest. This process of ``data triage'' will be a key strategy for extracting science in data-intensive fields.

\section{Science}

An exciting new class of radio transients is the ``fast radio burst'' (FRB; Thornton et al. 2013, Science, 341, 53). FRBs were discovered in all-sky pulsar surveys by single-dish telescopes; their dispersion is an order of magnitude larger than expected from the Galaxy and is consistent with propagation through the intergalactic medium. If FRBs lie at cosmological distances, their dispersion can be used to measure the baryonic mass properties of the intergalactic medium. Beyond using FRBs as probes, understanding their origin may have relevance to gamma-ray bursts and sources of gravitational waves.

The most distant pulsar known was recently detected in Andromeda (Rubio-Herrera et al. 2013, MNRAS, 428, 2857).
Dispersion of a sample of such transients will directly measure the baryons in the outer fringes (the ``halo'') of the Milky Way and M31. Roughly 50\% of baryons in the local universe have not been directly detected, and fast radio transients may help solve this ``missing baryon problem''.

Closer to home, pulsar surveys have discovered the ``rotating radio transient'' (RRAT; McLaughlin et al. 2006, Nature, 439, 817), a spinning neutron star that pulses sporadically. While a few dozen RRATs are now known, it is unclear whether they are tied to extreme objects like magnetars or are simply ordinary pulsars that emit bright pulses detectable individually.

Much closer to Earth, Jupiter emits intense radio bursts that make it the brightest astronomical object at low radio frequencies. Coronal mass ejections, like those seen on the Sun, also drive fast, coherent radio flares. These processes could be used to measure the magnetism and plasma properties of other stars and should profoundly affect the habitability of orbiting exoplanets. Both of these mechanisms should be detectable as subsecond transients.

\section{Real-Time Detection as a Solution to the Big Data Challenge}

The technical requirements for our radio transient searches are extreme for astronomy, but are becoming more common (e.g., see plans for the SKA and LSST). Lessons learned from our project will have increasing relevance to scientists working to solve the ``needle in a haystack'' problem.
Currently, we are recording data to disk at a rate of 1 TB hour$^{-1}$ and processing it on compute clusters near the VLA, at Los Alamos National Lab, and at NERSC. The Internet is too slow to transport the 1 TB hour$^{-1}$ data stream, so we ship disks to our computing centers. This approach is complex and not sustainable for the large campaigns needed to find many fast radio transients.

I am interested in how real-time detection can help solve the challenges of big data. By bringing computational support closer to the telescope, real-time detection makes it possible to decide whether a given segment of data is worth saving. This kind of ``data triage'' is routinely employed in the particle physics community, but not elsewhere. It requires a fundamental change in how we collect data and perceive its value. As data volumes grow, this kind of focus will be needed to avoid being overwhelmed.
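
As a rough illustration of why shipping disks currently beats the network, the recording rate can be converted to a sustained bandwidth. This back-of-the-envelope sketch assumes decimal units (1 TB $= 10^{12}$ bytes) and, purely for comparison, a hypothetical dedicated link that sustains 1 Gb s$^{-1}$; neither assumption describes our actual network.
\[
  1\ \mathrm{TB\ hour^{-1}} = \frac{8 \times 10^{12}\ \mathrm{bit}}{3600\ \mathrm{s}} \approx 2.2\ \mathrm{Gb\ s^{-1}}
\]
\[
  t_{\mathrm{transfer}}(100\ \mathrm{TB}) = \frac{8 \times 10^{14}\ \mathrm{bit}}{10^{9}\ \mathrm{bit\ s^{-1}}} = 8 \times 10^{5}\ \mathrm{s} \approx 9.3\ \mathrm{days}
\]
Under these assumptions, moving the data would take more than twice as long as the 100 hours ($\approx 4.2$ days) spent observing.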