Technical Overview

We propose building a real-time data processing and management system for fast transient searches at the VLA. The realfast system will open access to higher data rates and faster integration times, but it will be truly transformative for tapping into the commensal VLA fast data stream. Integrating our computing system with the commensal data stream turns each observation into a fast transient search, making the VLA into a 24/7 fast transient survey machine.

Enabling Technologies

The ATI will make this possible by supporting four key technologies:

1) Computing hardware at the VLA: Computing is the most significant missing component in the creation of a real-time, commensal VLA fast transient detection system. This proposal will support the purchase of servers, GPUs, and infrastructure dedicated to the realfast system. GPUs are becoming increasingly common in real-time astronomy applications \citep{2010MNRAS.408.1936B,2011MNRAS.417.2642M}, and members of our team have demonstrated the effectiveness of GPUs for interferometric imaging and transient/pulsar searches (see §\ref{computing}).

2) Commensal data stream: The commensal data stream is a high-speed duplicate of the primary observing stream. Integrating realfast with the VLA system requires sending data and observing metadata to realfast and archiving real-time transient detections. This proposal supports NRAO staff to work with our team on this integration, largely in the first year of the ATI (2016–2017). This proposal also supports commissioning of higher data rate modes (faster, more bandwidth, more polarizations).

3) Transient search pipeline: The new hardware will run our transient search pipeline, which will evolve as the computing support changes. The first year will focus on implementing the current real-time system with the new hardware and commensal data stream. The second year will focus on redeveloping the software to take advantage of the GPUs, most likely by coding portions in CUDA/C for efficiency. Extensive tests of our pipeline and the development of prototype GPU-accelerated fast imaging code have retired most of the risk associated with algorithm development.

4) Candidate data management system: The ATI will support implementation of a portal for inspecting and verifying data quality associated with all candidates, similar to that developed for the V-FASTR transient detection system \citep{2012ApJ...753L..36W,2011ApJ...735...98T}. Team members will classify each candidate as either good (keep) or bad (delete); this classification will be used to manage the flow of recorded data into the archive (see the sketch below). Ultimately (though outside the scope of this ATI), this data quality classification can be used to train a statistical classifier to automatically identify the most promising candidates, easing the burden on humans and speeding the progress of candidates to public distribution.
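To make this bookkeeping concrete, the minimal Python sketch below shows how a good/bad label could gate the flow of recorded data into the archive; the field names and actions are hypothetical illustrations, not the actual realfast or V-FASTR schema.

\begin{verbatim}
# Minimal sketch of candidate bookkeeping for the data management portal.
# Field names and archive/delete actions are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class Candidate:
    candidate_id: str
    snr: float                  # detection signal-to-noise ratio
    dm: float                   # dispersion measure, pc cm^-3
    label: str = "unreviewed"   # "good" (keep) or "bad" (delete)

def route_recorded_data(cand: Candidate) -> str:
    """Decide what happens to the recorded data for a reviewed candidate."""
    if cand.label == "good":
        return "archive"   # keep visibilities for follow-up and distribution
    if cand.label == "bad":
        return "delete"    # discard buffered data to limit archive growth
    return "hold"          # awaiting human (or, later, automated) review
\end{verbatim}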

System Description

The hardware for realfast will be a GPU-accelerated cluster attached to the VLA correlator (Figure \ref{diagram}, top). The bulk of the correlation computation is done in field-programmable gate arrays, while the final formation of visibilities is done on the CBE, a commodity compute cluster. To enable real-time processing of the data, the transient detection system must be co-located with the CBE and share access to its networking infrastructure (Figure \ref{diagram}, in yellow). This drives our decision to consider the transient detector as an extension of the CBE.

By balancing space and power needs with science capabilities (see §\ref{computing}), we decided that an optimal system would consist of one head node that manages data flow to 32 2U servers housed in two racks. Each server will hold 2 GPUs and 2 Intel Xeon CPUs. The CPUs and memory size will be similar to those we have used extensively in current real-time transient searches.

\begin{wrapfigure}{r}{0.7\textwidth}
\centering
\caption{\label{diagram}(Top:) Data flow for correlator and transient detection system. Blue blocks show existing computing infrastructure, while the yellow blocks show the proposed development. (Bottom:) Detail of candidate and data flow through the transient detection system. Dotted lines show data flow that is triggered.}
\end{wrapfigure}

The top panel of Figure \ref{diagram} shows how the data stream is duplicated for commensal processing. The standard data path is typically averaged to \(\sim 1\) s timescales (as defined by the primary user), while the commensal data path will be averaged to millisecond timescales. The commensal (fast) data stream can be sampled with 8-bit values (compared to the standard 32 bits used currently) without sacrificing data quality. For data sampled with 1 ms integrations, 351 baselines, 256 channels, and 2 polarizations, we expect a data rate of 0.4 GB s\({}^{-1}\).
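As a cross-check of the quoted rate, the short Python sketch below reproduces it from the sample parameters above; the assumption of 8-bit real and imaginary parts per complex visibility belongs to this sketch.

\begin{verbatim}
# Rough cross-check of the commensal fast-stream data rate quoted above.
# Assumed: 351 baselines, 256 channels, 2 polarizations, 1 ms integrations,
# and 8-bit real + 8-bit imaginary parts per complex visibility.
n_baselines = 351        # 27 VLA antennas: 27 * 26 / 2
n_channels = 256
n_pols = 2
bytes_per_vis = 2        # 8-bit real + 8-bit imaginary
integration_s = 1e-3     # 1 ms

bytes_per_integration = n_baselines * n_channels * n_pols * bytes_per_vis
rate_GBps = bytes_per_integration / integration_s / 1e9
print(f"fast data rate: {rate_GBps:.2f} GB/s")   # ~0.36 GB/s, i.e. ~0.4 GB/s
\end{verbatim}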

Visibilities will be sent to the head node of the transient detection system. In the simplest design, we will define a single 512 GB buffer on the head node and distribute data to the rest of the transient detection system. In this design, we can store up to 30 minutes of data in our current standard fast imaging mode (280 MB/s) or up to 10 minutes of VLASS commensal data sampled at 25 ms (890 MB/s). We will also test an alternative buffer design using the memory on the transient detector nodes. This kind of buffer would be 1 TB in size (and expandable by a factor of 2–8), but would require a more complex interaction with the VLA archive.
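The quoted hold times follow directly from the buffer size and the two data rates; a minimal sketch of that arithmetic, taking the rates above as given:

\begin{verbatim}
# Approximate hold times for the 512 GB head-node buffer design.
# Rates are the two modes quoted above (assumed fixed for this estimate).
buffer_GB = 512
for label, rate_MBps in [("standard fast imaging", 280),
                         ("VLASS commensal, 25 ms", 890)]:
    hold_min = buffer_GB * 1e3 / rate_MBps / 60
    print(f"{label}: ~{hold_min:.0f} min buffered")
# -> ~30 minutes and ~10 minutes, as stated in the text
\end{verbatim}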

Our baseline plan is to send a series of 20 s segments of data to each of the 32 worker nodes in round-robin fashion. A single node will process a segment 32 times slower than real time, but real-time processing is maintained over all nodes. This design introduces a latency of 32 times the segment size, or roughly 10 minutes. This can be reduced by using smaller data segments, as allowed by the buffer; testing will help us balance the processing and buffer requirements. Segments will partially overlap in time to maintain sensitivity to dispersive delays of all scales at all times. To keep the overlap to less than 10% of the total data volume, the minimum data segment size should be at least 10 times the largest dispersive sweep of roughly 2 seconds (i.e., DM\({}_{\rm{max}}=2000\) pc cm\({}^{-3}\), bandwidth of 256 MHz, L-band).
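The sketch below works through this trade-off; the dispersion constant and the exact L-band edges are illustrative assumptions, so the numbers are approximate rather than definitive.

\begin{verbatim}
# Sketch of the segment-size and latency trade-off described above.
# The dispersion constant and band edges are assumptions for illustration.
K_DM = 4.149                  # ms GHz^2 cm^3 / pc (approximate)
dm_max = 2000.0               # pc cm^-3
f_lo, f_hi = 1.25, 1.50       # GHz; an assumed 256 MHz L-band subband

sweep_s = K_DM * dm_max * (f_lo**-2 - f_hi**-2) / 1e3  # ~1.6 s ("roughly 2 s")
min_segment_s = 10 * sweep_s        # keeps overlap below ~10% of data volume
segment_s = 20.0                    # baseline choice from the text
latency_min = 32 * segment_s / 60   # round robin over 32 worker nodes
print(f"sweep {sweep_s:.1f} s, min segment {min_segment_s:.0f} s, "
      f"latency ~{latency_min:.1f} min")
\end{verbatim}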

Each node of the system operates independently to perform data flagging, mean subtraction, dedispersion, and imaging. Calibration products are taken from another part of the VLA system (called “telcal”). When a candidate is detected, the buffered data are saved for the duration of the event (plus some padding in time), which is at most 6 seconds for an FRB-like search (\(\sim 2.4\) GB assuming 1 ms integrations, 256 channels, and 2 polarizations). The transient search pipeline will also average data over timescales up to \(\sim 1\) min to search for slower transients. Longer timescales have fewer integrations and far fewer dispersion trials, so they are easier to search and produce a smaller candidate event rate than the fastest timescale.
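For reference, the quoted \(\sim 2.4\) GB triggered dump follows from the fast-stream rate derived earlier; a one-line sketch under those same assumptions:

\begin{verbatim}
# Size of a triggered buffer dump for an FRB-like event (assumed values).
rate_GBps = 0.4    # commensal fast-stream rate estimated earlier
dump_s = 6         # event duration plus padding, worst case quoted above
print(f"triggered dump: ~{rate_GBps * dump_s:.1f} GB")   # ~2.4 GB
\end{verbatim}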