Yu Zheng - Authorea

We developed a framework combining unsupervised and supervised machine learning to classify lightning radio signals, and applied it to the possible detection of terrestrial gamma-ray flashes (TGFs). Recent studies have established a tight connection between energetic in-cloud pulses (EIPs, >150 kA) and a subset of TGFs, enabling continuous and large-scale ground-based TGF detection. However, even with a high peak-current threshold, it is time-consuming to manually search for EIPs in a background of many non-EIP events, and the search becomes even more difficult when a lower peak-current threshold is used. Machine learning classifiers offer an effective way to automate this search. Beginning with unsupervised learning, spectral clustering is performed on the low-dimensional features extracted by an autoencoder from raw radio waveforms, showing that +EIPs naturally constitute a distinct class of waveform that makes up 67% of the total population. The clustering results are used to form a labeled dataset (~10,000 events) on which a supervised convolutional neural network (CNN) targeting +EIPs is trained. Our CNN models identify on average 95.2% of true +EIPs with accuracy up to 98.7%, representing a powerful tool for +EIP classification. The pretrained CNN classifier is further applied to identify lower peak-current EIPs (LEIPs, >50 kA) from a larger dataset (~30,000 events). Among 10 LEIPs coincident with Fermi TGF observations, 2 previously reported TGFs and 2 unreported but suspected TGFs are found, while the majority are not associated with detectable TGFs. In addition, unsupervised clustering is found to reflect characteristics of the ionospheric reflection height and its effect on radio wave propagation.
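The abstract reports two different performance figures for the CNN: the fraction of true +EIPs that are identified, and the overall accuracy. The sketch below shows how the two can differ on the same predictions. It assumes that the first figure corresponds to recall (true positives over all true +EIPs), which the abstract does not state explicitly. The function name `recall_and_accuracy` and the toy labels are hypothetical and are not taken from the study.

```python
def recall_and_accuracy(y_true, y_pred):
    """Return (recall, accuracy) for binary labels, where 1 marks a +EIP.

    Assumption: "fraction of true +EIPs identified" is read as recall.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(y_true, y_pred))
    # True positives: +EIPs that the classifier also labels as +EIPs.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    # False negatives: +EIPs that the classifier misses.
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Correct predictions of either class.
    correct = sum(1 for t, p in pairs if t == p)
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    accuracy = correct / len(pairs) if pairs else float("nan")
    return recall, accuracy


# Hypothetical toy labels for illustration only, not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
recall, accuracy = recall_and_accuracy(y_true, y_pred)
print(f"recall={recall:.2f}, accuracy={accuracy:.2f}")
```

Because non-EIP events also count toward accuracy, the two numbers need not agree, which is why both are worth reporting for a detector aimed at one class.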