Mirko Stumpo

and 5 more

Several techniques have been developed in the last two decades to forecast the occurrence of Solar Proton Events (SPEs), mainly based on the statistical association between the $>$10 MeV proton flux and precursor parameters. The Empirical model for Solar Proton Events Real Time Alert (ESPERTA, Laurenza et al., 2009) provides timely and reasonably accurate predictions of SPEs after the occurrence of $\geq$M2 X-ray bursts, using as input parameters the flare heliolongitude, the soft X-ray fluence, and the $\sim$1 MHz radio fluence. Here, we reinterpret the ESPERTA model in the framework of machine learning and perform a cross-validation, which yields comparable performance. Moreover, we find that applying a cut-off on the heliolongitude of $\geq$M2 flares reduces the False Alarm Rate (FAR). The cut-off is set to E20°, where the cumulative distribution of $\geq$M2 flares associated with SPEs shows a break that reflects the poor magnetic connection between the Earth and eastern-hemisphere flares. The best performance is obtained by using the SMOTE algorithm, leading to a probability of detection of 0.83 and a FAR of 0.39. Nevertheless, we demonstrate that a substantial FAR in the predictions is a natural consequence of the sample base rates. From a Bayesian point of view, we find that the FAR explicitly contains the prior knowledge about the class distributions. This is a critical issue for any statistical approach, and it requires that model validation preserve the class distributions within the training and test datasets.
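
The dependence of the FAR on the base rates can be illustrated with a minimal sketch. It assumes that the FAR is defined as the fraction of issued alerts that are false, i.e. P(no SPE | alert), which is one common convention and is not spelled out in the abstract. The function name `false_alarm_ratio`, the parameter names, and the probability of false detection `pofd = 0.10` are hypothetical and chosen only for illustration; the probability of detection of 0.83 is the only value taken from the text.

```python
def false_alarm_ratio(pod, pofd, base_rate):
    """Fraction of alerts that are false, P(no SPE | alert), via Bayes' theorem.

    pod       -- P(alert | SPE), the probability of detection
    pofd      -- P(alert | no SPE), the probability of false detection
    base_rate -- P(SPE), the prior fraction of flares followed by an SPE
    """
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must lie strictly between 0 and 1")
    true_alerts = pod * base_rate             # P(alert and SPE)
    false_alerts = pofd * (1.0 - base_rate)   # P(alert and no SPE)
    total_alerts = true_alerts + false_alerts
    if total_alerts == 0.0:
        raise ValueError("the classifier never issues an alert")
    return false_alerts / total_alerts


if __name__ == "__main__":
    # Same classifier (fixed pod and pofd), evaluated on samples with
    # different class balance. pofd = 0.10 is an illustrative assumption.
    for base_rate in (0.5, 0.2, 0.1, 0.05):
        far = false_alarm_ratio(pod=0.83, pofd=0.10, base_rate=base_rate)
        print(f"P(SPE) = {base_rate:.2f} -> FAR = {far:.2f}")
```

Under this definition, a classifier with fixed detection and false-detection probabilities shows a larger FAR as SPEs become rarer in the sample, which is why a test set with a different class balance from the training set would give a misleading FAR.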