Jorge Amaya - Authorea

One of the most important steps in any AI/ML application is the pre-processing of the data. The objective of this step is to project the original data onto a new basis, or into a new latent space, where the different features of the problem are comparable and where their distribution covers a large range of values. Using the data in its natural basis can lead to under-performing AI/ML models. While almost all papers in our domain are careful to normalize or standardize the data, simple linear PCA transformations are used less often, and more complex non-linear projections into latent spaces even less so. Here we show how our research team is using autoencoder neural networks to perform non-linear transformations of images, simulations and time series used in heliophysical applications. Autoencoder transformations make it possible to parametrize any type of data by projecting it onto a latent space of higher or lower dimension. In these latent spaces the transformed data commonly has better statistical properties, which can improve the AI/ML modeling. In addition, autoencoders are also known as generative techniques, i.e. they can be used to produce “artificial” or “synthetic” data. We will present three particular examples of the use of autoencoders: 1) parametrization of solar wind observations using standard feed-forward autoencoders, 2) parametrization of magnetosphere simulations using convolutional autoencoders, and 3) parametrization and generation of solar active regions using variational convolutional autoencoders. We will show how these parametrizations can then be used for AI/ML classification and forecasting. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 776262 (AIDA).
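The pre-processing progression described above (standardization, then a linear PCA projection, then a non-linear autoencoder projection) can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work: the synthetic five-feature dataset, the function names `standardize`, `pca_project` and `train_autoencoder`, the single tanh encoding layer, the two-dimensional latent space, and the learning rate and epoch count are all assumptions chosen for illustration.

```python
import numpy as np


def standardize(X):
    """Rescale every feature to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    return (X - mu) / sigma


def pca_project(X, k):
    """Linear baseline: project onto the first k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def train_autoencoder(X, latent_dim=2, epochs=1000, lr=0.2, seed=0):
    """Feed-forward autoencoder (tanh encoder, linear decoder) trained by
    full-batch gradient descent on the mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W_enc = rng.normal(scale=0.1, size=(n_features, latent_dim))
    b_enc = np.zeros(latent_dim)
    W_dec = rng.normal(scale=0.1, size=(latent_dim, n_features))
    b_dec = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        # Forward pass: encode to the latent space, then decode.
        Z = np.tanh(X @ W_enc + b_enc)
        X_hat = Z @ W_dec + b_dec
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean squared error.
        g_out = 2.0 * err / err.size
        g_W_dec = Z.T @ g_out
        g_b_dec = g_out.sum(axis=0)
        g_pre = (g_out @ W_dec.T) * (1.0 - Z ** 2)  # tanh derivative
        g_W_enc = X.T @ g_pre
        g_b_enc = g_pre.sum(axis=0)
        # Gradient descent update.
        W_enc -= lr * g_W_enc
        b_enc -= lr * g_b_enc
        W_dec -= lr * g_W_dec
        b_dec -= lr * g_b_dec

    def encode(X_new):
        return np.tanh(X_new @ W_enc + b_enc)

    return encode, losses


# Synthetic stand-in data (an assumption, not real observations):
# five features with very different scales, driven by two hidden factors.
rng = np.random.default_rng(42)
t = rng.uniform(-1.0, 1.0, size=(200, 2))
X_raw = np.column_stack([
    400.0 + 100.0 * t[:, 0],
    5.0 + 3.0 * t[:, 1] ** 2,
    1e5 * np.exp(t[:, 0]),
    t[:, 0] * t[:, 1],
    np.sin(3.0 * t[:, 1]),
])

X = standardize(X_raw)            # features become comparable
Z_pca = pca_project(X, k=2)       # linear latent representation
encode, losses = train_autoencoder(X, latent_dim=2)
Z_ae = encode(X)                  # non-linear latent representation
```

Either `Z_pca` or `Z_ae` could then be passed to a downstream classifier or forecasting model in place of the raw features. The convolutional and variational variants mentioned above would replace the dense layers and the loss function, and are not covered by this sketch.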