2.2 The Statistical Problem

Electric load data are collected as a time series of \(n\) observations \(X_{1},X_{2},\ldots,X_{n}\). For the data set described in subsection 2.1, one year of measurements yields 17520 observations, denoted \(X_{1},X_{2},\ldots,X_{17520}\). Power system planning and optimization require accurate probability estimates for the load at random points in the future during the life of a power network. To formalize this, let \(X\) denote a random variable whose value is the electric load at some point in the future. The goal is to use the data to estimate \(\mathbb{P}(a<X<b)\) for various values of \(a\) and \(b\) \cite{Li_2019}.
Estimating \(\mathbb{P}(a<X<b)\) is a well-known problem in statistics \cite{2006} \cite{Silverman_1986}. The standard formulation supposes that \(X\), the electric load at some point in the future, has a probability density function \(f_{X}\), and that the data \(X_{1},X_{2},\ldots,X_{n}\) are \(n\) independent, identically distributed observations drawn from the distribution with density \(f_{X}\). The density \(f_{X}\) of the electric load is unknown, and our goal is to construct from the data a function \(\hat{f_{X}}\) that estimates it. Given a sufficiently good estimate \(\hat{f_{X}}\), computing \(\mathbb{P}(a<X<b)=\int_{a}^{b}f_{X}(x)\,dx\) reduces to numerically integrating \(\hat{f_{X}}\) over the interval \((a,b)\).
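As an illustration of this two-step procedure (estimate the density, then integrate it), the following minimal Python sketch may help. It is not the method developed in this work: it assumes a Gaussian kernel density estimator with the normal-reference bandwidth \(h=1.06\,\hat{\sigma}\,n^{-1/5}\) as one possible choice of \(\hat{f_{X}}\), uses the trapezoidal rule for the integration, and runs on synthetic data rather than the load data of subsection 2.1. All function names, the sample size, and the interval endpoints are hypothetical.

```python
import math
import random
import statistics


def normal_reference_bandwidth(data):
    """Rule-of-thumb bandwidth h = 1.06 * sd * n^(-1/5) (an assumed choice)."""
    return 1.06 * statistics.stdev(data) * len(data) ** (-1.0 / 5.0)


def gaussian_kde(data, bandwidth):
    """Return f_hat(x) = 1/(n*h) * sum_i phi((x - X_i) / h), phi = N(0,1) pdf."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def f_hat(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return f_hat


def integrate_trapezoid(f, a, b, steps=400):
    """Composite trapezoidal rule for the integral of f over (a, b)."""
    width = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * width)
    return total * width


def estimate_probability(data, a, b):
    """Estimate P(a < X < b) by integrating the density estimate over (a, b)."""
    f_hat = gaussian_kde(data, normal_reference_bandwidth(data))
    return integrate_trapezoid(f_hat, a, b)


# Synthetic stand-in for load observations (NOT real load data):
# 500 i.i.d. draws from a normal distribution, in arbitrary units.
rng = random.Random(0)
loads = [rng.gauss(1000.0, 150.0) for _ in range(500)]

p_hat = estimate_probability(loads, 900.0, 1100.0)
print(f"Estimated P(900 < X < 1100) = {p_hat:.3f}")
```

The quality of the resulting probability estimate depends on how well \(\hat{f_{X}}\) approximates \(f_{X}\), and in particular on the bandwidth. The independence assumption built into the synthetic sample is also an idealization for a measured load time series.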