2.2 The Statistical Problem
Electric load data are collected as a time series of \(n\) observations \(X_{1},X_{2},\ldots,X_{n}\). For the electric load data described in
subsection 2.1, one year of measurements yields 17520
observations (consistent with half-hourly sampling over a 365-day year), denoted \(X_{1},X_{2},\ldots,X_{17520}\).
Power system planning and optimization require accurate probability estimates of the load at arbitrary future times over the lifetime of a power
network. Formally, let \(X\) denote a random variable whose value
is the electric load at some future time. The goal is to use the
data to estimate \(\mathbb{P}(a<X<b)\) for various
values of \(a<b\) \cite{Li_2019}.
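To make the target quantity concrete, the simplest data-driven estimate of \(\mathbb{P}(a<X<b)\) is the fraction of observations that fall in \((a,b)\). The following is a minimal sketch of that baseline only, not the method developed in this work; it is meaningful only if the observations behave like independent draws from a common distribution. The function name `empirical_interval_probability` and the synthetic Gaussian data (a hypothetical stand-in for one year of half-hourly load, in arbitrary units) are assumptions for illustration.

```python
import random


def empirical_interval_probability(observations, a, b):
    """Fraction of observations strictly inside (a, b): a frequency
    estimate of P(a < X < b)."""
    if not observations:
        raise ValueError("need at least one observation")
    if a >= b:
        raise ValueError("require a < b")
    inside = sum(1 for x in observations if a < x < b)
    return inside / len(observations)


# Hypothetical synthetic sample of 17520 values in arbitrary units;
# NOT the data set of subsection 2.1.
rng = random.Random(0)
loads = [rng.gauss(2.0, 0.5) for _ in range(17520)]

p_hat = empirical_interval_probability(loads, 1.5, 2.5)
print(f"estimated P(1.5 < X < 2.5) = {p_hat:.3f}")
```

A frequency count like this is piecewise constant in \(a\) and \(b\) and assigns zero probability to any interval containing no observations, which is one motivation for estimating a smooth density instead.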
Estimating \(\mathbb{P}(a<X<b)\) is a well-known problem
in statistics \cite{2006} \cite{Silverman_1986}. The standard formulation assumes that \(X\), the
electric load at some future time, has a probability
density function \(f_{X}\), and that the data \(X_{1},X_{2},\ldots,X_{n}\) are \(n\) independent and identically
distributed (i.i.d.) observations drawn from \(f_{X}\). Because the
density \(f_{X}\) of the electric load is unknown, the task is to construct an estimate \(\hat{f}_{X}\) from the data. Once a sufficiently accurate
estimate \(\hat{f}_{X}\) is available, estimating \(\mathbb{P}(a<X<b)\) reduces to
numerically integrating \(\hat{f}_{X}\) over the interval \((a,b)\):
\[
\mathbb{P}(a<X<b)=\int_{a}^{b} f_{X}(x)\,dx \approx \int_{a}^{b} \hat{f}_{X}(x)\,dx .
\]
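The two steps described above, building a density estimate from the data and then integrating it over \((a,b)\), can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a Gaussian kernel density estimator with Silverman's rule-of-thumb bandwidth \(h = 0.9\min(\hat{\sigma}, \mathrm{IQR}/1.34)\,n^{-1/5}\) (one common choice in the density-estimation literature cited above; the estimator used in this work may differ) and composite Simpson's rule for the integral. All function names and the synthetic data are hypothetical.

```python
import math
import random
import statistics


def silverman_bandwidth(data):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel:
    h = 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    spread = min(sd, (q3 - q1) / 1.34)
    if spread <= 0:  # e.g. zero IQR: fall back to the standard deviation
        spread = sd
    if spread <= 0:
        raise ValueError("data have zero spread; bandwidth is undefined")
    return 0.9 * spread * n ** (-0.2)


def gaussian_kde(data, h):
    """Return f_hat(x) = (1 / (n * h)) * sum_i phi((x - X_i) / h),
    where phi is the standard normal density."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))

    def f_hat(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return f_hat


def simpson(f, a, b, intervals=200):
    """Composite Simpson's rule approximation of the integral of f over [a, b]."""
    if intervals % 2:  # Simpson's rule needs an even number of subintervals
        intervals += 1
    step = (b - a) / intervals
    total = f(a) + f(b)
    for k in range(1, intervals):
        total += (4 if k % 2 else 2) * f(a + k * step)
    return total * step / 3.0


def interval_probability(data, a, b, intervals=200):
    """Estimate P(a < X < b) by numerically integrating a Gaussian KDE."""
    if a >= b:
        raise ValueError("require a < b")
    f_hat = gaussian_kde(data, silverman_bandwidth(data))
    return simpson(f_hat, a, b, intervals)


# Hypothetical synthetic sample in arbitrary units, not the data of subsection 2.1.
rng = random.Random(1)
loads = [rng.gauss(2.0, 0.5) for _ in range(2000)]
p = interval_probability(loads, 1.5, 2.5)
print(f"estimated P(1.5 < X < 2.5) = {p:.3f}")
```

For a Gaussian kernel the integral also has the closed form \(\frac{1}{n}\sum_{i=1}^{n}\left[\Phi\!\left(\frac{b-X_i}{h}\right)-\Phi\!\left(\frac{a-X_i}{h}\right)\right]\), with \(\Phi\) the standard normal CDF, which is a convenient check on the Simpson result; the numerical route is shown because it applies unchanged to any kernel or density estimate.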