3.1.2 Kernel Density Estimation
Kernel density estimation (KDE) is one of the most popular nonparametric
estimation methods. Its purpose is to estimate the unknown probability
density directly from the data, without the assumptions about the shape
of the distribution that parametric models impose. The estimate
\({\hat{f}}_{\text{KDE}}\) of the unknown density \(f_{X}\) is constructed
from \(n\) observed data points as follows:
\[{\hat{f}}_{\text{KDE}}(x)=\frac{1}{nh}\sum_{i=1}^{n}K\left(\frac{X_{i}-x}{h}\right)
\]
where \(x\) is the point at which the density is evaluated, \(X_{1},\ldots,X_{n}\) are the \(n\) observed data points, \(h\in(0,\infty)\) is the bandwidth parameter, and the kernel function \(K\) is a nonnegative function with \(\int K(u)\,du=1\).
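
The estimator above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this paper: the Gaussian kernel, the synthetic sample, and the bandwidth value of 0.4 are all assumptions made for the example.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density: nonnegative and integrates to 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, data, h, kernel=gaussian_kernel):
    """Evaluate (1 / (n h)) * sum_i K((X_i - x) / h) at a single point x."""
    n = len(data)
    return sum(kernel((xi - x) / h) for xi in data) / (n * h)


# Synthetic sample, purely illustrative (not data from this paper).
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(200)]

# Evaluate the estimate on a grid over [-8, 8] and integrate it numerically
# with a Riemann sum; the area should be close to 1.
step = 0.05
grid = [-8.0 + step * k for k in range(321)]
density = [kde(x, sample, h=0.4) for x in grid]
area = step * sum(density)
print(area)
```

Because each kernel integrates to one and the estimate averages \(n\) rescaled kernels, the estimate is itself a density, which is what the numerical area check illustrates.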
In order to build the estimate \({\hat{f}}_{\text{KDE}}\), an
investigator must choose the kernel function and the bandwidth
parameter. Research shows that the choice of kernel function makes
little difference to the fit, whereas the choice of bandwidth is of
great importance \cite{Silverman_1986,Devroye_2001}. In this paper, two common
rule-of-thumb formulas are used to calculate the bandwidths for the KDE
models:
\[h_{\text{ROT1}}=1.059\,\hat{\sigma}\,n^{-\frac{1}{5}}\]
\[h_{\text{ROT2}}=\hat{\sigma}\,n^{-\frac{1}{6}}\]
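
The two rules of thumb translate directly into code. The sketch below assumes that \(\hat{\sigma}\) is the sample standard deviation with the \(n-1\) denominator, which the text does not specify; the function names and the input values are hypothetical and serve only as an illustration.

```python
import statistics


def bandwidth_rot1(data):
    """h_ROT1 = 1.059 * sigma_hat * n^(-1/5)."""
    n = len(data)
    # Assumption: sigma_hat is the sample standard deviation (n - 1 denominator).
    return 1.059 * statistics.stdev(data) * n ** (-1.0 / 5.0)


def bandwidth_rot2(data):
    """h_ROT2 = sigma_hat * n^(-1/6)."""
    n = len(data)
    # Same assumption about sigma_hat as in bandwidth_rot1.
    return statistics.stdev(data) * n ** (-1.0 / 6.0)


# Hypothetical values, for illustration only (not data from this paper).
loads = [0.9, 1.1, 1.4, 2.0, 2.3, 3.1, 4.8, 5.0]
h1 = bandwidth_rot1(loads)
h2 = bandwidth_rot2(loads)
print(h1, h2)
```

The ratio of the two bandwidths is \(1.059\,n^{-\frac{1}{30}}\), which depends only on the sample size, so for any sample with \(n\geq 6\) the second rule gives the larger bandwidth and hence the smoother estimate.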
A limitation of KDE is boundary bias: the estimator underestimates the
density close to the boundaries of the data range, because kernels
centered near a boundary place part of their mass outside the range.
This may pose significant problems when fitting KDEs to electric load
data, since the bulk of the data lie not around the center of the range
but near the boundaries.
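
Boundary bias can be illustrated with a small numerical sketch. Everything in it is an assumption chosen for the example and not taken from this paper: a Gaussian kernel, a bandwidth of 0.2, and a stand-in sample built from the quantiles of an Exponential(1) distribution, whose true density \(e^{-x}\) on \([0,\infty)\) is largest exactly at the boundary \(x=0\).

```python
import math


def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kde(x, data, h):
    """Evaluate (1 / (n h)) * sum_i K((X_i - x) / h) at a single point x."""
    return sum(gaussian_kernel((xi - x) / h) for xi in data) / (len(data) * h)


# Deterministic stand-in sample: quantiles of an Exponential(1) distribution.
n = 500
sample = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]

h = 0.2  # arbitrary bandwidth, chosen only for illustration
estimate_at_boundary = kde(0.0, sample, h)
true_at_boundary = 1.0  # exp(-0) = 1

# Share of the estimate's total mass that falls below the boundary x = 0:
# a Gaussian kernel centered at X_i >= 0 puts Phi(-X_i / h) of its mass there.
leaked_mass = sum(normal_cdf(-xi / h) for xi in sample) / n

print(estimate_at_boundary, true_at_boundary, leaked_mass)
```

In this setting the estimate at the boundary falls well short of the true value, and the leaked mass shows where the missing probability has gone: it is assigned to negative values that the data cannot take.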