3.1.2 Kernel Density Estimation

Kernel density estimation (KDE) is one of the most popular nonparametric estimation methods. Its purpose is to estimate the unknown probability density directly from the data, without the assumptions about the shape of the distribution that parametric models impose. The estimate \({\hat{f}}_{\text{KDE}}\) of the unknown density \(f_{X}\) is constructed from \(n\) observed data points as follows:
\[{\hat{f}}_{\text{KDE}}(x)=\frac{1}{nh}\sum_{i=1}^{n}K\left(\frac{x-X_{i}}{h}\right)\]
where \(x\) is the point at which the density is evaluated, \(X_{1},\ldots,X_{n}\) are the \(n\) observed data points, \(h\in(0,\infty)\) is the bandwidth parameter, and the kernel function \(K\) is a nonnegative function with \(\int K(u)\,du=1\).
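
The estimator can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this paper: it assumes a Gaussian kernel (one common choice that is nonnegative and integrates to one), and the names `gaussian_kernel`, `kde`, and `sample` are hypothetical. Because the Gaussian kernel is symmetric, the sign of the kernel argument does not affect the result.

```python
import math


def gaussian_kernel(u):
    """Standard normal density: nonnegative and integrates to 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, data, h, kernel=gaussian_kernel):
    """Evaluate (1 / (n h)) * sum_i K((x - X_i) / h) at a single point x."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(data)
    if n == 0:
        raise ValueError("data must contain at least one observation")
    return sum(kernel((x - xi) / h) for xi in data) / (n * h)


# Hypothetical toy sample, used only to show the call signature.
sample = [1.0, 2.0, 4.0]
density_at_2 = kde(2.0, sample, h=0.5)
```

Evaluating `kde` on a grid of points yields a curve that, like any density, is nonnegative and integrates to approximately one.
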
In order to build the estimate \({\hat{f}}_{\text{KDE}}\), an investigator must choose the kernel function and the bandwidth parameter. Research shows that the choice of kernel function has little effect on the fit to the data, whereas the choice of bandwidth is of great importance in constructing \({\hat{f}}_{\text{KDE}}\) \cite{Silverman_1986} \cite{Devroye_2001}. In this paper, two common rule-of-thumb formulas are used to calculate the bandwidths for the KDE models:
\[h_{\text{ROT1}}=1.059\times\hat{\sigma}\times n^{-\frac{1}{5}}\]
\[h_{\text{ROT2}}=\hat{\sigma}\times n^{-\frac{1}{6}}\]
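
The two formulas translate directly into code. The sketch below assumes that \(\hat{\sigma}\) is the sample standard deviation (computed with the \(n-1\) denominator), which the text does not state explicitly; the function name `rule_of_thumb_bandwidths` and the sample `loads` are hypothetical.

```python
import statistics


def rule_of_thumb_bandwidths(data):
    """Return (h_ROT1, h_ROT2) for a one-dimensional sample."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two observations are needed")
    # Assumption: sigma-hat is the sample standard deviation (n - 1 denominator).
    sigma_hat = statistics.stdev(data)
    h_rot1 = 1.059 * sigma_hat * n ** (-1.0 / 5.0)
    h_rot2 = sigma_hat * n ** (-1.0 / 6.0)
    return h_rot1, h_rot2


# Hypothetical sample of 100 values, used only to show the call.
loads = [float(v) for v in range(100)]
h_rot1, h_rot2 = rule_of_thumb_bandwidths(loads)
```

Since \(n^{-1/6}\) decays more slowly than \(n^{-1/5}\), the second formula gives the wider bandwidth for all but very small samples (from \(n=6\) upward), and therefore a smoother estimate.
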
A limitation of KDE is boundary bias: the estimator underestimates the density close to the boundaries of the data range. This may pose significant problems when fitting KDEs to electric load data, since the bulk of the data lie near the boundaries rather than around the center of the range.
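
The effect can be reproduced on synthetic data. The following sketch is an illustration under stated assumptions, not an analysis of the load data: it uses a Gaussian kernel, an arbitrary bandwidth of 0.05, and hypothetical evenly spaced points on \([0,1]\) that stand in for a uniform density with true value 1. At the boundary, about half of each kernel's mass falls outside the data range, so the estimate drops to roughly half the true density.

```python
import math


def gaussian_kde(x, data, h):
    """Gaussian-kernel KDE evaluated at a single point x."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm


# Hypothetical data: 1000 evenly spaced points on [0, 1], mimicking a
# uniform density whose true value is 1 everywhere on the interval.
n = 1000
data = [(i + 0.5) / n for i in range(n)]
h = 0.05  # illustrative bandwidth, not one of the rule-of-thumb values

at_boundary = gaussian_kde(0.0, data, h)  # roughly half the true density
at_center = gaussian_kde(0.5, data, h)    # close to the true density
```
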