4. RESULTS
This section compares the performance of the Gaussian and Gamma
distributions with that of the Root Transform Local Linear Regression
(RTLLR) method and two Kernel Density Estimation (KDE) models that use
different bandwidths. First, the models are fit to electric load data
from a single-tariff commercial enterprise site and assessed. We then
repeat the analysis on electric load data from all 2446 enterprise and
residential locations.
Fig. 4 shows the histograms of the train and test electric load
datasets together with the fits of all five models. Visually, the RTLLR
model appears to fit the data best, followed by the two KDE models and
then the parametric distributions. This finding is supported by the
results in Tables 1 and 2, from which the following observations can be
made:
- RMSE and MAE values are lowest for the RTLLR model, followed by the
KDE models and then the parametric models.
- The \(R^{2}\) values are highest for RTLLR, followed by the KDE
models and then the parametric distributions.
- RTLLR is the only model for which the KS test fails to reject the
null hypothesis.
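The three error metrics referred to above can be sketched as follows.
This is a minimal illustration, not the implementation used in the
study: it assumes that `observed` holds the empirical histogram
densities and `predicted` the model density evaluated at the same bin
centers, which this section does not state explicitly. The function
names and the numbers in the example are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error between two equally long sequences."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up bin densities, for illustration only (not data from the study).
observed = [0.10, 0.30, 0.40, 0.20]
predicted = [0.12, 0.28, 0.38, 0.22]
print(rmse(observed, predicted), mae(observed, predicted),
      r_squared(observed, predicted))
```

Lower RMSE and MAE and a higher \(R^{2}\) all indicate a closer fit, so
the three metrics rank models in the same direction as the bullet
points above.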
In the remainder of this section, the parametric and nonparametric
models are evaluated by their test RMSE and train KS test p-values
across all enterprise and residential sites. The left panel of Fig. 5
presents the average score of each model under a scoring system from 1
to 5, where 1 is assigned to the model with the highest test RMSE and 5
to the model with the lowest. The right panel of Fig. 5 displays a box
plot of the relative percentage improvement in test RMSE of the RTLLR,
two KDE, and Gamma models over the Gaussian model. The results show
that RTLLR almost always outperforms the other models and achieves a
substantial average relative improvement (around 80%) over the Gaussian
distribution. The two KDE models also show promising results, with an
average relative improvement of around 50% over the Gaussian
distribution. The Gamma distribution, while outperforming the Gaussian
distribution overall (around 30% average improvement), appears
unreliable: it underperforms significantly at multiple locations, as
seen in the numerous outliers below the bottom whisker.
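The scoring system and the relative improvement described above can be
sketched as follows. This is an illustration under stated assumptions,
not the code used in the study: relative improvement is taken to be
100 * (RMSE_Gaussian - RMSE_model) / RMSE_Gaussian, ties in RMSE are
not handled, and the model labels, function names, and RMSE values are
hypothetical.

```python
def score_models(test_rmse):
    """Map {model: test RMSE} to {model: score}.

    The model with the highest RMSE gets 1 and the one with the lowest
    gets len(test_rmse), i.e. 5 when five models are compared.
    """
    worst_to_best = sorted(test_rmse, key=test_rmse.get, reverse=True)
    return {name: rank for rank, name in enumerate(worst_to_best, start=1)}


def relative_improvement(test_rmse, baseline="Gaussian"):
    """Percentage RMSE reduction of each model relative to the baseline."""
    base = test_rmse[baseline]
    return {
        name: 100.0 * (base - value) / base
        for name, value in test_rmse.items()
        if name != baseline
    }


def average_scores(per_site_rmse):
    """Average the per-site scores of each model over a list of sites."""
    totals = {}
    for site in per_site_rmse:
        for name, score in score_models(site).items():
            totals[name] = totals.get(name, 0) + score
    return {name: total / len(per_site_rmse) for name, total in totals.items()}


# Made-up test RMSE values for two sites (not results from the study).
site_a = {"Gaussian": 4.0, "Gamma": 3.0, "KDE_1": 2.0, "KDE_2": 2.5, "RTLLR": 1.0}
site_b = {"Gaussian": 2.0, "Gamma": 5.0, "KDE_1": 1.5, "KDE_2": 1.8, "RTLLR": 1.2}

print(average_scores([site_a, site_b]))
print(relative_improvement(site_b))
```

In the second made-up site the Gamma model has a higher RMSE than the
Gaussian baseline, so its relative improvement is negative. Points of
this kind are what would appear as outliers below the bottom whisker of
a box plot.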
Fig. 6 shows the train KS test p-values for all enterprise and
residential sites. The figure suggests that electric load data follows
neither a Gaussian nor a Gamma distribution, since the KS test rejects
the null hypothesis for both at every site (i.e., p-values are less
than 0.01). The two KDE models are a good fit for a small number of
locations. However, RTLLR looks more promising, as it fits well at a
good number of locations and attains the highest p-values in our study.
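To make the KS decision rule concrete, the sketch below computes a
one-sample KS statistic against a fitted Gaussian and converts it to a
p-value with the asymptotic Kolmogorov series. It is an illustration
only and makes several assumptions that are not taken from the study:
the p-value approximation is valid for large samples, the Gaussian
parameters are estimated from the same sample (which tends to make the
p-value optimistic), and both samples are synthetic. All function names
are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and a model CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue(d, n, terms=100):
    """Asymptotic Kolmogorov p-value (large-n approximation)."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        # The p-value is 1 to within rounding here; the series converges slowly.
        return 1.0
    series = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * series))


def gaussian_ks_pvalue(sample):
    """Fit a Gaussian by moments and return the KS p-value of that fit."""
    model = NormalDist(fmean(sample), pstdev(sample))
    return ks_pvalue(ks_statistic(sample, model.cdf), len(sample))


# Synthetic samples built from quantiles, so the example is deterministic.
n = 2000
probs = [(i + 0.5) / n for i in range(n)]
skewed = [-math.log(1.0 - p) for p in probs]       # exponential-shaped
bell = [NormalDist().inv_cdf(p) for p in probs]    # Gaussian-shaped

p_skewed = gaussian_ks_pvalue(skewed)
p_bell = gaussian_ks_pvalue(bell)
print(p_skewed < 0.01, p_bell < 0.01)
```

With the 0.01 threshold quoted above, the Gaussian fit is rejected for
the skewed synthetic sample and not rejected for the bell-shaped one,
which mirrors how each site and model pair is classified in Fig. 6.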