4. RESULTS

In this section, the performance of the Gaussian and Gamma distributions is analyzed alongside the Root Transform Local Linear Regression (RTLLR) method and two Kernel Density Estimation (KDE) models with different bandwidths. First, the models are fit to electric load data from a single-tariff commercial enterprise site and assessed. We then repeat the analysis on electric load data from all 2446 enterprise and residential locations.
Fig. 4 shows the histograms of the train and test electric load datasets fit with all five models. Visually, the RTLLR model appears to fit the data best, followed by the two KDE models and then the parametric distributions. This finding is supported by the results in Tables 1 and 2, from which the following observations can be made:
  1. RMSE and MAE values are lowest for the RTLLR model, followed by the KDE models and then the parametric models.
  2. The \(R^{2}\) values are highest for RTLLR, followed by the KDE models and then the parametric distributions.
  3. RTLLR is the only model for which the KS test fails to reject the null hypothesis.
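For readers who want to reproduce this kind of comparison, the following is a minimal sketch of the three error metrics listed above. It assumes the metrics are computed between histogram bin densities and the fitted model density evaluated at the bin centers; this section does not state the exact setup, so treat that as an assumption. The function names and the toy values are hypothetical and are not taken from Tables 1 and 2.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Toy values (assumed for illustration only): histogram bin densities
# and a model density evaluated at the same bin centers.
hist_density = [0.10, 0.30, 0.40, 0.20]
model_density = [0.12, 0.28, 0.38, 0.22]

metrics = {
    "rmse": rmse(hist_density, model_density),
    "mae": mae(hist_density, model_density),
    "r2": r_squared(hist_density, model_density),
}
```

Lower RMSE and MAE and a higher \(R^{2}\) indicate a closer match between the fitted density and the histogram, which is the ordering used in the observations above.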
In the remainder of this section, the parametric and nonparametric models are evaluated by the test RMSE and the train KS test p-values across all enterprise and residential sites. The left panel of Fig. 5 presents the average score of each model under a scoring system from 1 to 5, where 1 is assigned to the model with the highest test RMSE and 5 to the model with the lowest. The right panel of Fig. 5 displays a box plot of the relative percentage improvement in test RMSE of the RTLLR, two KDE, and Gamma models over the Gaussian model. The results show that RTLLR almost always outperforms the other models and achieves a substantial average relative improvement (around 80%) over the Gaussian distribution. The two KDE models also show promising results, with an average relative improvement of around 50% over the Gaussian distribution. The Gamma distribution outperforms the Gaussian distribution overall, with an average improvement of around 30%, but appears unreliable: it underperforms significantly at multiple locations, as seen in the numerous outliers below the bottom whisker.
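The per-site scoring and the relative improvement can be sketched as follows. Two points are assumptions rather than details given in the text: the relative improvement is taken to be 100 × (RMSE_Gaussian − RMSE_model) / RMSE_Gaussian, and ties in RMSE are broken by insertion order. The model labels and RMSE values are made up for illustration and are not results from the study.

```python
def score_models(test_rmse):
    """Assign scores 1..N per site: 1 to the highest test RMSE, N to the lowest.

    `test_rmse` maps a model name to its test RMSE at a single site.
    Ties keep insertion order (an assumption; the text does not discuss ties).
    """
    worst_first = sorted(test_rmse, key=test_rmse.get, reverse=True)
    return {name: score for score, name in enumerate(worst_first, start=1)}

def relative_improvement(rmse_model, rmse_baseline):
    """Percentage reduction in RMSE relative to a baseline (assumed formula)."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline

# Made-up test RMSE values for one hypothetical site.
site_rmse = {
    "Gaussian": 4.0,
    "Gamma": 3.0,
    "KDE-1": 2.0,
    "KDE-2": 1.5,
    "RTLLR": 1.0,
}

scores = score_models(site_rmse)
improvement_over_gaussian = {
    name: relative_improvement(value, site_rmse["Gaussian"])
    for name, value in site_rmse.items()
    if name != "Gaussian"
}
```

Averaging `scores` over all sites would give the quantity shown in the left panel of Fig. 5, and collecting `improvement_over_gaussian` over all sites would give the samples behind the box plot in the right panel. Under this formula, a model that performs worse than the Gaussian at a site yields a negative value, which is how the Gamma outliers below the bottom whisker would arise.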
Fig. 6 shows the train KS test p-values for all enterprise and residential sites. The figure suggests that the electric load data follow neither a Gaussian nor a Gamma distribution, since the KS test rejects the null hypothesis for both at all sites (i.e., p-values are less than 0.01). The two KDE models appear to be a good fit for a small number of locations. However, RTLLR looks more promising, as it fits well at a good number of locations and attains the highest p-values in our study.
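To make the rejection criterion concrete, the following is a minimal sketch of a one-sample KS test against a fitted Gaussian at the 0.01 level. It is not the authors' implementation: the p-value uses the asymptotic Kolmogorov series, the Gaussian parameters are estimated from the same sample (which makes the p-value only approximate), and the sample is synthetic right-skewed data rather than real load measurements. All function names are hypothetical.

```python
import math
import statistics

def normal_cdf(x, mu, sigma):
    """CDF of a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def ks_pvalue_asymptotic(d, n, terms=100):
    """Asymptotic p-value: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    lam = math.sqrt(n) * d
    if lam < 0.05:
        # The truncated series is unreliable this close to zero,
        # where the true p-value is essentially 1.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))

def gaussian_ks_test(sample, alpha=0.01):
    """Fit a Gaussian by moments, then return (D, p-value, rejected)."""
    mu = statistics.fmean(sample)
    sigma = statistics.stdev(sample)
    d = ks_statistic(sample, lambda x: normal_cdf(x, mu, sigma))
    p = ks_pvalue_asymptotic(d, len(sample))
    return d, p, p < alpha

# Synthetic right-skewed sample (exponential quantiles), not real load data.
n = 500
skewed = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]
d, p, rejected = gaussian_ks_test(skewed)
```

A p-value below 0.01 means the hypothesis that the sample follows the fitted distribution is rejected, which is the outcome reported above for the Gaussian and Gamma models at every site. The same `ks_statistic` function can be applied to any other model by passing that model's CDF.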