1. RMSE and MAE values are lowest for the RTLLR model, followed by the KDE models and lastly the parametric models.
2. The \(R^{2}\) values are highest for RTLLR, followed by the KDE models and then the parametric distributions.
3. RTLLR is the only model for which the KS test fails to reject the null hypothesis.
In the remainder of this section, the performance of the parametric and nonparametric models is evaluated using the test RMSE (Fig. \ref{410095}), test \(R^2\) values (Fig. \ref{182777}) and train KS test p-values (Fig. \ref{667516}) across all enterprise and residential sites.
The left panel of Fig. \ref{410095} presents the average score of each model under a ranking from 1 to 5, where the model with the highest test RMSE receives 1 and the model with the lowest receives 5. The right panel of Fig. \ref{410095} displays a box plot of the relative percentage improvement in test RMSE of the RTLLR, the two KDE and the Gamma models over the Gaussian model. The results show that RTLLR almost always outperforms all other models (score = 4.89) and achieves a substantial average relative improvement (around 80%) over the Gaussian distribution. The two KDE models also show promising results, with an average relative improvement of around 50% over the Gaussian distribution. The Gamma distribution outperforms the Gaussian distribution on average (around 30% improvement) but appears unreliable, as it underperforms significantly at multiple locations, as seen in the numerous outliers below the bottom whisker.
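The scoring and the relative improvement described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the RMSE values are invented, `KDE_a` and `KDE_b` are placeholder names for the two KDE models, and the improvement formula (the reduction in RMSE divided by the Gaussian RMSE, times 100) is an assumed definition, since this section does not state it explicitly.

```python
# Illustrative sketch only: the RMSE values below are invented, not taken from the study.

def rank_scores(rmse_by_model):
    """Score the models at one site: 1 for the highest RMSE, up to N for the lowest."""
    worst_to_best = sorted(rmse_by_model, key=rmse_by_model.get, reverse=True)
    return {model: score for score, model in enumerate(worst_to_best, start=1)}


def relative_improvement(rmse_model, rmse_baseline):
    """Percentage reduction in RMSE relative to a baseline (assumed definition)."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline


# Hypothetical test RMSE values for a single site.
site_rmse = {"Gaussian": 12.0, "Gamma": 9.0, "KDE_a": 6.0, "KDE_b": 6.6, "RTLLR": 3.0}

scores = rank_scores(site_rmse)
improvements = {
    model: relative_improvement(rmse, site_rmse["Gaussian"])
    for model, rmse in site_rmse.items()
    if model != "Gaussian"
}

print(scores)
print(improvements)
```

Averaging the per-site scores over all sites would give a quantity comparable to the one plotted in the left panel. Under this definition, a negative improvement marks a site where a model does worse than the Gaussian baseline.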
Fig. \ref{182777} presents the same plots as Fig. \ref{410095} but for test \(R^2\) values rather than test RMSE values. The left panel of Fig. \ref{182777} shows that the RTLLR method outperforms all other techniques at most locations, followed by the KDE models and finally the parametric models. The right panel of Fig. \ref{182777} illustrates the reliability of the RTLLR method in explaining the variance of the data: its mean \(R^2\) value is almost 1 and it has the narrowest interquartile range of \(R^2\) values of all the models. Moreover, the KDE models (\(\mu_{R^2} \approx 0.77\)) seem to perform better than both the Gamma (\(\mu_{R^2} \approx 0.70\)) and Gaussian (\(\mu_{R^2} \approx 0.35\)) distributions. There are also instances where the parametric models attain negative \(R^2\) values, meaning that a constant model that always predicts the mean would fit the data better.
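To see why a negative \(R^2\) means that a constant mean model would do better, the short Python sketch below uses the standard definition \(R^2 = 1 - SS_{res}/SS_{tot}\). The numbers are invented for illustration, and what counts as the observed and predicted values in the study is not specified in this section, so the function is only a generic instance of the metric.

```python
# Illustrative sketch only: toy numbers, not data from the study.

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0, 5.0]

# A constant model that always predicts the mean scores exactly 0.
r2_mean = r_squared(y_true, [3.0] * 5)

# A close fit approaches 1.
r2_good = r_squared(y_true, [1.1, 1.9, 3.0, 4.2, 4.9])

# A model whose errors exceed those of the mean model scores below 0.
r2_bad = r_squared(y_true, [5.0, 4.0, 3.0, 2.0, 1.0])

print(r2_mean, r2_good, r2_bad)
```

Because the constant mean model is the reference point at 0, any model whose residual sum of squares exceeds the total sum of squares lands below it.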
Fig. \ref{667516} shows the train KS test p-values for all enterprise and residential sites. The figure suggests that the electric load data follow neither a Gaussian nor a Gamma distribution, since the KS test rejects the null hypothesis for both at all sites (i.e. \(p<0.01\)). The two KDE models are a good fit at a small number of locations. The RTLLR model is more promising: it fits well at a good number of the locations and attains the highest p-values in our study.
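The decision rule behind these p-values can be sketched with a one-sample KS test against a fitted Gaussian. The Python example below is a standard-library approximation under stated assumptions, not the procedure used in the study: the samples are synthetic, the p-value comes from the asymptotic Kolmogorov series, and all function names are hypothetical. In practice a library routine such as `scipy.stats.kstest` would normally be used. Note also that estimating the parameters from the same sample makes the plain KS p-value conservative.

```python
# Illustrative sketch only: synthetic samples and an asymptotic p-value approximation.
import math
from statistics import NormalDist, fmean, pstdev


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of the sample and a reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Check the gap just after and just before the step of the empirical CDF at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue(d, n, terms=100):
    """Asymptotic Kolmogorov p-value: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        # The series converges slowly here, and the true value is 1 to many decimals.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def fitted_gaussian_pvalue(sample):
    """Fit a Gaussian by its sample mean and standard deviation, then run the KS test."""
    fitted = NormalDist(fmean(sample), pstdev(sample))
    return ks_pvalue(ks_statistic(sample, fitted.cdf), len(sample))


# Deterministic synthetic samples built from evenly spaced quantiles.
n = 250
grid = [(i + 0.5) / n for i in range(n)]
unimodal = [NormalDist(50, 10).inv_cdf(q) for q in grid]
bimodal = ([NormalDist(20, 3).inv_cdf(q) for q in grid]
           + [NormalDist(80, 3).inv_cdf(q) for q in grid])

p_unimodal = fitted_gaussian_pvalue(unimodal)
p_bimodal = fitted_gaussian_pvalue(bimodal)

alpha = 0.01  # significance level quoted in the text
print("unimodal rejected:", p_unimodal < alpha)
print("bimodal rejected:", p_bimodal < alpha)
```

In this toy setting the two-peaked sample is rejected as Gaussian while the bell-shaped one is not, which is the same kind of verdict summarised per site in the figure.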