Figure (1). Energy Consumption over four years (2012-2015).

5. Results

5.1 Random Forest

5.1.1 Random Forest Regression

As mentioned above, we used the training dataset for the random forest regression analysis and performed cross-validation with GridSearchCV to avoid overfitting. To improve the accuracy score, we tried different hyperparameters and settled on tuning “max_depth”. For the residential dataset, we searched the range (1, 18) and obtained a best max_depth of 5; the reported accuracy is 0.201 and the mean squared error is 743.67. For the commercial dataset, we searched the range (1, 25) and obtained a best max_depth of 1; the reported accuracy is 0.052 and the mean squared error is 1075.61. We have also plotted the actual test values of “y” against the predicted values (energy consumption, EUI) in Figure (2).
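
The tuning procedure described above can be sketched as follows. This is a minimal illustration, not our exact pipeline: the synthetic data, the helper name tune_max_depth, the 80/20 split, the 5-fold setting, and n_estimators=20 are all assumptions made for the example. It also assumes that the reported "accuracy" corresponds to the default score of scikit-learn regressors, which is the coefficient of determination (R^2).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split


def tune_max_depth(X, y, depths, cv=5, seed=0):
    """Grid-search max_depth with cross-validation, then score on a held-out test set.

    Hypothetical helper: the split ratio, cv folds, and n_estimators are
    illustrative assumptions, not values taken from the study.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    search = GridSearchCV(
        RandomForestRegressor(n_estimators=20, random_state=seed),
        param_grid={"max_depth": list(depths)},
        cv=cv,
    )
    # Cross-validation runs on the training portion only; the best model is
    # refit on the full training portion afterwards (refit=True by default).
    search.fit(X_train, y_train)
    y_pred = search.predict(X_test)
    return {
        "best_max_depth": search.best_params_["max_depth"],
        "r2": search.score(X_test, y_test),  # default regressor score is R^2
        "mse": mean_squared_error(y_test, y_pred),
        "y_test": y_test,
        "y_pred": y_pred,
    }


# Synthetic stand-in for the building data (assumption: 4 numeric features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 50 + 10 * X[:, 0] + rng.normal(scale=5.0, size=200)

# Python's range(1, 18) covers depths 1..17; whether the upper bound of the
# range reported in the text is inclusive is not stated, so this is an assumption.
result = tune_max_depth(X, y, depths=range(1, 18))
print("best max_depth:", result["best_max_depth"])
print("R^2 on test set:", result["r2"])
print("MSE on test set:", result["mse"])
```

The returned y_test and y_pred arrays are the quantities that would be plotted against each other for an actual-versus-predicted comparison such as Figure (2).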