3.2.3 Kolmogorov-Smirnov Test

Another error metric we explored is the one-sample Kolmogorov-Smirnov (KS) test, which tests whether data are distributed according to a specified model. The test statistic is the supremum distance, i.e. the largest absolute difference between the data’s empirical CDF (an estimator of the data’s true CDF) and the model’s CDF. The KS test p-value is then computed from this statistic, whose asymptotic CDF is given by the KS function \cite{portugus}. A p-value larger than the threshold \(\alpha\) means the test does not reject the model, which we take as evidence that the model is a good fit for the data; we use \(\alpha=0.01\) throughout this report.
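As an illustration of the procedure described above, the following is a minimal sketch that uses only the Python standard library; it is not the implementation used in this report. It computes the supremum distance \(D_n=\sup_x|F_n(x)-F(x)|\) between the empirical CDF \(F_n\) and a model CDF \(F\), and then approximates the p-value with the asymptotic Kolmogorov series \(2\sum_{k\ge 1}(-1)^{k-1}e^{-2k^2x^2}\) evaluated at \(x=\sqrt{n}\,D_n\). The function names, the standard normal model, and the sample data are assumptions chosen for the example.

```python
import math
import random


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution (illustrative model choice)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(data, model_cdf):
    """Supremum distance between the empirical CDF of `data` and `model_cdf`."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = model_cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so the largest
        # gap at this point is on one side of the jump or the other.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue_asymptotic(d, n, terms=100):
    """Asymptotic p-value from the Kolmogorov series (accurate for large n)."""
    x = math.sqrt(n) * d
    if x < 0.2:
        # The survival function is numerically 1 here, and the alternating
        # series would need many more terms to converge.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


# Hypothetical usage: test synthetic samples against a standard normal model.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(500)]
d_n = ks_statistic(samples, normal_cdf)
p_value = ks_pvalue_asymptotic(d_n, len(samples))
alpha = 0.01
print(f"D_n = {d_n:.4f}, p-value = {p_value:.4f}, reject model: {p_value <= alpha}")
```

The asymptotic series is only an approximation for finite samples, so for small datasets an exact or library-provided p-value may be preferable.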

3.3 Data Splitting

A common practice in statistics that is not present in the power systems engineering field is data splitting. To test accurately how well an estimated PDF fits the data, one should first split the data into two datasets, a training set and a test set. The model is fitted on the training set, and its quality is then measured on the test set. The reason is that a model that fits the current data well does not necessarily fit other data well, so the test set plays the role of other, unseen data. In all analyses in this paper, the data were split randomly into training and test sets according to a 75%:25% split.
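The splitting procedure can be sketched as follows, again using only the Python standard library. This is a hedged example under stated assumptions rather than the code used in this paper: the helper name `train_test_split`, the random seeds, the synthetic Gaussian data, and the choice of a normal model are all hypothetical. The model is fitted on the 75% training set only, and the supremum distance to the empirical CDF is then measured on the held-out 25% test set.

```python
import random
import statistics


def train_test_split(data, test_fraction=0.25, seed=0):
    """Randomly split `data` into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(data)      # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data standing in for the measurements being modelled.
data_rng = random.Random(1)
samples = [data_rng.gauss(10.0, 2.0) for _ in range(1000)]

train, test = train_test_split(samples, test_fraction=0.25)

# Fit the model on the training set only (a normal model, as an assumption).
model = statistics.NormalDist(statistics.fmean(train), statistics.stdev(train))

# Measure fit quality on the test set: supremum distance between the
# test set's empirical CDF and the fitted model's CDF.
xs = sorted(test)
n = len(xs)
d_test = max(
    max((i + 1) / n - model.cdf(x), model.cdf(x) - i / n)
    for i, x in enumerate(xs)
)
print(f"train size = {len(train)}, test size = {len(test)}, D_test = {d_test:.4f}")
```

Fixing the seed makes the split reproducible, which helps when several candidate models are compared on the same test set.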