3.2.3 Kolmogorov-Smirnov Test
Another error metric we explored is the one-sample Kolmogorov-Smirnov (KS) test,
which tests whether data are distributed according to a specific model.
The test statistic is the supremum distance \(D_n = \sup_x |F_n(x) - F(x)|\),
where \(F_n\) is the data’s empirical CDF (an estimator of the data’s CDF)
and \(F\) is the model’s CDF. The KS test p-value is then calculated
from this statistic, which, after scaling by \(\sqrt{n}\), has an asymptotic CDF given
by the KS function \cite{portugus}. We treat a model as a good
fit for the data if the KS test’s p-value is larger than a threshold \(\alpha\), that is, if the test does not reject the hypothesis that the data follow the model. We use \(\alpha=0.01\) in this report.
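The procedure above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in this report: the function names (`ks_statistic`, `kolmogorov_sf`, `ks_test`) are hypothetical, the normal model is an assumed example, and the p-value uses the asymptotic Kolmogorov series, so it is only approximate for small samples.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal model (the choice of model is an assumption)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(data, model_cdf):
    """Supremum distance between the empirical CDF and the model CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = model_cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x,
        # so the distance is checked on both sides of the jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def kolmogorov_sf(x, terms=100):
    """Asymptotic P(sqrt(n) * D_n > x): 2 * sum_k (-1)^(k-1) exp(-2 k^2 x^2)."""
    if x < 0.2:
        # The tail probability is numerically 1 here, and the series
        # converges slowly for very small x, so return 1 directly.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

def ks_test(data, model_cdf, alpha=0.01):
    """Return (statistic, approximate p-value, p-value > alpha)."""
    d = ks_statistic(data, model_cdf)
    p = kolmogorov_sf(math.sqrt(len(data)) * d)
    return d, p, p > alpha

if __name__ == "__main__":
    sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]  # made-up illustrative values
    print(ks_test(sample, normal_cdf))
```

In practice a library routine such as `scipy.stats.kstest` could replace the hand-written functions; the sketch only makes the two steps, the distance and the p-value, explicit.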
3.3 Data Splitting
Data splitting is a common practice in statistics that is not present in the power systems engineering field. To accurately test a model’s fit
when estimating the PDF of data, one should first
split the data into two datasets: train and test. The model is then fitted
on the training dataset, and its quality is evaluated on the test
dataset. This is because a model that is a good fit for the current data
is not necessarily a good fit for other data. We therefore
treat the test dataset as the other data used to measure the
model’s quality. In all analyses in this paper, the data were
split randomly into training and test datasets according to a 75%:25%
split.
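A random 75%:25% split can be sketched as follows. The function name `split_train_test`, the `seed` argument, and the rounding rule for the cut point are assumptions made for illustration; the exact procedure used for the analyses is not specified here.

```python
import random

def split_train_test(data, train_fraction=0.75, seed=0):
    """Randomly split data into (train, test) lists.

    The fixed seed is an assumption added for reproducibility.
    """
    rng = random.Random(seed)
    shuffled = list(data)          # copy so the input is not modified
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

if __name__ == "__main__":
    observations = list(range(100))  # placeholder data
    train, test = split_train_test(observations)
    print(len(train), len(test))
```

The model is then fitted using only `train`, and a goodness-of-fit measure such as the KS test is evaluated only on `test`.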