
Clayton Miller edited Decision_threshold.tex almost 10 years ago

Commit id: c35784a1fc16c9df2e80fdc7e217ac0a84cab2f2


We use k-means to cluster the daily profiles after removing the discord candidate day-types. This ensures that the resulting load profile patterns are not influenced by the less frequent discords. Time series clustering can be approached as a raw-data-based, feature-based, or model-based solution \cite{WarrenLiao:2005bq}. Numerous clustering techniques have been developed and evaluated for various contexts and optimization goals. The most common is the k-means algorithm, which we use with the Euclidean distance measure because of its simplicity and demonstrated appropriateness for this application \cite{Iglesias:2013ja,MacQueen:1967uv}. In our application, the algorithm takes the daily chunks $(N_1, N_2, \ldots, N_n)$ and partitions these observations into $k$ sets, $S = \{S_1, S_2, \ldots, S_k\}$, so as to minimize the within-cluster sum of squares \cite{Rokach:2005ti}:
\begin{equation}
\argmin_{S} \sum_{i=1}^{k} \sum_{N_j \in S_i} \| N_j - \mu_i \|^2
\label{eq:kmeans}
\end{equation}
where $\mu_i$ is the mean of the points in $S_i$.
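
To make the minimization in Equation~\ref{eq:kmeans} concrete, the following sketches the standard Lloyd iteration, a common heuristic for k-means. This is an illustrative assumption, since the text above does not state which solver is used. The iteration index $t$, the auxiliary index $l$, and the initial centroids $\mu_1^{(0)}, \ldots, \mu_k^{(0)}$ are notation introduced here for the sketch only. Each iteration first assigns every daily chunk to its nearest centroid,
\begin{equation}
S_i^{(t)} = \left\{ N_j : \| N_j - \mu_i^{(t)} \|^2 \le \| N_j - \mu_l^{(t)} \|^2 \ \text{for all } l,\ 1 \le l \le k \right\},
\end{equation}
with ties broken so that each $N_j$ belongs to exactly one set, and then recomputes each centroid as the mean of its assigned chunks,
\begin{equation}
\mu_i^{(t+1)} = \frac{1}{|S_i^{(t)}|} \sum_{N_j \in S_i^{(t)}} N_j .
\end{equation}
Neither step increases the within-cluster sum of squares, so the iteration terminates once the assignments stop changing. In general it reaches only a local minimum of Equation~\ref{eq:kmeans}, so the resulting clusters can depend on the choice of initial centroids.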