Clayton Miller edited subsectionData_prepr.tex almost 10 years ago

Commit id: fc007cd755e00f4b1ff35609ac898e91c2255c29

In brief, the SAX transformation is as follows. The normalized time-series, $Z(t)$, is first broken down into individual non-overlapping subsequences of length $N$. This step is known as \emph{chunking}, and the period length $N$ is chosen to match a contextually meaningful period \cite{Lin:2005bi}. In our situation $N$ is chosen as 24 hours due to the focus on daily performance characterization. Each chunk is then further divided into $W$ equal-sized segments. The mean of the data across each of these segments is calculated and an alphabetic character is assigned according to where the mean lies within a set of vertical breakpoints, $B=\beta_1,\ldots,\beta_{A-1}$. These breakpoints are calculated according to a chosen alphabet size, $A$, to create equiprobable regions based on a Gaussian distribution, as seen in Figure~\ref{fig:SAXBreakpoints}.

\begin{figure}
\centering
\caption{Example breakpoint lookup table from Keogh et al. \cite{Keogh:2005wd} for $A = 3, 4, 5$, calculated from a Gaussian distribution}
\includegraphics[height=1.5in]{./Graphics/SAXGeneric/SAXBreakpoints.pdf}
\label{fig:SAXBreakpoints}
\end{figure}

Based on the chosen number of segments $W$ and alphabet size $A$, each window of length $N$ is transformed into a SAX \emph{word}. An example of this process is seen in Figure~\ref{fig:SAXWord}. This example shows two daily profiles which are converted to the SAX words \emph{acba} and \emph{abba}. The SAX word is useful from an interpretation point of view in that each letter corresponds consistently to the same subsequence of data from the daily profile. For example, with $W=4$ as in this example, the first letter describes the relative performance for the hours of midnight to 6:00 AM. Therefore, if $A$ is set to 3, a first letter of \emph{a} would indicate low, \emph{b} average, and \emph{c} high consumption over that period. Larger values of $A$ would create SAX words with a more diverse range of characters and would capture finer resolution in magnitude.
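
As a hedged sketch, the segment averaging and letter assignment described above can be written out explicitly. The symbols $z_j$, $\bar{z}_i$, $s_i$, $\alpha_k$ and $\Phi^{-1}$ are introduced here for illustration only and are not taken from the cited works. The sketch also assumes that $N$ counts the samples in one chunk (for example, $N=24$ for hourly data) and that $N$ is divisible by $W$. For a chunk $z_1,\ldots,z_N$, the mean of segment $i$ is
\[
\bar{z}_i = \frac{W}{N} \sum_{j=\frac{N}{W}(i-1)+1}^{\frac{N}{W}i} z_j, \qquad i = 1,\ldots,W,
\]
and the $i$-th letter of the SAX word is
\[
s_i = \alpha_k \quad \mbox{if} \quad \beta_{k-1} \le \bar{z}_i < \beta_k, \qquad k = 1,\ldots,A,
\]
with the conventions $\beta_0=-\infty$ and $\beta_A=+\infty$, and with $\alpha_1$, $\alpha_2$, $\alpha_3,\ldots$ standing for the letters \emph{a}, \emph{b}, \emph{c}, and so on. Equiprobable regions under a standard Gaussian correspond to
\[
\beta_k = \Phi^{-1}\left(\frac{k}{A}\right), \qquad k = 1,\ldots,A-1,
\]
where $\Phi^{-1}$ denotes the inverse cumulative distribution function of the standard normal distribution. For $A=3$ this gives $\beta_1 \approx -0.43$ and $\beta_2 \approx 0.43$. As a purely hypothetical illustration (the values are not read from any figure), segment means of $(-0.9,\ 0.8,\ 0.1,\ -0.6)$ with $W=4$ and $A=3$ would map to the word \emph{acba}, since $-0.9 < \beta_1$, $0.8 \ge \beta_2$, $\beta_1 \le 0.1 < \beta_2$, and $-0.6 < \beta_1$.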