Categorisation and ranking of validation themes for simulation methods
Copy across from IEMS paper
Statistical comparisons of time series
This study considers 23 technologies, defined in Table 2 in chapter 5, for which literature evidence has been identified to classify the particular mode of technology substitution observed. The evidence and process used in this categorisation are outlined in detail in section XX of chapter 2. Using bibliometric analysis methods, it is possible to extract a variety of historical trends for any technology of interest, effectively generating a collection of time series associated with that technology (these multidimensional time series datasets are referred to here as 'technology profiles'). This raises the question of how best to compare dissimilar bibliometric technology profiles in an unbiased manner, in order to investigate whether literature-based technology substitution groupings can be determined using a classification system built on the assumptions given in section \ref{771448}.

In particular, comparisons of technology time series can be subject to one or more sources of dissimilarity: the time series may be based on different numbers of observations (e.g. covering different time spans), be out of phase with each other, be subject to long-term and shorter-term cyclic trends, be at different stages of the Technology Life Cycle (or be fluctuating between stages) \cite{little1981strategic}, or be representative of dissimilar industries. A body of work already exists on the statistical comparison of time series, and in particular on time series classification methods \cite{lin2012pattern}.

Most modern time series pattern recognition and classification techniques emerging from the machine learning and data science domains fall broadly within the categories of supervised, semi-supervised, or unsupervised learning. The distinction between these categories is based on the amount of training information provided to the classifier in each case.
In supervised learning, all training time series are provided with known classification labels, whilst semi-supervised learning uses a mixture of training time series with known and unknown classification labels. By contrast, unsupervised learning approaches are not provided with any classification labels, and so must determine groupings independently (e.g. by clustering) \cite{lin2012pattern}. Table \ref{table:time_series_pattern_recognition_techniques} below provides a non-exhaustive overview of commonly used time series pattern recognition techniques.
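
As a purely illustrative aside, the difference between the three learning settings described above can be sketched in a few lines of Python. The sketch is not the classification method used in this study. The four toy series, the labels \texttt{rising} and \texttt{falling}, and all function names are hypothetical, and a simple nearest-neighbour rule on z-normalised, equal-length series is assumed only because it keeps the example short. The only thing that changes between the three functions is how much label information each one receives.

```python
import math

def z_normalise(series):
    """Rescale to zero mean and unit variance so that shape, not magnitude, is compared."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return [(x - mean) / std for x in series] if std > 0 else [0.0] * len(series)

def distance(a, b):
    """Euclidean distance between two equal-length series after z-normalisation."""
    return math.dist(z_normalise(a), z_normalise(b))

def classify_supervised(query, labelled):
    """Supervised: every training series has a label; return the label of the nearest one."""
    nearest = min(labelled, key=lambda name: distance(query, labelled[name][0]))
    return labelled[nearest][1]

def self_train(labelled, unlabelled):
    """Semi-supervised: grow the labelled pool by labelling the closest unlabelled series first."""
    pool, remaining = dict(labelled), dict(unlabelled)
    while remaining:
        name, nearest = min(
            ((u, l) for u in remaining for l in pool),
            key=lambda pair: distance(remaining[pair[0]], pool[pair[1]][0]),
        )
        pool[name] = (remaining.pop(name), pool[nearest][1])
    return {name: pool[name][1] for name in unlabelled}

def cluster_unsupervised(series_by_name):
    """Unsupervised: no labels; seed two groups with the most distant pair, assign by proximity."""
    names = list(series_by_name)
    seed_a, seed_b = max(
        ((a, b) for i, a in enumerate(names) for b in names[i + 1:]),
        key=lambda pair: distance(series_by_name[pair[0]], series_by_name[pair[1]]),
    )
    groups = {seed_a: set(), seed_b: set()}
    for name in names:
        d_a = distance(series_by_name[name], series_by_name[seed_a])
        d_b = distance(series_by_name[name], series_by_name[seed_b])
        groups[seed_a if d_a <= d_b else seed_b].add(name)
    return list(groups.values())

# Hypothetical toy profiles (invented values, not data from this study).
profiles = {
    "tech_a": [1, 2, 4, 8, 16],
    "tech_b": [2, 3, 6, 11, 20],
    "tech_c": [16, 8, 4, 2, 1],
    "tech_d": [30, 21, 12, 6, 3],
}
labels = {"tech_a": "rising", "tech_b": "rising", "tech_c": "falling", "tech_d": "falling"}

# Supervised: all four series are labelled.
labelled_all = {name: (profiles[name], labels[name]) for name in profiles}
print(classify_supervised([1, 3, 5, 9, 18], labelled_all))

# Semi-supervised: only tech_a and tech_c are labelled.
labelled_some = {name: labelled_all[name] for name in ("tech_a", "tech_c")}
unlabelled = {name: profiles[name] for name in ("tech_b", "tech_d")}
print(self_train(labelled_some, unlabelled))

# Unsupervised: no labels at all.
print(cluster_unsupervised(profiles))
```

Real technology profiles would rarely satisfy the equal-length assumption made here, given the differences in time span and phase noted earlier in this section, so a practical implementation would need a distance measure that tolerates such differences.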