Detailed method selection

Based on the technology classification problem considered, the bibliometric data available, and the methods discussed in sections \ref{650363} to \ref{875755}, the following methods have been selected for use in this study:

Technology Life Cycle stage matching process

For those technologies where evidence for determining the transitions between different stages of the Technology Life Cycle has either not been found or is incomplete, a nearest neighbour pattern recognition approach has been employed, following the work of Gao \cite{Gao_2013}, to locate the points where shifts between cycle stages occur. In these circumstances a supervised learning approach is taken because the Technology Life Cycle model is well established and widely recognised as a sensible basis for classifying technological maturity, so it is not believed necessary to re-establish the validity of the categories being assigned. Equally, the nearest neighbour approach is commonly used as an industry standard, so no further development of it is proposed for the current study. However, for the specific technologies considered in chapters 5 and 6, literature evidence has been identified for the transitions between stages, and so the nearest neighbour methodology outlined in chapter 5 is included only as a provision for expansion to other technologies in future studies.
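
The nearest neighbour stage matching idea can be illustrated with a minimal sketch. This is not the procedure of \cite{Gao_2013} itself: the function names, the two-element indicator vectors, and the stage labels below are all hypothetical, and a plain one-nearest-neighbour rule with Euclidean distance is assumed.

```python
from math import dist

def nearest_stage(sample, references):
    """Return the stage label of the reference vector closest to `sample`."""
    return min(references, key=lambda ref: dist(sample, ref[0]))[1]

def stage_transitions(series, references):
    """Label every time step, then report the indices where the label changes."""
    labels = [nearest_stage(x, references) for x in series]
    changes = [(i, labels[i - 1], labels[i])
               for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
    return labels, changes

# Hypothetical labelled reference points: (indicator vector, known stage).
references = [
    ((0.1, 0.0), "emerging"),
    ((0.5, 0.4), "growth"),
    ((0.9, 0.1), "maturity"),
]

# Hypothetical yearly indicator vectors for a technology with unknown stages.
series = [(0.1, 0.1), (0.2, 0.1), (0.5, 0.3), (0.6, 0.4), (0.9, 0.2)]

labels, changes = stage_transitions(series, references)
print(labels)
print(changes)
```

Each entry in `changes` gives the index of the first time step assigned to a new stage, which is the quantity of interest when locating shifts between cycle stages.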

Identification of significant patent indicator groups

In order to identify those bibliometric indicator groupings that could form the basis of a data-driven technology classification model, a combination of Dynamic Time Warping and the 'PAM' variant of K-Medoids clustering has been applied in this study. For the initial feature alignment and distance measurement stages of this process, Dynamic Time Warping is still widely recognised as the classification benchmark to beat (see section \ref{446824}), and so this study does not look to advance the feature alignment processes used beyond this. Unlike in the Technology Life Cycle stage matching process, which is based on a well-established technology maturity model, this study assumes that a classification system based on the modes of substitution outlined in section \ref{771448} is not intrinsically valid. For this reason an unsupervised learning approach has been adopted here to limit the influence of human bias in determining whether a classification system based on presumptive technological substitution is valid or not, before subsequently defining a classification rule system. This also means that labelling of predicted clusters can be carried out even if labels are only available for a small number of observed samples representative of the desired classes, or potentially even if none of the observed samples are absolutely defined. This is of particular use if the technique is to be expanded to a wider population of technologies, as obtaining evidence of the applicable mode of substitution that gave rise to the current technology can be a time-consuming process, and in some cases the necessary evidence may not be publicly available (e.g. if dealing with commercially sensitive performance data). As such, clustering can provide an indication of the likely substitution mode of a given technology without the need for prior training on technologies that belong to any given class.
Under such circumstances this approach could be applied without the need for collecting performance data, provided that the groupings produced by the analysis are broadly identifiable from inspection as being associated with the suspected modes of substitution (this is of course made easier if a handful of examples are known, but known examples are no longer a hard requirement).
The 'PAM' variant of K-Medoids is selected here over hierarchical clustering since the expected number of clusters is known from the literature, and keeping the number of clusters fixed allows for easier testing of how frequently predicted clusters align with expected groupings. Additionally, only a small sample of technologies is evaluated in this study, so the additional computational expense of using the 'PAM' variant of K-Medoids rather than hierarchical clustering approaches is not likely to be significant. The Euclidean distance metric is selected for the K-Medoids clustering in order to ensure consistency with the Dynamic Time Warping measures available (see \cite{dtw}). Amplitude normalisation of the time series considered further ensures that Euclidean distance measures are not inadvertently biased by observations of high or low values (see section \ref{396009}). It is also worth noting that by evaluating the predictive performance of each subset of patent indicator groupings independently it is possible to spot and rank commonly recurring patterns of subsets. This is not possible with approaches such as Linear Discriminant Analysis, which can assess the impact of individual predictors but cannot rank the most suitable combinations of indicators.
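
The combination of Dynamic Time Warping and K-Medoids described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in this study: it does not use the package cited in \cite{dtw}, the series are univariate so the local Euclidean cost reduces to an absolute difference, the PAM BUILD step is replaced by a naive initialisation followed by the usual swap search, and the function names and toy series are hypothetical.

```python
from itertools import product

def dtw_distance(a, b):
    """DTW distance between two 1-D series (local cost = absolute difference)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def total_cost(dist, medoids):
    """Sum of distances from every sample to its closest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))

def pam_cluster(dist, k):
    """PAM-style swap search on a precomputed distance matrix (naive initialisation)."""
    n = len(dist)
    medoids = list(range(k))  # assumption: first k samples instead of the BUILD step
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos, cand in product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            trial_cost = total_cost(dist, trial)
            if trial_cost < best:  # accept any swap that lowers the total cost
                medoids, best, improved = trial, trial_cost, True
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels

# Hypothetical amplitude-normalised indicator series: two rising, two falling,
# with the second of each pair shifted in time relative to the first.
series = [
    [0, 0, 1, 2, 3, 3],
    [0, 1, 2, 3, 3, 3],
    [3, 3, 2, 1, 0, 0],
    [3, 2, 1, 0, 0, 0],
]
dist = [[dtw_distance(a, b) for b in series] for a in series]
medoids, labels = pam_cluster(dist, k=2)
print(labels)
```

Because the time-shifted series warp onto one another at zero cost, the two rising series fall into one cluster and the two falling series into the other, even though a point-by-point comparison would treat each pair as different.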

Ranking of significant patent indicator groups

As the number of technologies considered in this study is relatively small, exhaustive cross-validation approaches provide a feasible means to rank the out-of-sample predictive capabilities of those bibliometric indicator subsets that have been identified as producing significant correlations to expected in-sample technology groupings. Leave-p-out cross-validation is therefore applied for this purpose, which also reduces the risk of over-fitting in the subsequent model building phases.
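
A minimal sketch of exhaustive leave-p-out cross-validation is given below. The split enumeration is the standard one (every combination of p samples is held out once); the scoring rule, a one-nearest-neighbour prediction on a precomputed distance matrix, is only an assumed stand-in for the classifier used in this study, and the distance values and labels are hypothetical.

```python
from itertools import combinations

def leave_p_out(n_samples, p):
    """Yield every (train, test) index split with exactly p held-out samples."""
    indices = range(n_samples)
    for test in combinations(indices, p):
        train = [i for i in indices if i not in test]
        yield train, list(test)

def lpo_accuracy(dist, labels, p):
    """Fraction of held-out samples whose nearest training sample shares their label."""
    correct = total = 0
    for train, test in leave_p_out(len(labels), p):
        for t in test:
            nearest = min(train, key=lambda i: dist[t][i])
            correct += labels[nearest] == labels[t]
            total += 1
    return correct / total

# Hypothetical pairwise distances for four technologies in two known groups.
dist = [
    [0, 1, 5, 6],
    [1, 0, 4, 5],
    [5, 4, 0, 1],
    [6, 5, 1, 0],
]
labels = ["A", "A", "B", "B"]

n_splits = sum(1 for _ in leave_p_out(len(labels), 2))
acc_p1 = lpo_accuracy(dist, labels, p=1)
acc_p2 = lpo_accuracy(dist, labels, p=2)
print(n_splits, acc_p1, acc_p2)
```

Running the same scoring over each candidate indicator subset would give one out-of-sample score per subset, which is what allows the subsets to be ranked. The number of splits grows combinatorially with the sample size, which is why this is only practical for small numbers of technologies.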

Technology classification model building

The misalignment in time of life cycle stages between different technologies can make it difficult to identify common features in time series. This is primarily because such phase variance risks artificially inflating data variance, skewing the driving principal components and often disguising underlying data structures \cite{Marron_2015}. Consequently, given the importance of phase variance when comparing historical trends for different technologies, and the coupling that exists between adjacent points in growth and adoption curves, functional linear regression is selected here to build the technology classification model developed in this study (see section \ref{875755}).
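
As a point of reference, a commonly used scalar-on-function form of functional linear regression is sketched below. This is given as an assumption about the general model family rather than as the exact formulation used in this study, and the symbols $y_i$, $x_i(t)$, $\alpha$, $\beta(t)$, $\varepsilon_i$ and $T$ are introduced here purely for illustration:

\begin{equation}
y_i = \alpha + \int_{T} \beta(t)\, x_i(t)\, \mathrm{d}t + \varepsilon_i
\end{equation}

Here $x_i(t)$ would be the indicator curve for technology $i$ over a common time interval $T$, $y_i$ a scalar response such as a numerically coded mode of substitution, $\beta(t)$ the coefficient function to be estimated, and $\varepsilon_i$ an error term. Because $\beta(t)$ is a function of time rather than a set of independent weights, the coupling between adjacent points on each curve is retained in the fit.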

Sensitivity of technology adoption to chosen modelling parameters

Whilst statistical approaches are well suited to detecting underlying correlations in historical and experimental datasets, they do not on their own provide a detailed understanding of the causation behind associated events. Equally, statistical methods are not generally well suited to predicting disruptive events and complex interactions, with other simulation techniques such as System Dynamics and Agent Based Modelling (ABM) performing better in these areas (see chapter 2 and sections \ref{174036} and \ref{655650}). Accordingly, in order to identify causal effects and test the sensitivity of technological substitution patterns to variability arising from real-world socio-technical features not captured in simple bibliometric indicators (such as the influence of competition and more precise economic effects), the fitted regression model is evaluated in a real-time system dynamics environment. Bearing in mind the emphasis placed on traceability by survey respondents in section \ref{788541}, this is thought to be a more sensible first development prior to attempting to capture more complex emergence effects using ABM, whilst also providing a baseline for comparisons in subsequent studies.

Method limitations

Although precautions have been taken to ensure that the methods selected for this study address the problems posed in building generalised technology classification and substitution models based on bibliometric data in as rigorous a fashion as possible, there are some known limitations to the methods used in this work that must be recognised. Many of the current limitations stem from the fact that in this analysis technologies have been selected based on where evidence is obtainable to indicate the mode of adoption followed. As such, the technologies considered here do not come from a truly representative cross-section of all industries, so it is possible that the models generated will provide a better representation of those industries considered rather than a more generalisable result. This evidence-based approach also means that it is still a time-consuming process to locate the literature material needed to support classifying technology examples as arising from one mode of substitution or another, and to then compile the relevant cleaned patent datasets for analysis. As a result only a relatively limited number of technologies have been considered in this study, and this number should be expanded to increase confidence in the findings produced from this work. This also raises the risk that clustering techniques may struggle to produce consistent results based on the small number of technologies considered. Furthermore, any statistical or quantitative methods used for classification in chapter 5 are unlikely to provide real depth of knowledge beyond the detection of correlations behind patent trends when used in isolation. Ultimately some degree of causal exploration, whether through case study descriptions, system dynamics modelling, or expert elicitation, is required to shed more light on the underlying influences shaping technology substitution behaviours.
These limitations are accounted for as far as possible in the technology timelines presented in chapter 5, and in the adoption trends and substitution modelling activities presented in chapter 6, but without further study these analyses can only be considered exploratory at this stage.
Other data-specific issues that could arise relate to the use of patent searches in this analysis and the need to resample data based on variable length time series. The former relates to the fact that patent search results and records can vary to a large extent based on the database and exact search terms used; however, overall trends once normalised should remain consistent with other studies of this nature (this point is addressed in more detail in section XX). The latter refers to the fact that functional linear regression requires all technology case studies to be based on the same number of time samples. As such, as discussed in section \ref{875755}, linear interpolation is used as required to ensure consistency in the number of observations, although this may introduce some small errors, which are not felt to be significant.
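
The resampling step can be illustrated with a short sketch of linear interpolation onto a fixed number of evenly spaced points. The function name and the example values are hypothetical, and evenly spaced input observations are assumed.

```python
def resample_linear(values, n_out):
    """Resample `values` onto `n_out` evenly spaced points by linear interpolation."""
    n_in = len(values)
    if n_in < 2 or n_out < 2:
        raise ValueError("need at least two input and two output points")
    out = []
    for k in range(n_out):
        # Fractional position of output point k on the input index axis.
        pos = k * (n_in - 1) / (n_out - 1)
        lo = min(int(pos), n_in - 2)  # left neighbour, clamped for the last point
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[lo + 1] * frac)
    return out

# Two hypothetical indicator series of different lengths, mapped to 5 samples each.
short_series = [0, 10, 20]
long_series = [0, 1, 2, 3, 4, 5, 6, 7, 8]

resampled_short = resample_linear(short_series, 5)
resampled_long = resample_linear(long_series, 5)
print(resampled_short)
print(resampled_long)
```

Upsampling a short series only adds points along the straight lines between existing observations, whereas downsampling a long series discards detail between the retained points; both are sources of the small errors noted above.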

Selected data sources

Two types of data sources are considered in the current study. Bibliometric sources, in the form of extracted patent datasets, are first analysed in chapter 5 for the technologies of interest. These are subsequently coupled with technology adoption data in chapter 6 within the technology substitution model, enabling the impact of different modes of substitution to be related to measured development efforts.

Patent data

Patent data has been sourced from the Questel-Orbit patent search platform in this analysis. More specifically, the full FamPat database was queried in this study, which groups related invention-based patents filed in multiple international jurisdictions into families of patents. This platform is accessed by subscribers via an online search engine that allows complex patent record searches to be structured, saved, and exported in a variety of formats. A selection of keywords, dates, or classification categories is used in this search engine to build relevant queries for a given technology (this process is discussed in more detail in chapter 5). The provided search terms are then matched in the title, abstract, and key content of all family members included in a FamPat record, although unlike title and abstract searches, key content searches (which include independent claims, advantages, drawbacks, and the main patent object) are limited to English-language publications. Some of the core functionalities behind this search engine are outlined in \cite{Questel_Orbit_2000}.

Technology adoption data

Adoption data for the technologies investigated is taken from a wide variety of sources due to the broad scope of the technology domains considered. Where possible, global technology sales and shipment values have been used to determine the overall market share of each technology at a given time, although in some cases data values have been imputed to fill gaps in time series (where this has been done it is stated, along with the method used to derive the imputed values). Furthermore, the preference has been to extract statistical data directly from international agencies such as the UN, World Bank, International Energy Agency, International Council on Clean Transportation, International Telecommunication Union, and Eurostat when available, as these organisations generally present the most consistent representation of the technologies considered when taking into consideration regional development trends. In many cases, this information was accessed via the UK Data Service \cite{UKDS_stat}.
A brief description of each data source used for technology adoption data is given in Table \ref{table:data_sources_for_technology_adoption_data}: