\documentclass[11pt]{article}

\usepackage{fullpage}
\usepackage{setspace}
\usepackage{parskip}
\usepackage{titlesec}
\usepackage[section]{placeins}
\usepackage{xcolor}
\usepackage{breakcites}
\usepackage{lineno}
\usepackage{hyphenat}


\usepackage{times}


\PassOptionsToPackage{hyphens}{url}
\usepackage[colorlinks = true,
            linkcolor = blue,
            urlcolor  = blue,
            citecolor = blue,
            anchorcolor = blue]{hyperref}
\usepackage{etoolbox}
\makeatletter
\patchcmd\@combinedblfloats{\box\@outputbox}{\unvbox\@outputbox}{}{%
  \errmessage{\noexpand\@combinedblfloats could not be patched}%
}%
\makeatother


\usepackage{natbib}


\renewenvironment{abstract}
  {{\bfseries\noindent{\abstractname}\par\nobreak}\footnotesize}
  {\bigskip}

\titlespacing{\section}{0pt}{*3}{*1}
\titlespacing{\subsection}{0pt}{*2}{*0.5}
\titlespacing{\subsubsection}{0pt}{*1.5}{0pt}


\usepackage{authblk}


\usepackage{graphicx}
\usepackage[space]{grffile}
\usepackage{latexsym}
\usepackage{textcomp}
\usepackage{longtable}
\usepackage{tabulary}
\usepackage{booktabs,array,multirow}
\usepackage{amsfonts,amsmath,amssymb}
\providecommand\citet{\cite}
\providecommand\citep{\cite}
\providecommand\citealt{\cite}
% You can conditionalize code for latexml or normal latex using this.
\newif\iflatexml\latexmlfalse
\providecommand{\tightlist}{\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}%

\AtBeginDocument{\DeclareGraphicsExtensions{.pdf,.PDF,.eps,.EPS,.png,.PNG,.tif,.TIF,.jpg,.JPG,.jpeg,.JPEG}}

\usepackage[utf8]{inputenc}
\usepackage[english]{babel}


\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{subfig}
\usepackage{subfig,multicol}
\usepackage{multirow}
\usepackage{array}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{natbib} \bibliographystyle{plainnat}

\begin{document}

\title{Identifying the mode and impact of technological substitutions (working
draft of thesis based on PowerPoint slides)}


\author[1]{Ian Marr}%
\affil[1]{University of Bristol}%


\vspace{-1em}


  \date{}


\begingroup
\let\center\flushleft
\let\endcenter\endflushleft
\maketitle
\endgroup


\sloppy


\section{Abstract}

{\label{994700}}

The introduction of new technologies into heavily regulated industries
such as aerospace is often a very complex, time-consuming and expensive
challenge that requires significant levels of research and development
in order to ensure a successful technology substitution. This challenge
is exacerbated when new technology options represent a fundamental shift
away from well-established principles, as the risk and uncertainties
involved increase significantly. This is currently the case in the
anticipated transition from conventional turbojet aircraft architectures
to all new electric configurations, and equally for the adoption of
technologies enabling mass manufacturing and customisation processes in
aerospace production lines. At the same time, the opportunities
associated with these disruptive or sustaining innovations may be
sufficient to warrant decision-makers adopting new technological
paradigms. In some cases, new technological paradigms arise even while
existing technological paradigms are still undergoing further
developments, and have not yet reached the peak of their performance.
This further complicates the decision for enterprises, as switching to a
new technological paradigm that may or may not out-perform the old one
presents great commercial risk. In this regard it is beneficial to be
able to identify early on whether a new technological paradigm is likely
to have scope for development beyond that of the current dominant
technology, and commercially, when the tipping point might occur where
the new paradigm would become the industry `mainstream' technology
option.

This paper examines historical cases where emerging technologies have
been presumed in-advance to have development opportunities beyond those
of pre-existing technologies, subsequently leading to transitions
occurring before performance of the existing technology has stagnated.
Bibliometric, pattern recognition, statistical and other data-driven
analysis techniques are applied to technologies identified as having
been adopted as a result of either prior technological stagnation, or as
a result of a presumptive leap being made, in order to identify early
indicators of the mode of technological substitution. Subsequently
models of innovation diffusion are employed to test the causality and
sensitivity of technology adoption to the identified indicators of the
adoption mode.

To date, analysis of existing literature and technology forecasting
techniques, coupled with statistical and functional data analysis of
historical patent data, has structured:

\begin{enumerate}
\tightlist
\item
  the formulation of a functional linear regression model that indicates
  the likely mode of adoption from key technology development indicators
\item
  the conditions required for presumptive technological substitution to
  arise
\item
  an agent-based/system dynamics simulation framework for assessing the
  impact of different modes of technological substitution
\end{enumerate}

Combined, these elements provide a means to support technology strategy
and innovation management. The capability to identify and test the
sensitivity of the mode of adoption for a given technology will reduce
uncertainty in decision-making processes, time-to-market, and allow
robust product/service strategies to be developed in response to
continually emerging global demographic, economic, and physical
conditions.

\section{Introduction}

{\label{649217}}

\subsection{Industrial environment}

{\label{677399}}

Forecasting techniques often used to determine strategies in large
organisations by providing guide to future opportunities, risks,
challenges, \& areas of uncertainty

\subsection{Technology forecasting, substitution patterns, and
technological
failure}

{\label{237706}}

Technological substitution often plays an important role in the fortunes
of modern enterprises. Correctly predicting which technologies are
likely to be most influential can ensure that a firm is best positioned
to steal a large advance over their competitors when the new technology
comes to fruition. Conversely, failure to anticipate the arrival of big
technological shifts can leave firms severely diminished. This is
illustrated by the dramatic impact on Kodak's business following the
introduction of digital photography, that rendered many of the firm's
existing film product's obsolete following an early lead in the digital
field that was not fully capitalised upon~(missing citation). Equally,
investing heavily in a nascent technology too soon can have grave
consequences, as Bertlesmann found from investing in
Napster~(missing citation). As such forecasting techniques are often
used to determine strategies in large organisations by providing an
initial guide to future opportunities, risks, challenges, \& areas of
uncertainty.

In this field, considerable work has already been undertaken on the
modelling of technology diffusion as part of these substitution events.
This has included, amongst many other areas of study, the influence of
successive technology generations, and the impact of time delays on the
perception of new technologies, as illustrated in
Fig.~{\ref{359340}} and
Fig.~{\ref{740770}} respectively.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Fig1/Fig1}
\caption{{Successive generations of technology substitutions~\protect\hyperref[csl:3]{(Bass, 2004)} ~
{\label{359340}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Fig2/Fig2}
\caption{{Technology S-curves and the impact of time delays on the perception of
new technologies~\protect(missing citation) ~
{\label{740770}}%
}}
\end{center}
\end{figure}

Classically, the introduction of new technologies is often described as
following an S-curve that assumes uptake is initially slow in the
earliest stages prior to a dominant design emerging, until performance
and functional benefits of the new technology are seen to be greater
than those of existing technologies, at which point uptake significantly
accelerates \hyperref[csl:5]{(Foster, 1986}; \hyperref[csl:6]{Utterback, 1994)}. This model assumes that eventually all
technologies then arrive at a limiting condition where they too begin to
stagnate as uptake reduces (potentially due to market saturation or
competition from new technologies), with substitution to a subsequent
generation of technologies occurring either before or after arriving at
this temporary plateau (see Fig. {\ref{402288}}). This
brings about the notion of continual technological (or functional)
failure, at the point where a replacement technology is sought for the
current technological paradigm. However, the technological `failures'
that lead to this type of substitution vary greatly, and cannot just
assume a single simple definition. In this regard, previous work has
examined what is meant by `technological failure', and has broadly
categorised these occurrences into three main
definitions~\hyperref[csl:7]{(Gooday, 1998)}:

\begin{enumerate}
\tightlist
\item
  \textbf{`Failure' as a social taxonomy of marginalised technologies:}
  `Failure' is not an essential characteristic of the technology itself.
  Instead `failure' depends on a diverse range of usage factors that may
  not be replicated in other cultures, and is chronologically bounded so
  that any given technology can be classed as a success or failure at a
  given point in time according to social responses to it. This
  definition implies that `failure' is a completely unexceptional matter
  in technology, and that all `successful' technologies `fail' at some
  point in their existence~\hyperref[csl:7]{(Gooday, 1998)}
\item
  \textbf{`Failure' as a mundane feature of technological usage and
  development:} Persistent `failure' of technology is an unavoidable
  consequence of ever more demanding expectations that human users
  impose upon their all-too-limited constructions. As such, what `fails'
  is human expectations of hardware performance and distribution - or
  rather a `failure' of socio-technical relations~\hyperref[csl:8]{(Pye, 1978}; \hyperref[csl:7]{Gooday, 1998)}
\item
  \textbf{`Failure' as a perspectival and often contested attribution:}
  many recent sociological studies of technology employ two simplifying
  assumptions; firstly that~there is a decisive closure point in history
  at which a technology is judged a `success' or a `failure', and
  secondly that at this point in time, all parties come to a decision
  that is ultimately consensual, despite being based on differing
  perceptions of the technology's social role. Both of these assumptions
  can be challenged by strong counter-arguments~\hyperref[csl:7]{(Gooday, 1998)}
\end{enumerate}

In the analysis that follows, this study focuses on the first of these
three conditions (whilst the other two are addressed to a greater extent
in separate technology adoption modelling work). Specifically, the
definition of technological failure used in this study is given as:

\begin{quote}
``A point in time at which technology performance~ development
stagnates/plateaus, with no further~ progressive trajectory improvements
foreseen for a significant period of time~ in comparison to the overall
technology lifecycle considered, which is~ subsequently followed by the
substitution of a new technology/architecture that~ is on a progressive
trajectory''
\end{quote}

This means that a technology has been able to reach what could be
observed to be a temporary performance limit in this condition before
substitution to a new discontinuous technology occurs~\hyperref[csl:9]{(Schilling \& Esmundo, 2009)}
(i.e. see left and right images in Fig. {\ref{402288}})\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=1.00\columnwidth]{figures/Fig4/Fig4}
\caption{{This is a caption
{\label{402288}}%
}}
\end{center}
\end{figure}

This definition also follows on from the work of Sood \& Tellis which
applied a ~sub-sampling approach to analyse different types of `multiple
S-curves', and subsequently concluded that technologies tend to follow
more of a step-function, with long periods of static performance
interspersed with abrupt jumps in performance, rather than a classical S
shape.~In this study, stagnation periods were recorded where technology
performance during a given sub-sample had an upper plateau longer in
duration than the immediately preceding growth phase, whilst the
subsequent jump in performance in the year immediately after the plateau
was almost double the performance during the entire
plateau~\hyperref[csl:10]{(Sood \& Tellis, 2005)}. Other studies, including the work of Chang
and Schilling, classify multiple S-curves based on whether successive
curves intersect or are disconnected (see Fig.
~{\ref{402288}} and \hyperref[csl:11]{(Chang \& Baek, 2010}; \hyperref[csl:9]{Schilling \& Esmundo, 2009)}).

\subsection{Anomalies associated with scientific and technological
crisis}

{\label{978136}}

Up till now, only substitution patterns associated with technological
failure have been discussed. However, previous studies have identified
that technological substitutions are not just the result of the existing
technology being deemed to have `failed'. In this sense Edward Constant
argued that a feature common to all technological revolutions was the
emergence of `technological anomalies', which could be traced to either
scientific or technological crisis. The first, and most common cause of
these technological anomalies results from functional failure, where:

\begin{quote}
``either the conventional paradigm proves inappropriate to ''new or more
stringent conditions``, or an individual intuitively assumes that (s)he
can produce a better or a new technological device''~\hyperref[csl:12]{(II, 1973)}
\end{quote}

Alternatively, technological anomalies can arise as a result of
presumptive technological leaps:

\begin{quote}
``The demarcation between~functional-failure anomaly and presumptive
anomaly is that presumptive anomaly is deduced from science before a new
paradigm is formulated and that scientific deduction is the sole reason
for the sole guide to new paradigm creation. No functional failure
exists; an anomaly is presumed to exist, hence presumptive
anomaly''~\hyperref[csl:12]{(II, 1973)}
\end{quote}

Whilst technological revolutions may originate from either scientific or
technological crisis, a critical~ area of commonality lies in the
anomaly-crisis process observed in both conditions:

\begin{quote}
``in both science and technology anomaly causes certain individuals to
reject the conventional paradigm and to create~ new paradigms, and, in
each, crisis may lead to revolution''~\hyperref[csl:12]{(II, 1973)}
\end{quote}

The type of crisis that emerges is dependent on which type of anomaly
precedes it. Scientific crisis can occur irrespective of whether an
alternative theoretical framework exists or not when a persistent,
unresolved, scientific anomaly successfully refutes an established
theory. In this condition the crisis is directly linked to the anomaly.
However, technological anomaly and crisis are rarely so logically
driven, and can~arise~in conditions where existing technological
paradigms are still performing favourably. This is illustrated by the
turbojet revolution of the 1930s and 1940s where piston-engine
developments had provided remarkable performance improvements and
continuing success, but were superseded by scientific advances that were
directly responsible for the radical technological changes that
followed. In addition, in order for a technological anomaly to provoke a
technological crisis, a convincing alternative paradigm must exist, so
that the relative functional failure of the conventional system is
observable. As such, the alternative technological paradigm instigates
the crisis, whilst the technological anomaly may only be seen as
speculation or as a limiting condition to the normal
technology~\hyperref[csl:12]{(II, 1973)}.

\subsection{Modes of substitution}

{\label{771448}}

Based on the definitions of functional failure and presumptive anomaly
described in sections~{\ref{237706}} and
{\ref{978136}}, this study examines the ability to
distinguish between these two modes of substitution (i.e. reactive or
presumptive) from analysis of historical scientific and technological
data. Table {\ref{table:technology_categories}} uses
these definitions and performance evidence obtained from literature to
classify a sample set of technologies according to the mode of
substitution observed:\selectlanguage{english}
\begin{table*}
%\resizebox{\textwidth}{!}{%
\begin{tabular}{p{9cm}|p{9cm}}
    {Examples of  technological failure} & {Examples of presumptive anomaly} \\ \midrule
    Plug-compatible market (PCM) disk drives \hyperref[csl:13]{(Christensen \& Rosenbloom, 1995)} & Transition from piston engine to jet engine \hyperref[csl:12]{(II, 1973)} \\
    \hline
    Transition to fibre optic cables from Cu/Al wires for data transfer \hyperref[csl:10]{(Sood \& Tellis, 2005)} & Transition to optical undersea cables from coaxial cables \hyperref[csl:11]{(Chang \& Baek, 2010)} \\
    \hline
    Transition to Low Pressure Sodium lights from Tungsten Filament Lamps \hyperref[csl:11]{(Chang \& Baek, 2010)} & Hydrodynamics, water turbines, and turbine pumps \hyperref[csl:12]{(II, 1973)} \\
    \hline
    Transition to Compact Fluorescent Lamps from Tungsten Filament Lamps \hyperref[csl:11]{(Chang \& Baek, 2010)} & Thermodynamics, steam, and early gas engines \hyperref[csl:12]{(II, 1973)} \\
    \hline
    Transition to White LED lighting from Low Pressure Sodium and Compact Fluorescent Lamps \hyperref[csl:11]{(Chang \& Baek, 2010)} & Organic chemistry and catalytic petroleum cracking \hyperref[csl:12]{(II, 1973)} \\
    \hline
    Transition to hypersonic aircraft from supersonic \hyperref[csl:11]{(Chang \& Baek, 2010)} & Transition to the transistor from the vacuum tube \hyperref[csl:14]{(Foster, 1985)} \\
    \hline
    Transition to coaxial undersea cables from single cable \hyperref[csl:11]{(Chang \& Baek, 2010)} & Nuclear physics and atomic energy \hyperref[csl:12]{(II, 1973)} \\
    \hline
    Transition to T-carrier system from modem internet access \hyperref[csl:11]{(Chang \& Baek, 2010)} & Renewable energy sources \\
    \hline
    Transition to Synchronous Optical Networking (SONET) system from T-carrier internet access \hyperref[csl:11]{(Chang \& Baek, 2010)} & Electric vehicles \\
    \hline
    Transition to ink jet and laser printers from dot matrix printers \hyperref[csl:10]{(Sood \& Tellis, 2005)} &  \\
\end{tabular}%}
\caption{{Identified examples of technological failure and presumptive anomaly}}
\label{table:technology_categories}
\end{table*}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Fig5/Fig5}
\caption{{This is a caption
{\label{133456}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Fig6/Fig6}
\caption{{This is a caption
{\label{614146}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Fig7/Fig7}
\caption{{This is a caption
{\label{185494}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Fig8/Fig8}
\caption{{This is a caption
{\label{117819}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Fig9/Fig9}
\caption{{This is a caption
{\label{978755}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Fig10/Fig10}
\caption{{This is a caption
{\label{573485}}%
}}
\end{center}
\end{figure}

In addition to the modes of substitution outlined in
Table~{\ref{table:technology_categories}}, other
technologies have been identified as `non-starters': these are
technologies that were never mass commercialised. In many cases these
technologies could have been adapted for the target markets considered
but were either never used or failed to demonstrate the required
features,~or performance and cost improvements necessary to warrant
further development beyond initial trials. Examples of non-starter
technologies include wire recorders as an alternative to magnetic tape
technology and chain printers as an alternative to dot matrix printers.
In the case of wire recorders, this format failed to take-off after it
was~excluded from the standard-setting process in favour of magnetic
tape technology, leading to ``technological lock-out'', whilst early
chain printers were quickly eclipsed by the superior performance of the
dot matrix design. Non-starters are excluded in this study, as the
analysis that follows is based on classifying individual technologies
based on technologies that are known to have been successfully
commercialised, and as such~it is not believed their inclusion would
influence the results presented here, although non-starters would need
to be included for reducing uncertainty in the classification of
emerging technologies~\hyperref[csl:10]{(Sood \& Tellis, 2005)}.

Based on Constant's hypothesis regarding scientific and technological
anomalies and their influence on the mode of technological substitution,
this paper looks to test whether bibliometric measures of scientific and
technological development can provide an indication of the mode of
adoption likely to occur.~Consequently, this study theorises that in
order to identify cases of technological substitution arising from
presumptive anomaly a classification scheme would need to consider:

\begin{enumerate}
\tightlist
\item
  a population's perception of the current rate of scientific
  development in observed domains~\hyperref[csl:12]{(II, 1973)}
\item
  a population's perception of~the current rate of technological
  development in observed domains~\hyperref[csl:12]{(II, 1973)}
\item
  a population's perception of the potential opportunity for change
  (e.g. alternatives are~ believed to exist)~\hyperref[csl:12]{(II, 1973)}
\end{enumerate}

\subsection{Measuring perceptions of limits of science and
technology}

{\label{596497}}

Many indicators of science and technological progress have been
developed in the fields of bibliometrics and scientometrics in recent
decades. Whilst these have largely been developed~for the purposes of
identifying and targeting gaps in existing knowledge, as well as for
determining the effectiveness of funding in specific fields of research,
they also provide a systematic approach to compare development trends
across a broad range of scientific domains. When attempting to measure
science it is however important to ensure that any measurements taken
are suitable indicators of the development characteristics that are
being studied. In this regard conceptual distinctions exist between
scientific activity, scientific production, and~ scientific
progress~\hyperref[csl:15]{(Martin, 1996)}:

\begin{enumerate}
\tightlist
\item
  \textbf{Scientific activity:} consumption of the inputs to basic
  research (e.g. related to the number of scientists involved, level of
  funding, support staff and equipment)
\item
  \textbf{Scientific~ production:} extent to which consumption of
  resources creates a body of scientific results. Results are embodied
  both in research publications and in other types of less formal
  communication between scientists
\item
  \textbf{Scientific~ progress:} extent to which scientific activity
  results in substantive contributions to scientific knowledge
\end{enumerate}

Based on this, indicators of scientific progress, such as citation
analysis, are normally considered most appropriate for assessing
scientists' success in producing new scientific knowledge and for
identifying emerging areas of development, leading to their common usage
in the tenure review process~\hyperref[csl:16]{(Narin \& Hamilton, 1996)}. At the same time,
simple publication counts are considered to provide a reasonable measure
of scientific production, but are thought to be much less adequate as an
indicator of contributions to scientific progress due to the unclear
value of each publications individual contribution to knowledge.
Publication counts actually reflect both the level of scientific
progress made by an individual or group, as well as a number of other
factors relating to the social and political pressures behind a study
(e.g. publication practices of the employing institution, country and
research area, or emphasis placed on publications for obtaining
promotion or grants)~\hyperref[csl:17]{(Verbeek, Debackere, Luwel, \& Zimmermann, 2002}; \hyperref[csl:15]{Martin, 1996)}. Realistically these other
extraneous factors cannot be assumed to be small in comparison to the
scientific claims made, or that these effects are randomly distributed
and cancel out~\hyperref[csl:15]{(Martin, 1996)}. However in this study, the emphasis
is not on assessing the performance or influence of a specific set of
papers, but rather to gauge the adoption of the field as a whole. As
technology diffusion models also rely on non-invested parties being made
aware of scientific and technological progress, communication and
promotion of scientific research are important factors to include in
adoption processes~\hyperref[csl:3]{(Bass, 2004)}. Adoption is equally dependent on
perceptions of current scientific and technological rates of
progress~(shaped by social and political pressures, as well as
technical), rather than the actual rates of progress (shaped by
technical contributions to knowledge). Lastly, diffusion effects are
population size, word-of-mouth, and time
dependent~\hyperref[csl:3]{(Bass, 2004)}.~As a result, measures of scientific
production are felt to be a more relevant~ indication of likelihood to
adopt than measures of scientific progress, although they could also
indicate a potentially contentious or controversial topic that is
generating lots of different opinions. However, controversy does not
necessarily prevent adoption, and in some cases may accelerate
substitution mechanisms~(missing citation). Consequently, for the
purposes of this study the scientific production associated with debate
over contentious or controversial technologies is not believed to
significantly skew the trends presented here in either direction away
from the intended simplified reflection of real-world adoption
characteristics.

\section{Methodology}

{\label{274383}}

\subsection{Statistical comparisons of time
series}

{\label{204737}}

This study considers 23 technologies where literature evidence has been
identified to classify the particular mode of technology substitution
observed. Using bibliometric analysis methods it is possible to extract
a variety of historical trends for any technologies of interest,
effectively generating a collection of time series data points
associated with a given technology (these multidimensional time series
datasets are referred to here as `technology profiles'). This raises the
question of how best to compare dissimilar bibliometric technology
profiles in an unbiased manner in order to investigate whether
literature based technology substitution groupings can be determined
using a classification system built on the assumptions given in
section~{\ref{771448}}. In particular comparisons of
technology time series can be subject to one or more areas of
dissimilarity: time series may be based on different number of
observations (e.g. covering different time spans), be out of phase with
each other, may be subject to long-term and shorter term cyclic trends,
be at different stages through the Technology Life Cycle (or be
fluctuating between different stages) \hyperref[csl:18]{(Little, 1981)}, or be
representative of dissimilar industries. As such, a body of work already
exists on the statistical comparison of time series, and in particular
time series classification methods~\hyperref[csl:19]{(Lin, Williamson, Borne, \& DeBarr, 2012)}. Most modern time
series pattern recognition and classification techniques emerging from
the machine learning and data science domains broadly fall within the
categories of supervised, semi-supervised, or unsupervised learning
approaches. The distinction between these categories is based on the
amount of training information provided to the classifier in each case.
In supervised learning, training time series are provided with known
classification labels, whilst training time series with both known and
unknown classification labels are used in semi-supervised learning. ~By
contrast, unsupervised learning approaches are not provided with any
classification labels, and as such are required to determine groupings
independently (e.g. clustering)~\hyperref[csl:19]{(Lin, Williamson, Borne, \& DeBarr, 2012)}.
Table~{\ref{table:time_series_pattern_recognition_techniques}}~below
provides an overview of time series pattern recognition techniques
commonly used (this list is not exhaustive):\selectlanguage{english}
\begin{table}
%\resizebox{\textwidth}{!}{%
\begin{tabular}{p{8cm}|p{8cm}|p{8cm}}
    {Supervised learning} & {Semi-supervised learning} & {Unsupervised learning} \\ \midrule
    Memory-based reasoning (e.g. nearest neighbour -- often the standard approach) & Self-training algorithms such as semi-supervised 1-NN (nearest neighbour with leave-one-out cross validation) & Principal Components Analysis (variable reduction procedure) \\
    Decision trees & Efficient combinatorial approach based on Markov chains & Hierarchical clustering \\
    Rule induction & Transduction & K-Means/K-Medoids \\
    Bayesian networks & Time series discords & Expectation-Maximisation \\
    Support Vector Machines (SVMs) &  & Canonical Correlation Analysis \\
    Neural networks &  & Partial Least Squares \\
    Linear discriminant analysis &  & \\
\end{tabular}%}
%\caption{Common time series pattern recognition techniques \hyperref[csl:19]{(Lin, Williamson, Borne, \& DeBarr, 2012)}}
\caption{{Common time series pattern recognition techniques [Lin 2012]}}
\label{table:time_series_pattern_recognition_techniques}
\end{table}

\subsubsection{Preprocessing and statistical significance testing of
time series
classifications}

{\label{697753}}

Beyond the principal methods of classification outlined above, the
preprocessing of time series datasets and means of statistical
significance testing must also be considered. Preprocessing of data in
particular is still an area that divides opinion within the statistics
community, with some experts arguing that transformation, smoothing, and
normalisation of datasets is required for unbiased time series
comparisons, whilst others contend that in doing so a lot of information
is removed that could otherwise be captured in error terms and that
correlations may be over-stated~\hyperref[csl:20]{(Lucero \& Koenig, 2000}; \hyperref[csl:21]{Ramsay, Hooker, \& Graves, 2009}; \hyperref[csl:22]{\textit{{Smoothing Data, Filling Missing Data, and Nonparametric Fitting}}, n.d.}; \hyperref[csl:23]{\textit{{When and why do we need data normalization?}}, 2013}; \hyperref[csl:24]{\textit{{Smoothing - when to use it and when not to?}}, 2015)}. ~If focusing on
long-term trends it is often recommended that analysis is based on
either logarithms or inverse hyperbolic sine transformations of time
series data rather than raw data in order to reduce focus on short
cyclic features~\hyperref[csl:25]{(Ramsay, 2013}; \hyperref[csl:26]{Nau, n.d.}; \hyperref[csl:27]{Hyndman, 2010}; \hyperref[csl:28]{\textit{{Log transformation of values that include 0 (zero) for statistical analyses?}}, 2014)}. Similarly, simple moving averages
are thought to be more appropriate than exponential smoothing (for long
term trends if smoothing is to be applied)~\hyperref[csl:29]{(Twomey, n.d.)}.

A key data preparation requirement considered in this analysis relates
to the definition of shared curve features from bibliometric data that
can be used to address the time series and Technology Life Cycle
alignment issues highlighted in section~{\ref{204737}}.
These feature recognition and alignment processes are required to enable
fair comparisons and classification to be based on dissimilar
technologies. To ensure consistency, feature recognition processes
should consider the relative height of plateaus observed between
technology profiles from different industries, the rates of growth
observed in the early stages of historical trends, and the influence of
noise and incomplete time series data on the classifications being made.
For these reasons it is assumed that unsmoothed, amplitude normalised
time series which are subsequently segmented based on common curve
features would enable these comparisons to be made. This approach would
ensure that all curve amplitudes considered are relative on a global
scale, whilst segmentation based on common features would enable
consistency in defining early growth phases whilst allowing later
incomplete segments to be discarded from classifications. As a basis for
these feature extraction stages it is assumed that the Technology Life
Cycle model proposed by Little provides a well-established concept and a
sensible candidate for identification of common curve
features~\hyperref[csl:18]{(Little, 1981)}. However, identified curve features may
still be unaligned in time, and consequently time transformation
techniques, such as `time warping' methods, are also recommended (this
is discussed in more detail in section {\ref{446824}}).

In terms of being able to determine correlations between groups of time
series datasets the Chi-square statistic is commonly used to test the
independence of descriptive statistics derived from time series (time
series classifiers are discussed in more detail in section
{\ref{446824}}). However, as a consequence of the
probability distribution function used in its significance test the
Chi-squared approach is best suited to confusion matrices (i.e.
cross-tabulated comparisons of predicted classifications against target
classifications) which have all cell values being greater than or equal
to five. As such, when smaller sample sizes are considered (such as the
23 technologies considered in this analysis), Fisher's exact test is
more appropriate. In a similar fashion to the Chi-square test, Fisher's
exact test is able to determine the significance of outcomes for samples
taken at random from a population, but is not necessarily able to
provide a ranking of the most statistically robust predictors (i.e.
predictors that are likely to be accurate when considering out-of-sample
predictions). It is worth noting that in this analysis technologies have
been deliberately selected based on their observed performance trends,
and as such Fisher's exact test cannot be used to reject the null
hypothesis (as samples are not being taken at random from a
population)~unless known time series classification labels are removed
so that clustering is not based on human biases (i.e. unsupervised
learning approach).

For subsequent ranking of predictors based on small sample sizes,
cross-validation approaches are then required (discussed in more detail
in section {\ref{387734}}). Histograms can also prove
useful for determining the most frequently occurring individual factors
in these cross-validation `bootstrapping' processes, but cannot indicate
what combination of factors would work best together.

\subsubsection{Time series classification and feature alignment
techniques}

{\label{446824}}

In order to identify and rank the predictive ability of different
combinations of bibliometric indicators when used for classification
purposes, an appropriate classifier first has to be selected that fits
the data features being considered. In this sense time series
classification procedures can be grouped based on the type of
discriminatory features the techniques are attempting to
find~\hyperref[csl:30]{(Bagnall, Lines, Bostrom, Large, \& Keogh, 2016)}:\selectlanguage{english}
\begin{table}
\begin{tabular}{p{3cm}|p{15cm}}
    {Method type} & {Description} \\ \midrule
    Whole series & two time series compared either as a vector or by a distance measure that uses all the data \\
    \hline
    Intervals & rather than use whole series, select one or more phase dependent intervals of the series \\
    \hline
    Shapelets & based on finding short, phase independent, patterns (shapelets) that define class, but that can appear anywhere in series. Class is distinguished by presence or absence of one or more shapelets anywhere in whole series \\
    \hline
    Dictionary based & classification based on histograms constructed from frequency counts of recurring patterns \\
    \hline
    Combinations & class of algorithms based on combining two or more of the above approaches into a single classifier \\
    \hline
    Model based & Model based algorithms fit a generative model to each series then measure similarity between series using similarity between models. Commonly proposed for tasks other than classification or as part of a larger classification scheme and are often not as competitive as other approaches (except for long series of unequal length) \\
\end{tabular}
\caption{{Types of time series classification techniques [Bagnall 2016]}}
%\caption{Types of time series classification techniques \hyperref[csl:30]{(Bagnall, Lines, Bostrom, Large, \& Keogh, 2016)}}
\label{table:types_of_time_series_classification_techniques}
\end{table}

Recent benchmarking analysis has found that few time series
classification algorithms perform better than the Dynamic Time Warping
and Rotation Forest benchmark classifiers, whilst the best alternative
(COTE) was identified as being hugely computationally
expensive~\hyperref[csl:30]{(Bagnall, Lines, Bostrom, Large, \& Keogh, 2016)}. It's worth noting that feature alignment
techniques that calculate relative feature-based distance measures
between time series (such as whole series and interval approaches) can
be used to calculate single value representations of the similarity
between any given pair of time series, including complex time series
with multiple dimensions, which can subsequently be used in further
clustering or wider classification analysis.

In the case of Dynamic Time Warping feature alignment is achieved by
stretching portions of two signals,~\emph{X} and~\emph{Y}, onto a shared
set of instances such that a global signal-to-signal distance measure is
minimised. The set of distortion paths used in this minimisation problem
are based on a lattice of all possible distances between
the~\emph{m}\textsuperscript{th} data point of~\emph{X} and
the~\emph{n}\textsuperscript{th} data point of~\emph{Y}. Valid warping
paths, parameterised by two sequences of the same length, are a
combination of ``chess king'' moves which completely aligns the signal,
does not skip any data points, and does not repeat any signal
features.~In determining the path with minimum warping path the
algorithm forces similar features to appear at the same location on a
common time axis~\hyperref[csl:31]{(MathWorks, 2016)}.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Dynamic-Time-Warping-of-CompactFluorescentLamps-relative-to-ElectricVehicles-(cited-references-by-priority-year)/Dynamic-Time-Warping-of-CompactFluorescentLamps-relative-to-ElectricVehicles-(cited-references-by-priority-year)}
\caption{{Example of feature alignment and Euclidean distance measurement using
Dynamic Time Warping on unaligned signals
{\label{592054}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Dynamic-Time-Warping-of-CompactFluorescentLamps-relative-to-ElectricVehicles-(ALL-INDICATORS)/Dynamic-Time-Warping-of-CompactFluorescentLamps-relative-to-ElectricVehicles-(ALL-INDICATORS)}
\caption{{Example of feature alignment and Euclidean distance measurement using
DTW on unaligned multi-dimensional signals
{\label{929856}}%
}}
\end{center}
\end{figure}

\subsubsection{Time series clustering
techniques}

{\label{177293}}

As a form of unsupervised learning, clustering approaches enable
associations between time series to be identified without being
subjected to human grouping biases. However, in order to apply
clustering techniques it is necessary to be able to describe the
relationships between successive pairs of time series using single value
representations. Consequently time series clustering techniques tend to
be based on measures of the relative distance between curves, rather
than the curve data points themselves. There is also considerable
variation in the outcomes depending on the clustering algorithm selected
for use. This can be in terms of the real-world interpretation of the
groupings generated, as observed when comparing clusters predicted using
the K-means and K-medoids algorithms.
Fig.~{\ref{943632}}~below illustrates how the centre of
subsets in K-means is equivalent to the mean of measurements in the
subset (the centroid), rather than an actual member of the subset (a
medoid). As such K-means is not appropriate for application to time
series, as the algorithm ends up minimising variance, rather than
distances between curves \hyperref[csl:32]{(MathWorks, 2016}; \hyperref[csl:33]{\textit{{Dynamic Time Warping Clustering}}, 2015)}.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/K-Medoids-vs-K-Means-2/K-Medoids-vs-K-Means-2}
\caption{{Differences in real-world interpretations of K-means and K-medoids
clustering algorithms
{\label{943632}}%
}}
\end{center}
\end{figure}

Besides predicting alternative central points for subsets, and
consequently grouping alternative subset members, the number of clusters
predicted can also vary depending on the algorithm selected. Whereas
K-means and K-medoids require the number of clusters to be specified in
advance, hierarchical clustering approaches automatically determine the
number of clusters to group data points into without additional human
intervention~\hyperref[csl:32]{(MathWorks, 2016}; \hyperref[csl:33]{\textit{{Dynamic Time Warping Clustering}}, 2015)}. Furthermore, as a form of unsupervised
learning, clustering approaches will provide different group labels to
subsets each time they are applied, even if the actual subset members
remain unchanged, so a separate~`subset~mapping' function based on
`Hamming distance' is required to ensure consistency in comparisons
between generated clusters and expected groupings. Once again it is also
worth noting that the definition of subsets using any clustering
technique will only be valid if time series are being compared on
comparative features rather than incomplete time series data. As such,
time series segmentation based on shared features or imputation of
missing data are again prerequisites for meaningful analysis, ensuring
that only completed segments are used in defining subsets. Finally, if
using feature-based distance measures as the basis for clustering
(grouped into matrices of distance points relating each technology time
series to every other time series)~then it is generally suggested that
either hierarchical clustering or the `Partitioning Around Medoids'
(PAM) variant of K-Medoids are applied to the descriptive
data~\hyperref[csl:32]{(MathWorks, 2016}; \hyperref[csl:33]{\textit{{Dynamic Time Warping Clustering}}, 2015)}.

\subsubsection{\texorpdfstring{\emph{Distance measures that can be used
in~ clustering and feature
alignment:}}{Distance measures that can be used in~ clustering and feature alignment:}}

{\label{895376}}

%\resizebox{\textwidth}{!}{%
\begin{tabular}{p{5cm}|p{13cm}}
    {Metric} & {Description} \\ \midrule
    'euclidean' & Euclidean distance (default). \\
    \hline
    'squaredeuclidean' & Squared Euclidean distance. (This option is provided for efficiency only. It does not satisfy the triangle inequality.) \\
    \hline
    'seuclidean' & Standardized Euclidean distance. Each coordinate difference between rows in X is scaled by dividing by the corresponding element of the standard deviation S=nanstd(X). To specify another value for S, use D = pdist(X,'seuclidean',S). \\
    \hline
    'cityblock' & City block metric. \\
    \hline
    'minkowski' & Minkowski distance. The default exponent is 2. To specify a different exponent, use D = pdist(X,'minkowski',P), where P is a scalar positive value of the exponent. \\
    \hline
    'chebychev' & Chebychev distance (maximum coordinate difference). \\
    \hline
    'mahalanobis' & Mahalanobis distance, using the sample covariance of X as computed by nancov. To compute the distance with a different covariance, use D = pdist(X,'mahalanobis',C), where the matrix C is symmetric and positive definite. \\
    \hline
    'cosine' & One minus the cosine of the included angle between points (treated as vectors). \\
    \hline
    'correlation' & One minus the sample correlation between points (treated as sequences of values). \\
    \hline
    'spearman' & One minus the sample Spearman's rank correlation between observations (treated as sequences of values). \\
    \hline
    'hamming' & Hamming distance, which is the percentage of coordinates that differ. \\
    \hline
    'jaccard' & One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ. \\
    \hline
    custom distance function & A distance function specified using @: D = pdist(X,@distfun). A distance function must be of form d2 = distfun(XI,XJ) taking as arguments a 1-by-n vector XI, corresponding to a single row of X, and an m2-by-n matrix XJ, corresponding to multiple rows of X. distfun must accept a matrix XJ with an arbitrary number of rows. distfun must return an m2-by-1 vector of distances d2, whose kth element is the distance between XI and XJ(k,:).

    \\
\end{tabular}%}

{[}\href{https://uk.mathworks.com/help/stats/pdist.html}{https://}\href{https://uk.mathworks.com/help/stats/pdist.html}{uk.mathworks.com/help/stats/pdist.html},~\href{https://numerics.mathdotnet.com/distance.html}{https://}\href{https://numerics.mathdotnet.com/distance.html}{numerics.mathdotnet.com/distance.html}{]}

\subsubsection{Cross-validation
techniques}

{\label{387734}}

To assess the predictive performance of any given combination of
bibliometric indicators in practice it is necessary to determine how
the~classification results will generalise to an independent (i.e.
unknown) data set. For this purpose, cross-validation techniques are
commonly employed to provide an indication of model validity when
considering out-of-sample predictions. This is accomplished by
sequentially~training and then generating test predictions from
different subset decompositions of the original data, and using the
average number of misclassified observations as a means to rank each
predictor grouping. In doing so, cross-validation helps to address the
risk of over-fitting models that are based on limited sample sizes, but
equally provides a means to identify the most suitable predictor
groupings to use for model building purposes based on their robustness
to misclassifications. Cross-validation techniques are generally grouped
into either exhaustive or non-exhaustive categories, as shown in Table
{\ref{table:cross-validation_techniques}}:\selectlanguage{english}
\begin{table*}
%\resizebox{\textwidth}{!}{%
\begin{tabular}{p{8cm}|p{8cm}}
    {Exhaustive cross-validation approaches} & {Non-exhaustive cross-validation approaches} \\ \midrule
    \hline
    Leave-p-out cross-validation & k-fold cross-validation \\
    \hline
    Leave-one-out cross-validation (most computationally inexpensive version of leave-p-out cross-validation) & Holdout method \\
    \hline
     & Monte Carlo (repeated random sub-sampling) \\
\end{tabular}%}
\caption{{Common cross-validation techniques}}
\label{table:cross-validation_techniques}
\end{table*}

Some known limitations have to be taken into consideration when applying
cross-validation techniques. In particular, cross-validation
approaches~only yield meaningful results if the validation set and
training set used are drawn from the same population (without overlap
between sets), and if human biases are controlled. For example, it is
unrealistic to treat data as being drawn from the same population when
using dissimilar time periods for validation and training sets, as this
shift in time will introduce systematic differences into the sets being
considered. As such, alignment of features to ensure consistency is
again advisable for fair comparisons of time series. Similarly, training
models based on a specific group of a population~(e.g. young people),
does not enable generalisation of cross-validated training results to
the wider population as predictions could differ greatly to actual
results.

\subsubsection{Functional data analysis}

{\label{875755}}

Most statistical analysis techniques assume that the data points being
evaluated are unrelated, and can be treated as independent entities.
This is not generally true of time series, where there is often a
derivative function that connects adjoining data points together. To
address these scenarios, functional data analysis approaches were
developed to enable statistical analysis and model construction based on
whole functions rather than a collection of independent data points,
making these approaches well suited to time series
data~\hyperref[csl:21]{(Ramsay, Hooker, \& Graves, 2009)}. Additionally, functional data analytics has
proved to be suitable for conditions where phase variations are present
in data (such as in growth data and historical trends where curves start
at different times/stages). Methods such as nonlinear mixed models,
repeated measure ANOVA, and principal components analysis do not
consider these differences in timing~\hyperref[csl:34]{(\textit{{When/where to use functional data analysis?}}, 2012)}.

Functional data approaches are built on the principal of using `basis
functions' to represent data series as a `functional data
object'~\hyperref[csl:21]{(Ramsay, Hooker, \& Graves, 2009)}. Basis functions are defined by:

\begin{equation}
\label{eq:basis_function_1}
    f\left(t\right)=\Sigma \beta _ib_i\left(t\right)
\end{equation}

where~\(b_i\left(t\right)\)~are known values, and~\(\beta_i\) are
the estimated coefficients. This is often also written as:

\begin{equation}
\label{eq:basis_function_2}
    f\left(t\right)=a_1\theta _1\left(t\right)+a_2\theta _2\left(t\right)+...+a_k\theta _k\left(t\right)
\end{equation}

Functional data objects can subsequently be used in functional linear
regression analysis, in an analogous way to conventional linear
regression:

\begin{equation}
\label{eq:sedov}
    y = X \beta + \varepsilon
\end{equation}
where
\[
    y = \begin{pmatrix}
            y_1 \\
            y_2 \\
            \vdots \\
            y_n
        \end{pmatrix},
    X = \begin{pmatrix}
            x_1^T \\
            x_2^T \\
            \vdots \\
            x_n^T
        \end{pmatrix}
    =   \begin{pmatrix}
            \begin{bmatrix}
                x_{11} \cdots x_{1p} \\
                x_{21} \cdots x_{2p} \\
                \vdots \ddots \vdots \\
                x_{n1} \cdots x_{np} \\
            \end{bmatrix}
        \end{pmatrix},
    \beta = \begin{pmatrix}
                \beta_1 \\
                \beta_2 \\
                \vdots \\
                \beta_p
            \end{pmatrix},
    \varepsilon =   \begin{pmatrix}
                        \varepsilon_1 \\
                        \varepsilon_2 \\
                        \vdots \\
                        \varepsilon_n
                    \end{pmatrix}
\]

The exact definition of basis functions used in functional data objects
depends closely on the type of data or feature that functional data
objects are looking to replicate. At their most basic level Fourier
series are commonly used for periodic and near periodic data (such as
for weather data and some economic data), whilst spline-based functions
are used for non-periodic
data~\href{https://www.authorea.com/users/161287/articles/181390-identifying-the-mode-and-impact-of-disruptive-innovations\#Ramsay_2009}{(Ramsay
2009)}. Beyond these higher level distinctions, polynomial, B-spline
(which are essentially built up of many polynomial sections), and
wavelet functions can also be considered, with B-splines found to be
better suited to fitting highly curvy data (where polynomials would
require a large number of basis functions to achieve the same degree of
fit - as such splines have largely replaced polynomials now). Wavelets
have been observed to be very good at capturing sharp edges, which is a
particular weakness of Fourier based
functions~\href{https://www.authorea.com/users/161287/articles/181390-identifying-the-mode-and-impact-of-disruptive-innovations\#Ramsay_2009}{(Ramsay
2009)}. If using B-splines, it is necessary to first define the number
of `knots' that should be used in the representation of a curve (i.e.
the joining points linking adjacent polynomial segments in the spline).
Setting the number of knots equivalent to the total number of
observations in a time series keeps this definition simple, although may
again result in a large number of basis functions depending on the
length of the series considered.

With regards to best implementation practices for functional data
analysis, recommendations have been presented in the work of Ramsay that
should be considered if looking to apply these techniques. Firstly, it
is advised that the order of B-spline functions be at least four orders
of magnitude larger than the highest order derivative to be considered
in any analysis, in order to properly capture any significant influences
from derivative
behaviours~\href{https://www.authorea.com/users/161287/articles/181390-identifying-the-mode-and-impact-of-disruptive-innovations\#Ramsay_2009}{(Ramsay
2009)}. Another important point raised in this literature is the need to
scale time vectors appropriately as required so that the time period of
each basis function is not significantly less than 1, otherwise rounding
errors can become an issue when large number of basis functions are
used~\href{https://www.authorea.com/users/161287/articles/181390-identifying-the-mode-and-impact-of-disruptive-innovations\#Ramsay_2009}{(Ramsay
2009)}. It's also worth noting at this point that the analysis that
follows assumes that the resampling of time series based on simple
linear interpolation in order to ensure that a consistent number of
observations is used across technologies being compared will not
introduce significant errors into the assessment of the predictive
ability of different bibliometric indicator groups. In terms of
compatibility with feature alignment techniques, the work of Ramsay
provides well-documented evidence from the studies conducted previously
of how feature alignment processes (also referred to as `landmark
registration') often form a prerequisite to model building using
functional data approaches. As such, time series segmented and aligned
based on features, such as aligning technologies against common
Technology Life Cycle stages, have been shown to enable a single data
object to be generated for multiple curves that originally spanned
across time periods of different lengths
\href{https://www.authorea.com/users/161287/articles/181390-identifying-the-mode-and-impact-of-disruptive-innovations\#Ramsay_2009}{(Ramsay
2009)}. Lastly, in applying functional data analysis techniques to other
examples of growth curves (such as the U.S. Nondurable Goods Index),
Ramsay advocates the use of data transformation and smoothing in order
to be able to focus on long-term trends rather than periodic or seasonal
patterns~\hyperref[csl:25]{(Ramsay, 2013}; \hyperref[csl:35]{Ramsay, 2013)}.

\emph{\textbf{Example of functional basis systems and data objects}}

\emph{Illustration of constant and monomial basis systems:}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Constant-basis-system-for-functional-regression-analysis/Constant-basis-system-for-functional-regression-analysis}
\caption{{This is a caption
{\label{111002}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Monomial-basis-system-for-functional-regression-analysis/Monomial-basis-system-for-functional-regression-analysis}
\caption{{This is a caption
{\label{741971}}%
}}
\end{center}
\end{figure}

\emph{Illustration~ of b-spline basis system with 54 basis functions and
corresponding functional~ data object:}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Functional-basis-system-for-CompactFluorescentLamps-based-on-corporates-by-priority-year/Functional-basis-system-for-CompactFluorescentLamps-based-on-corporates-by-priority-year}
\caption{{This is a caption
{\label{537153}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Functional-Data-Object-for-CompactFluorescentLamps-based-on-corporates-by-priority-year/Functional-Data-Object-for-CompactFluorescentLamps-based-on-corporates-by-priority-year}
\caption{{This is a caption
{\label{145267}}%
}}
\end{center}
\end{figure}

\subsection{Method selection}

{\label{655650}}

Based on the technology classification problem considered, the
bibliometric data available, and the methods discussed in
sections~{\ref{204737}}~to~{\ref{875755}}~the
following methods have been selected for use in this analysis:

\subsubsection{Technology Life Cycle stage matching
process}

{\label{931418}}

For those technologies where evidence for determining the transitions
between different stages of the Technology Life Cycle has either not
been found or is~incomplete, a nearest neighbour pattern recognition
approach has been employed based on the work of
Gao~\hyperref[csl:36]{(Gao et al., 2013)}~to locate the points where shifts between cycle
stages occur.

\emph{In this instance a supervised learning approach is taken as the
well-established nature of the Technology Life Cycle model is widely
recognised to form a sensible basis for classifying technological
maturity, so there is no need to establish the validity of the
categories being assigned. Equally, the nearest neighbour approach is
commonly used as an industry standard, so no further development is
proposed here for this study.}

\textbf{OR}

\emph{However, for the technologies considered in this paper, literature
evidence has been identified for the transitions between stages, and so
the nearest neighbour methodology is not discussed further here.}

\subsubsection{Identification of significant patent indicator
groups}

{\label{218955}}

In order to identify those bibliometric indicator groupings that could
form the basis of a data-driven technology classification model a
combination of Dynamic Time Warping and the `PAM' variant of K-Medoids
clustering has been applied in this study. For the initial feature
alignment and distance measurement stages of this process, Dynamic Time
Warping is still widely recognised as the classification benchmark to
beat (see section~{\ref{446824}}), and so this study
does not look to advance the feature alignment processes used beyond
this. Unlike the Technology Life Cycle stage matching process which is
based on a well-established technology maturity model, this study is
assuming that a classification system based on the modes of substitution
outlined in section~{\ref{771448}} is not intrinsically
valid. For this reason an unsupervised learning approach has been
adopted here to enable human biases to be eliminated in determining
whether a classification system based on presumptive technological
substitution is valid or not, before subsequently defining a
classification rule system. In doing so this additionally means that
labelling of predicted clusters can be carried out even if labels are
only available for a small number of observed samples representative of
the desired classes, or potentially even if none of the observed samples
are absolutely defined. This is of particular use if this technique is
to be expanded to a wider population of technologies, as obtaining
evidence of the applicable mode of substitution that gave rise to the
current technology can be a time-consuming process, and in some cases
the necessary evidence may not be publicly available (i.e. if dealing
with commercially sensitive performance data). As such, clustering can
provide an indication of the likely substitution mode of a given
technology without the need for prior training on technologies that
belong to any given class. Under such circumstances this approach could
be applied without the need for collecting performance data, providing
that the groupings produced by the analysis are broadly identifiable
from inspection as being associated with the suspected modes of
substitution (this is of course made easier if a handful of examples are
known, but means that this is no longer a hard requirement). The `PAM'
variant of K-Medoids is selected here over Hierarchical clustering since
the expected number of clusters is known from the literature, and
keeping the number of clusters fixed allows for easier testing of how
frequently predicted clusters align with expected groupings.
Additionally, a small sample of technologies is evaluated in this study,
and as a result computational expense is not likely to be significant in
using the `PAM' variant of K-Medoids ~over Hierarchical clustering
approaches. It's also worth noting that by evaluating the predictive
performance of each subset of patent indicator groupings independently
it is possible to spot and rank commonly recurring patterns of subsets,
which is not possible when using approaches such as Linear Discriminant
Analysis which can assess the impact of individual predictors, but not
rank the most suitable combinations of indicators.

\subsubsection{Ranking of significant patent indicator
groups}

{\label{814495}}

As the number of technologies considered in this study is relatively
small, exhaustive cross-validation approaches provide a feasible means
to rank the out-of-sample predictive capabilities of those bibliometric
indicator subsets that have been identified as producing significant
correlations to expected in-sample technology groupings. As such,
leave-p-out cross-validation approaches are applied for this purpose,
whilst also reducing the risk of over-fitting in the following model
building phases.

\subsubsection{Model building}

{\label{260337}}

Due to the importance of phase variance when comparing historical trends
for different technologies, and the coupling that exists between
adjacent points in growth and adoption curves, functional linear
regression is selected here to build the technology classification model
developed in this study (see section~{\ref{875755}}).

\subsubsection{Sensitivity of technology adoption to chosen modelling
parameters}

{\label{806618}}

Whilst statistical approaches are well-suited to detecting underlying
correlations in historical and experimental datasets, this on it's own
does not provide a detailed understanding of the causation behind
associated events. Equally, statistical methods are not generally well
suited to predicting disruptive events and complex interactions, with
other simulation techniques such as System Dynamics and Agent Based
Modelling performing better in these areas. Accordingly, in order to
identify causation effects and test the sensitivity of technological
substitution patterns to variability arising from real-world
socio-technical features not captured in simple bibliometric indicators
(such as the influence of competition and economic effects), the fitted
regression model is evaluated in a real-time system dynamics
environment.

\subsection{Method limitations}

{\label{518077}}

Although precautions have been taken where available to ensure that the
methods selected for this study address the problem posed of building a
generalised technology classification model based on bibliometric data
in as rigorous a fashion as possible, there are some known limitations
to the methods used in this work that must be recognised. Many of the
current limitations stem from the fact that in this analysis
technologies have been selected based on where evidence is obtainable to
indicate the mode of adoption followed. As such the technologies
considered here do not come from a truly representative cross-section of
all industries, so it is possible that models generated will provide a
better representation of those industries considered rather than a more
generalisable result. This evidence-based approach also means that it is
still currently a time-consuming process to locate the necessary
literature material to be able to support classifying technology
examples as arising based on one mode of substitution or another, and to
then compile the relevant cleaned patent datasets for analysis. As a
result only a relatively limited number of technologies have been
considered in this study, which should be expanded on to increase
confidence in the findings produced from this work. This also raises the
risk that~clustering techniques may struggle to produce consistent
results based on the small number of technologies considered.
Furthermore, any statistical or quantitative methods used for modelling
are unlikely to provide real depth of knowledge beyond the detection of
correlations behind patent trends when used in isolation. Ultimately
some degree of causal exploration, whether through case study
descriptions, system dynamics modelling, or expert elicitation will be
required to shed more light on the underlying influences shaping
technology substitution behaviours. Other data-specific issues that
could arise relate to the use of patent searches in this analysis and
the need to resample data based on variable length time series. The
former relates to the fact that patent search results and records can
vary to a large extent based on the database and exact search terms
used, however overall trends once normalised should remain consistent
with other studies of this nature (this point is addressed in more
detail in~\emph{\textbf{section XX}}). The latter meanwhile refers to
the fact that functional linear regression requires all technology case
studies to be based on the same number of time samples, and as such, as
discussed in section {\ref{875755}}, linear
interpolation is used as required to ensure consistency on the number of
observations whilst possibly introducing some small errors which are not
felt to be significant.

\subsection{Selected data sources}

{\label{521682}}

Three types of data sources are considered in this study, relating to
either patent or publication data (i.e. bibliometric sources), which are
subsequently coupled with technology adoption data to enable the impact
of different modes of substitution to be investigated:

\subsubsection{Patent data}

Patent data has been sourced from the Questel-Orbit patent search
platform in this analysis. More specifically, the full FamPat database
was queried in this study, which groups related invention-based patents
filed in multiple international jurisdictions into families of
patents.~This platform is accessed by subscribers via an online search
engine that allows complex patent record searches to be structured,
saved, and exported in a variety of formats. A selection of keywords,
dates, or classification categories are used in this search engine to
build relevant queries for a given technology (this process is discussed
in more detail in~section {\ref{335937}}). The provided
search terms are then matched in the title, abstract, and key content of
all family members included in a FamPat record, although unlike title
and abstract searches, key contents searches (which include independent
claims, advantages, drawbacks, and the main patent object) are limited
to only English language publications. Some of the core functionalities
behind this search engine are outlined in \hyperref[csl:37]{(Lambert, 2000)}.

\subsubsection{Publication data}

Journal article and publication records used in this analysis are based
on extracted search results from the Web of Science (WoS) citation
indexing service provided by Clarivate Analytics (previously Thomson
Reuters). Web of Science was originally established based on the work of
Eugene Garfield, who identified the relevance of citations and
subsequently developed the idea of the Science Citation Index (SCI) in
the 1950's as a database for storing these records, along with the
Institute for Scientific Information (ISI) as an organisation setup to
maintain this information. Whilst not originally intended for research
evaluation, but rather for aiding researcher's in finding relevant work
more effectively, the SCI was later joined by the Social Sciences
Citation Index (SSCI), and subsequently the Arts \& Humanties Citation
Index (A\&HCI) in the 1970's. After being acquired by the Thomson
Corporation, this collection of indexes was converted into the present
day Web of Science, which is currently reported to hold details of over
100 million records dating from 1900 onwards, covering more than 33,000
journals, 50,000 books, and 160,000 conference proceedings. As such,
this comprises the largest collection of scholarly articles
globally~\hyperref[csl:38]{(Mingers \& Leydesdorff, 2015}; \hyperref[csl:39]{Analytics, 2017)}.

In a very similar fashion to the Questel-Orbit platform, the online Web
of Science search engine relies on a series of keywords and Boolean
operators to define search terms that are then matched in the title,
abstract, and key content of the records in the database.

\subsubsection{Technology adoption data}

Adoption data for the technologies investigated is taken from a wide
variety of sources due to the broad scope of the technology domains
considered. Where possible, global technology sales and shipment values
have been used to determine the overall market share of each technology
at a given time, although in some cases data values have been imputed to
fill gaps in time series (this is stated where this has been applied, as
well as the method of deriving imputed values). Furthermore, the
preference has been to extract statistical data directly from
international agencies such as the UN, World Bank, International Energy
Agency, International Council on Clean Transportation, International
Telecommunication Union, and Eurostat when available, as these
organisations generally present the most consistent representation of
the technologies considered when taking into consideration regional
development trends. In many cases, this information was accessed via the
UK Data Service \hyperref[csl:40]{(Service, 2017)}.

A brief description of each data source used for technology adoption
data is given in Table
{\ref{table:data_sources_for_technology_adoption_data}}:\selectlanguage{english}
\begin{table}
\begin{tabular}{p{6cm}|p{14cm}}
    {Technology adoption data source} & {Description} \\ \midrule
    Ascend Fleets & Ascend Fleets, provided by Flight Global, is a subscriber based database that stores real-time aircraft and commercial aviation data on both currently operating and historical fleets. This online database comprises of over 240,000 aircraft records, including comprehensive transaction and status data on commercial, business, and helicopter operations \hyperref[csl:41]{(Global, 2017)} \\
    \hline
    BIS Strategic Decisions & BIS Strategic Decisions was the third largest provider of information to vendors in the information technology industry up until 1995 when it was acquired by the Giga Information Group. Up until this point BIS Strategic Decisions kept a tracker of the U.S. printer market shipments which were reported annually in 'PC Magazine' (NB: full details of assumptions and imputations made during the compilation of observed printer market share values are recorded in file attachments linked to the online version of this paper) \\
    \hline
    Eurostat & Eurostat is a Directorate-General of the European Commission that provides statistical information to the institutions of the European Union, and that records historical data for all major forms of transportation in the EU, including details of annual vehicle registrations \hyperref[csl:42]{(Eurostat, 2017)} \\
    \hline
    General Aviation Manufacturers Association (GAMA) & GAMA is an aviation industry trade association representing general aviation which produces the annual 'General Aviation Statistical Yearbook and Industry Outlook' \hyperref[csl:43]{(Association, 2016)} \\
    \hline
    International Council on Clean Transportation (ICCT) & The ICCT is an independent non-profit organisation that produces the 'European Vehicle Market Statistics pocketbook' that provides an annually updated summary of the passenger car and light vehicle fleets operated in the European Union, with an emphasis on vehicle technologies and the emissions of greenhouse gases and other pollutants \hyperref[csl:44]{(on Clean Transportation, 2016)} \\
    \hline
    International Data Corporation (IDC) & IDC is a Chinese market research and analysis firm that specialises in information technology, and which publishes an annual hardcopy peripheral tracker charting the historical trends observed in the global printer markets (NB: full details of assumptions and imputations made during the compilation of observed printer market share values are recorded in file attachments linked to the online version of this paper) \hyperref[csl:45]{(Corporation, n.d.)} \\
    \hline
    International Energy Agency (IEA) & The IEA is an intergovernmental organisation established as part of the Organisation for Economic Co-Operation and Development (OECD) in 1974, as a policy advisor to its member and non-member states that also serves as an information source on energy statistics. Numerous statistical datasets are available from the IEA, including global energy demand and production, global proliferation of renewable energy technologies, and domestic energy consumption patterns \hyperref[csl:46]{(\textit{{World Energy Outlook 2016}}, 2016}; \hyperref[csl:47]{4E, 2014)} \\
    \hline
    International Telecommunications Union (ITU) & The ITU is a specialised agency of the United Nations that is responsible for issues that concern information and communication technologies, and publishes statistics and reports annually on global telecommunications coverage \hyperref[csl:48]{(``{Measuring the Information Society (11 October 2012) - {ITU}}'', 2016}; \hyperref[csl:49]{Union, n.d.)} \\
    \hline
    IT Candor & IT Candor are a market research company working in the IT and Communications industry that produce annual statistics and forecasts for a range of small, medium, and large hardware providers, including trackers of worldwide printer shipments \hyperref[csl:50]{(Candor, 2016}; \hyperref[csl:51]{Candor, n.d.)} \\
    \hline
    World Bank & The World Bank is an international financial institution that has the stated goal of reducing worldwide poverty through promoting foreign investment and international trade. The World Bank collects and processes large amounts of data based on economic models and regularly publishes global development indicators in an open access format \hyperref[csl:52]{(\textit{{World Development Indicators 2016}}, 2016}; \hyperref[csl:53]{\textit{{World Development Indicators 2000}}, 2000}; \hyperref[csl:54]{Bank, 2017)} \\
\end{tabular}
\caption{{Data sources for technology adoption data}}
\label{table:data_sources_for_technology_adoption_data}
\end{table}

\section{Building a technology classification model from Technology Life
Cycle
features}

{\label{702306}}

\subsection{Technology adoption data collection and
aggregation}

{\label{985492}}

TBD\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Fig11/Fig11}
\caption{{This is a caption
{\label{757897}}%
}}
\end{center}
\end{figure}

\subsection{Patent indicator
definitions}

{\label{971055}}

The work of Gao identifies a range of studies that have been conducted
previously based on the principle of using either a single or multiple
bibliometric indicators as a means of investigating technological
development and performance~\hyperref[csl:36]{(Gao et al., 2013)}. Their review of these
methods concluded that multiple patent indicators are required to avoid
generating potentially unreliable results if just using a single
indicator extracted from patent data. As such, the nearest neighbour
classification process developed in Gao's study proposes the use of
thirteen separate patent indicators. This current study has accordingly
reproduced these metrics were possible, resulting in a total of ten
patent indicators (i.e. producing time series for each technology with
ten dimensions), as three of the previous list of indicators were
specific to the Derwent Innovation Index ~(missing citation)~which was
not used in this study due to the limited ability to bulk export the
necessary results from this database. As such,
Table~{\ref{table:bibliometric_indicators}} below
summarises the bibliometric indicators extracted for each technology
within this analysis:\selectlanguage{english}
\begin{table*}
%\resizebox{\textwidth}{!}{%
\begin{tabular}{p{2cm}|p{3cm}|p{11cm}}
    {Indicator No.} & {Name} & {Description} \\ \midrule
    1 & Application & Number of patents in Questel-Orbit by application year \\
    \hline
    2 & Priority & Number of patents in Questel-Orbit by priority year \\
    \hline
    3 & Corporate & Number of corporates in Questel-Orbit by priority year \\
    \hline
    4 & Non-corporate & Number of non-corporates in Questel-Orbit by priority year \\
    \hline
    5 & Inventor & Number of groups of inventors in Questel-Orbit by priority year \\
    \hline
    6 & Literature citation & Number of backward citations to literature in Questel-Orbit by priority year \\
    \hline
    7 & Patent citation & Number of backward citations to patents in Questel-Orbit by priority year \\
    \hline
    8 & IPC & Number of IPCs (4-digit) in Questel-Orbit by priority year \\
    \hline
    9 & IPC top 5 & Number of patents of top 5 IPCs in Questel-Orbit by priority year \\
    \hline
    10 & IPC top 10 & Number of patents of top 10 IPCs in Questel-Orbit by priority year \\
\end{tabular}%}
\caption{{Bibliometric indicators used in this study (based on the work of Gao [Gao 2013])}}
% \caption{Bibliometric indicators used in this study (based on the work of Gao \hyperref[csl:36]{(Gao et al., 2013)}}
\label{table:bibliometric_indicators}
\end{table*}

With the main exception of the use of the Questel-Orbit FamPat database
instead of the Derwent Innovation Index, the indicator definitions and
assumptions used in this study are otherwise consistent with those
outlined in sections 2.1.1 to 2.1.5
of~\href{https://www.authorea.com/users/161287/articles/182044-identifying-the-mode-and-impact-of-disruptive-innovations-journal-paper\#Gao_2013}{(Gao
2013)}. The only other notable difference to be recorded is that the
Questel-Orbit patent records are not automatically given a designation
as being a corporate, non-corporate, on individual patent assignee. As
such, the counts of corporate and non-corporate indicators (which would
otherwise be based on this assignee designation) are determined instead
based on the `Family Normalized Assignee Name' field available in the
patent records, as records with entries in this field correspond to
corporate designations.

\subsection{Search strategy and terms for identifying relevant patent
profiles}

{\label{335937}}

Previous bibliometric studies have explored the many different ways in
which patent records can be correctly identified for a given field or
topic~\hyperref[csl:56]{(Verbeek, Debackere, Luwel, \& Zimmermann, 2002}; \hyperref[csl:57]{Schmoch, 1997}; \hyperref[csl:58]{Albino, Ardito, Dangelico, \& Petruzzelli, 2014}; \hyperref[csl:59]{Rizzi, van Eck, \& Frey, 2014}; \hyperref[csl:60]{Mao, Liu, Du, Zuo, \& Wang, 2015}; \hyperref[csl:61]{Dong, Xu, Luo, Cai, \& Gao, 2012}; \hyperref[csl:62]{WIPO, 2009}; \hyperref[csl:63]{Helm, Tannock, \& Iliev, 2014)}. Whilst filtering of search results based on
technology classification categories is generally preferred where
possible to ensure a more rigorous search strategy~\hyperref[csl:58]{(Albino, Ardito, Dangelico, \& Petruzzelli, 2014)},
it is also advisable to keep the steps that supplement or remove patents
from searches queries to a minimum to maintain data consistency and
repeatability~\hyperref[csl:63]{(Helm, Tannock, \& Iliev, 2014)}. As such, the search queries used in
this analysis are based primarily on filtering by International Patent
Classification (IPC) or Cooperative Patent Classification (CPC) labels.
Where possible the IPC categories applied have been reused from previous
studies in order to replicate existing search queries so as to extract
comparative datasets, or have been based on expert defined groupings
such as the European Patent Office's Y02 classification which
specifically relates to climate change mitigation technologies.
Otherwise keyword search terms and IPC labels are combined that focus on
the appearance of closely adjoining instances of the search terms (or of
their common synonyms) to be matched. The use of IPC technology category
filters in this manner ensures that a higher level of relevance and
repeatability is achieved. Based on these preprocessing steps, the final
search queries used for the technologies to be considered are presented
in Table~{\ref{table:search_terms}}.\selectlanguage{english}
\begin{table*}
%\resizebox{\textwidth}{!}{%
\begin{tabular}{p{4cm}|p{6cm}|p{7cm}|p{4cm}}
    {Case study} & {Orbit patent search keywords} & {IPC or CPC categories} & {No. of patent families} \\ \midrule
    Compact Fluorescent Lamp & (compact+ or CFL+ or (energ+ s (sav+ or low+))) AND fluores+ & CPC: Y02B-020/16+ OR Y02B-020/18+ OR Y02B-020/19+ & 1,169 (21/07/2017) \\
    \hline
    Electric vehicles & -- & CPC: Y02T-010/62+ OR Y02T-010/64+ OR Y02T-010/70+ OR Y02T-010/72+ OR Y02T-090/1+ & 100,870 (24/07/2017) \\
    \hline
    Fiber optics (data transfer) & ((fiber+ or fibre+) 3d optic+) & IPC: G02B OR H04B OR C03B OR C03C OR D01C OR D04H OR D06L OR G02F OR G06E OR G06K OR G11B OR G11C OR H02G OR H03K OR H04J OR H04N OR G01P & 176,299 (20/07/2017) \\
    \hline
    Geothermal electricity & -- & CPC: Y02E-010/1+ & 5,272 (24/07/2017) \\
    \hline
    Halogen lights & -- & CPC: Y02B-020/12+ & 645 (24/07/2017) \\
    \hline
    Hydro electricity & -- & CPC: Y02E-010/2+ & 46,125 (24/07/2017) \\
    \hline
    Impact/Dot-matrix printers & ((impact+ or (dot+ or matri+) or (daisy 1w wheel+)) 3d print+) & IPC: G03G OR B41J OR G06F OR G06K OR H04N OR G06T OR G02B OR H04L OR G01R OR G03C OR B41M OR G03B OR B65H & 24,993 (24/07/2017) \\
    \hline
    Incandescent lights & Incandescen+ or filament+ & IPC: F21H OR F21L OR F21S OR F21V OR F21W OR F21Y & 17,597 (03/08/2017) \\
    \hline
    Ink jet printer & (ink+ 3d jet+ 3d print+) & IPC: B41J-002/01 OR G03G OR B41J OR G06F OR G06K OR H04N OR G06T OR G02B OR H04L OR G01R OR G03C OR B41M OR G03B OR B65H & 46,135 (24/07/2017) \\
    \hline
    Internet & (internet+ 3d protocol+ 3d suite+) OR (computer+ 1w network+) & IPC: G06F OR H04L OR G06N OR H04K OR G09F & 42,861 (24/07/2017) \\
    \hline
    Landline telephones & (((land\_line+ or main\_line+ or home or fixed\_line+ or wire\_line+) 3d (+phone)) OR (speaking telegraph+) OR (telephon+)) NOT (mobil+ or (cell+ 3d (+phon+ or communi+)) or smart\_phon+ or port+) & IPC: H04B OR H01Q OR H01P OR H04J OR G01R OR H04Q OR H01H OR H04M OR H04R OR G10L & 139,895 (03/08/2017) \\
    \hline
    Laser printer & (laser+ 3d print+) & IPC: G03G OR B41J OR G06F OR G06K OR H04N OR G06T OR G02B OR H04L OR G01R OR G03C OR B41M OR G03B OR B65H & 17,827 (24/07/2017) \\
    \hline
    LED lights & -- & CPC: Y02B-020/3+ & 8,596 (24/07/2017) \\
    \hline
    Linear Fluorescent Tube lights & ((fluores+ 3d (lamp+ or light+ or tube+))) NOT (compact or (energ+ 3d sav+)) & IPC: F21K OR F21L OR F21S OR F21V OR F21W OR F21Y & 25,126 (24/07/2017) \\
    \hline
    Nuclear energy & -- & CPC: Y02E-030+ & 60,017 (24/07/2017) \\
    \hline
    Solar PV & -- & CPC: Y02E-010/5+ OR Y02E-010/6+ & 112,068 (24/07/2017) \\
    \hline
    Solar thermal electricity & -- & CPC: Y02E-010/4+ OR Y02E-010/6+ & 91,553 (24/07/2017) \\
    \hline
    TFT-LCD & ((((thin film+) 1w transistor+) or TFT+) AND (((liquid crystal+) 1w display+) or LCD)) or TFT\_LCD & IPC: G02F-001/13 & 5,181 (24/07/2017) \\
    \hline
    Thermal printers & (thermal+ 2d print+) & IPC: G03G OR B41J OR G06F OR G06K OR H04N OR G06T OR G02B OR H04L OR G01R OR G03C OR B41M OR G03B OR B65H & 23,388 (24/07/2017) \\
    \hline
    Tide-wave-ocean electricity & -- & CPC: Y02E-010/28+ OR Y02E-010/3+ & 19,224 (24/07/2017) \\
    \hline
    Turbojet & ((Gas w turbin+) or (jet+ w engine+) or turbo\_fan+ or turbo\_prop+ or turbo\_jet+ or turbo\_shaft+ or prop\_fan+ or ((open w rotor+) 3d (engine+ or technolog+ or counter\_rotat+))) & IPC: B60K OR B60L OR B60P OR B60V OR B61B OR B61C OR B62D OR B63B OR B63H OR B64C OR B64D OR B64F OR B64G OR F01D OR F02B OR F02C OR F02K & 71,024 (24/07/2017) \\
    \hline
    Wind electricity & -- & CPC: Y02E-010/7+ & 67,035 (24/07/2017) \\
    \hline
    Wireless data transfer & (Wireless 3d data 3d trans+) & IPC: H03K OR H04H OR H04W OR G06K OR G06T & 17,188 (24/07/2017) \\
\end{tabular}%}
\caption{{Patent data search terms}}
\label{table:search_terms}
\end{table*}

\subsection{Patent indicator data extraction
process}

{\label{467781}}

Using the technology classification categories, and where applicable,
the keywords specified in
Table~{\ref{table:search_terms}} the results of these
search queries were exported in batches of up to 10,000 records at a
time in a tabulated HTML format. Exported records were based on only the
representative family member for a given FamPat grouping in order to
avoid duplication of records across multiple jurisdictions.
Additionally, each exported record included the key patent information
along with full details of both cited patent and non-patent literature
references made in the current record. As some searches could generate
very large numbers of records (i.e. hundreds of thousands), the use of
batch processing enabled large quantities of records to be handled in
manageable formats, but required that the batches were subsequently
imported into a tool capable of processing the volumes of data
considered. For this purpose, MATLAB was used, and a script
(\textbf{\emph{provided in Appendix XX}}) was developed to convert each
HTML batch file into a corresponding .MAT file (based on a pre-existing
conversion script), ready for data cleaning processes.

\subsection{Patent indicator data cleaning
process}

{\label{918099}}

Whilst the consistency of the Questel-Orbit patent data is of a high
standard, several steps are still required to be able to extract patent
indicator metrics from this data (\textbf{\emph{these steps are outlined
in detail in Appendix XX}}). To begin with these include removing any
non-breaking spaces values in tabulated cells (which would interfere
with later counts of citations), removing any leading and trailing white
spaces from column headings, and translating column headings into a
format recognisable by MATLAB (i.e. replacing any spaces, slashes,
brackets, full stops or hyphens with underscores). Individual column
headings are then used to define variables in a MATLAB table, whilst a
generic `counter' variable is appended to the end of the table to enable
the subsequent construction of pivot tables. Having transformed the data
into a recognised table format, the script (\textbf{\emph{provided in
Appendix XX}}) then identifies entries that already have valid date
entries provided against them. Unique corporation, non-corporation, and
inventor IDs are then appended to the tabulated records based on
determining the similarity that exists between entries observed in these
three fields. Next, the script cycles through each record individually
and extracts the application and priority dates, where present, for each
patent family (this involves scanning through all priority dates~in the
case where multiple priority dates are listed against a single record
and identifying the earliest date). At this point, the script also
counts the number of references and the number of patents cited against
each individual record, whilst also mapping all references to any
included IPC categories to the correct IPC count tally (based
on~\hyperref[csl:64]{(Organization, 2015)}). This enables the number of distinct IPC
subclasses recorded for a given patent family to be counted, and is used
later when recombined with the corresponding tallies from every other
patent family record to rank the top 5 and top 10 most heavily
associated IPC subclasses with a developing technology for each year.
For those records where valid dates were not located in the previous
steps (typically in less than 5\% of records), the script then checks to
see if any other date types are present against each record from the
`Basic Year', `Application Year', or `Priority Year' fields. `Priority
Year' should always be the earliest of these dates, as this represents
the original conception of the idea, rather than the date at which the
application was filed with the relevant patent office. Equally, all
dates are checked to ensure that none are earlier than 1790 (when the
earliest known US patent was recorded, representing the world's earliest
patent registration system), as any dates recorded before this year are
very likely to be errors. Once any missing dates have been imputed where
possible, the script then determines the time period bounded by the set
of records in the current batch, and updates the global time frame for
the current technology as required. The bibliometric indicator counts
specified in
Table~{\ref{table:bibliometric_indicators}}~can then
be compiled for each year considered in the current batch of records,
with the current batch being marked as completed before repeating the
steps above for the next batch of records. In this way a collection of
summary indicator count tables are built representing each batch of
records. These tables are then combined into one overall summary table
for the technology being considered, taking care to expand each batch of
results for years with `zero' records as required so that the same set
of years is present when adding corresponding table rows together. To
verify that the MATLAB data extraction and cleaning processes were
functioning as planned, the output counts of the MATLAB scripts were
compared for several sample batches~to an equivalent process implemented
using Excel pivot tables. This comparison showed that in some instances
where formatting issues were present the MATLAB scripts were more
successful than Excel in filtering out blank values, but that in both
cases the overall count values generated corresponded closely to those
expected.

\subsection{Technology Life Cycle stage matching
process}

{\label{242296}}

With bibliometric profiles extracted for each of the technologies
considered in this study, the first stage of analysis consists of
identifying the transition points between different stages of the
Technology Life Cycle in order to establish time series segments for use
in subsequent comparative analysis. For the technologies considered in
this study, evidence was identified from literature to suggest when
these transitions had occurred, such as in the innovation timeline
assessments prepared for a range of technologies by Hanna (see
Fig.~{\ref{854263}}~to
Fig.~{\ref{339421}}):\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Phases-of-the-innovation-timeline-(source---UKERC)1/Phases-of-the-innovation-timeline-(source---UKERC)1}
\caption{{Phases of the innovation timeline~\protect\hyperref[csl:65]{(Hanna, Gross, Speirs, Heptonstall, \& Gambhir, 2015)}
{\label{854263}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Historical-timeline-and-duration-of-innovation-for-all-technologies-reviewed-(source---UKERC)1/Historical-timeline-and-duration-of-innovation-for-all-technologies-reviewed-(source---UKERC)1}
\caption{{Historical timeline and duration of innovation for technologies reviewed
by UKERC~\protect\hyperref[csl:65]{(Hanna, Gross, Speirs, Heptonstall, \& Gambhir, 2015)}
{\label{176117}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Duration-of-development-and-commercialisation-for-all-technologies-reviewed-(source---UKERC)1/Duration-of-development-and-commercialisation-for-all-technologies-reviewed-(source---UKERC)1}
\caption{{Duration of development and commercialisation for technologies reviewed
by UKERC~\protect\hyperref[csl:65]{(Hanna, Gross, Speirs, Heptonstall, \& Gambhir, 2015)}
{\label{339421}}%
}}
\end{center}
\end{figure}

Full details of the transition points used in this study are provided in
Table~{\ref{table:TLC_transition_points}}.\selectlanguage{english}
\begin{table*}
%\resizebox{\textwidth}{!}{%
% \begin{tabular*}{\textwidth}{p{4.5cm}|p{2cm}|p{2cm}|p{2cm}|p{8cm}}
\begin{tabular}{p{4.5cm}|p{2.5cm}|p{2.5cm}|p{2.5cm}|p{8cm}}
    {Case study} & {Last year of Emergence stage} & {Last year of Growth stage} & {Last year of Maturity stage} & {Technology Life Cycle transition point sources} \\ \midrule
    Compact Fluorescent Lamps & 1979 & 2011 & -- & \hyperref[csl:66]{(Hanna, Gross, Speirs, Heptonstall, \& Gambhir, 2015}; (missing citation) \\
    \hline
    Electric vehicles & 1997 & 2005 & -- & (missing citation); (missing citation) \\
    \hline
    Fiber optics (data transfer) & 1970 & 1990 & -- & (missing citation); (missing citation) \\
    \hline
    Geothermal electricity & 1958 & -- & -- & (missing citation) \\
    \hline
    Halogen lights & 1959 & -- & -- & (missing citation); (missing citation); (missing citation) \\
    \hline
    Hydro electricity & 1956 & 1975 & -- & (missing citation) \\
    \hline
    Impact/Dot-matrix printers & 1970 & 1984 & 1991 & (missing citation); (missing citation); (missing citation); (missing citation); (missing citation) \\
    \hline
    Incandescent lights & 1882 & 1916 & 2008 & \hyperref[csl:11]{(Chang \& Baek, 2010}; (missing citation); (missing citation) \\
    \hline
    Ink jet printer & 1988 & 1996 & 2003 & (missing citation) \\
    \hline
    Internet & 1982 & 2000 & -- & (missing citation); (missing citation); (missing citation) \\
    \hline
    Landline telephones & 1878 & 1945 & 2009 & (missing citation); (missing citation) \\
    \hline
    Laser printer & 1979 & 1993 & -- & (missing citation); (missing citation) \\
    \hline
    LED lights & 2001 & -- & -- & \hyperref[csl:66]{(Hanna, Gross, Speirs, Heptonstall, \& Gambhir, 2015)} \\
    \hline
    Linear Fluorescent Tube lights & 1937 & 1990 & 2012 & (missing citation); (missing citation); (missing citation) \\
    \hline
    Nuclear electricity & 1963 & 1981 & -- & \hyperref[csl:66]{(Hanna, Gross, Speirs, Heptonstall, \& Gambhir, 2015)} \\
    \hline
    Solar PV & 1990 & -- & -- & \hyperref[csl:66]{(Hanna, Gross, Speirs, Heptonstall, \& Gambhir, 2015)} \\
    \hline
    Solar thermal electricity & 1968 & -- & -- & (missing citation); (missing citation) \\
    \hline
    TFT-LCD & 1990 & 2007 & -- & \hyperref[csl:36]{(Gao et al., 2013)} \\
    \hline
    Thermal printers & 1972 & 1985 & 2002 & (missing citation); (missing citation); (missing citation); (missing citation); (missing citation) \\
    \hline
    Tide-wave-ocean electricity & 1966 & -- & -- & (missing citation); (missing citation) \\
    \hline
    Turbojet & 1939 & 1958 & -- & (missing citation) \\
    \hline
    Wind electricity & 1982 & -- & -- & \hyperref[csl:66]{(Hanna, Gross, Speirs, Heptonstall, \& Gambhir, 2015)} \\
    \hline
    Wireless data transfer & 1982 & 2002 & -- & \hyperref[csl:66]{(Hanna, Gross, Speirs, Heptonstall, \& Gambhir, 2015)} \\
\end{tabular}%}
\caption{{Technology Life Cycle transition points based on literature evidence}}
\label{table:TLC_transition_points}
\end{table*}

Of the 23 technologies listed in
Table~{\ref{table:TLC_transition_points}}, 20 were
found to have patent data available from during the emergence stage
(i.e. excluding incandescent lights, landline telephones, and wireless
data transfer). As such only those technologies with patent data
available during the emergence stage are considered in the analysis that
follows.

For subsequent expansion of this analysis to additional technologies
where evidence is not immediately apparent for the definition of these
segments, a nearest neighbour pattern matching process was also
developed as discussed in section~{\ref{931418}} based
on the work of Gao~\hyperref[csl:36]{(Gao et al., 2013)}. An overview of this process is
shown in Fig.~{\ref{258111}}:\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Technology-Life-Cycle-stage-matching-process/Technology-Life-Cycle-stage-matching-process}
\caption{{Overview of Technology Life Cycle stage matching process based on the
work of Gao~\protect\hyperref[csl:36]{(Gao et al., 2013)}
{\label{258111}}%
}}
\end{center}
\end{figure}

The basic principle behind this process relates to the use of training
technologies to classify the transitions between Technology Life Cycle
stages for each test technology considered. Based on the evidence of
Hanna and Gao~\hyperref[csl:66]{(Hanna, Gross, Speirs, Heptonstall, \& Gambhir, 2015}; \hyperref[csl:36]{Gao et al., 2013)}, a total of~\textbf{\emph{XX}} training
technology profiles were available for this classification exercise, as
opposed to the two (Thin Film Transistor Liquid Crystal Displays and
Cathode Ray Tubes) used in the original work by Gao. Although different
patent databases have been used in this study, the basic patterns
observed for the two training technologies employed in Gao's work are
mostly captured in the records extracted using the Questel-Orbit tool.
This can be seen in Fig.~{\ref{391527}} where the
extracted trends are compared against the original study:\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Technology-Life-Cycle-stage-matching-process---comparison-of-extracted-training-datasets/Technology-Life-Cycle-stage-matching-process---comparison-of-extracted-training-datasets}
\caption{{Comparison of extracted TFT-LCD and CRT training datasets based on the
work of Gao \protect\hyperref[csl:36]{(Gao et al., 2013)}
{\label{391527}}%
}}
\end{center}
\end{figure}

However it's worth noting that Fig.~{\ref{391527}}
shows some notable differences as well between the technology records
extracted from the Derwent Innovation Index and the Questel-Orbit
results. There are several reasons for this. Most significantly, these
two databases will not consist of exactly the same records, or the same
volume of records. This was apparent based on the record counts provided
in~\hyperref[csl:36]{(Gao et al., 2013)} which resulted in several thousand less records
for the exact same search query structure and filtration steps.
Secondly, several years have passed since the original study, and record
counts for later years will have been amended in the passing time as
more records have been accounted for. Furthermore, there may also be
discrepancies between how the two databases account for patent families
within their internal methodologies, and the exact functioning of the
search algorithms used to identify records. As such, whilst many of the
peaks in the Questel-Orbit data correspond to equivalent peaks in the
Derwent Innovation Index data, not all peaks align perfectly. There is
also an observable difference in the trend extracted for the number of
cited patents per year, which seems considerably higher based on the
Questel-Orbit data. To verify this, a separate examination of patent
citation counts using Excel pivot tables for several sample batches of
patent records was conducted, which found that the expected number of
citations extracted using the MATLAB script matched to a good level
those found when using the Excel-based procedure. Consequently, the
discrepancy here between Questel-Orbit results and Derwent Innovation
Index records could be as a result of recording differences when
addressing citations in these two databases. However, it is not felt
that this discrepancy~will significantly impact results as the analysis
that follows is based on amplitude normalised trends rather than
absolute values, and as such there will be notably less variation in the
actual values used (see Fig. {\ref{456301}}~and Fig.
{\ref{879179}} for an illustration of this).

With these discrepancies noted, the training technologies correspond to
the timescales shown for Cathode Ray Tubes (CRT), nuclear power, solar
photovoltaics, wind electricity, mobile phones, thin film transistor
liquid crystal displays (TFT-LCD), and Compact Fluorescent Light bulbs
(CFLs) provided in Fig.~{\ref{176117}}. Having
specified the training technology profiles to use, the MATLAB script
(\textbf{\emph{provided in Appendix XX}}) then smooths both training and
test time series based on a three-year moving average, and normalises
the amplitude as per the original study, as shown in Fig.
{\ref{456301}}~and Fig.
{\ref{879179}}.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Development-trends-of-patent-indicators-for-fiber-optics/Development-trends-of-patent-indicators-for-fiber-optics}
\caption{{Original bibliometric trends for fiber optics as extracted from
Questel-Orbit data
{\label{456301}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Development-trends-of-patent-indicators-for-fiber-optics-(TEST-TECH---smoothed-and-normalised)/Development-trends-of-patent-indicators-for-fiber-optics-(TEST-TECH---smoothed-and-normalised)}
\caption{{Smoothed and normalised bibliometric trends for fiber optics
{\label{879179}}%
}}
\end{center}
\end{figure}

Once the training and test datasets have been smoothed and normalised,
the MATLAB script cycles through each year of the test technology data
records being considered and calculates the Euclidean distance between
the current test data point and all possible training technology data
points. This process is illustrated by Gao as shown in Fig.
{\ref{405215}}:\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Illustration-of-distance-comparison-between-test-and-training-data-(source---Technology-life-cycle-analysis-method-based-on-patent-documents)/Illustration-of-distance-comparison-between-test-and-training-data-(source---Technology-life-cycle-analysis-method-based-on-patent-documents)}
\caption{{An example for computing the distance between test point and training
points~\protect\hyperref[csl:36]{(Gao et al., 2013)}
{\label{405215}}%
}}
\end{center}
\end{figure}

In this way the closest training point is identified from one of the
possible training technologies and years considered, with the test data
point adopting the Technology Life Cycle stage label of the matching
training point. The results of this nearest neighbour classification
system are demonstrated in Fig.~{\ref{513876}} to
Fig.~{\ref{428185}} below, where it can be seen that
the technology take-off and other Technology Life Cycle stage
transitions correspond well to the observed growth, plateau and decline
phases in the extracted technology profiles.

\graphicspath{{figures/}}

%\begin{figure}[ht!]\selectlanguage{english}
\begin{figure}[htbp!]
    \begin{center}
%
        \subfloat[Fiber optics development trends \label{fig:first}]{%
            \includegraphics[width=0.5\textwidth]{Development trends of patent indicators for fiber optics (TEST TECH - smoothed and normalised) 2/Development trends of patent indicators for fiber optics (TEST TECH - smoothed and normalised) 2.png}
        }%
        \subfloat[Turbojet development trends \label{fig:second}]{%
           \includegraphics[width=0.5\textwidth]{Development trends of patent indicators for turbojet (TEST TECH - smoothed and normalised)/Development trends of patent indicators for turbojet (TEST TECH - smoothed and normalised).png}
        }\\ %  ------- End of the first row ----------------------%
        \subfloat[Matched TLC stages for fiber optics \label{fig:third}]{%
            \includegraphics[width=0.5\textwidth]{Matched TLC stages for fiber optics (TEST TECH) - training dataset combination 2/Matched TLC stages for fiber optics (TEST TECH) - training dataset combination 2.png}
        }%
        \subfloat[Matched TLC stages for turbojet \label{fig:fourth}]{%
            \includegraphics[width=0.5\textwidth]{Matched TLC stages for turbojet (TEST TECH) - training dataset combination 2/Matched TLC stages for turbojet (TEST TECH) - training dataset combination 2.png}
        }%
%
    \end{center}
    \caption{{%
        Examples of matched Technology Life Cycle stages for fiber optics and turbojet development \label{fig:matched_TLC_stages}
    }}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Development-trends-of-patent-indicators-for-fiber-optics-(TEST-TECH---smoothed-and-normalised)-2/Development-trends-of-patent-indicators-for-fiber-optics-(TEST-TECH---smoothed-and-normalised)-2}
\caption{{Fiber optics development trends
{\label{513876}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Matched-TLC-stages-for-fiber-optics-(TEST-TECH)---training-dataset-combination-2/Matched-TLC-stages-for-fiber-optics-(TEST-TECH)---training-dataset-combination-2}
\caption{{Matched TLC stages for fiber optics
{\label{744175}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Development-trends-of-patent-indicators-for-turbojet-(TEST-TECH---smoothed-and-normalised)/Development-trends-of-patent-indicators-for-turbojet-(TEST-TECH---smoothed-and-normalised)}
\caption{{Turbojet development trends
{\label{448924}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Matched-TLC-stages-for-turbojet-(TEST-TECH)---training-dataset-combination-2/Matched-TLC-stages-for-turbojet-(TEST-TECH)---training-dataset-combination-2}
\caption{{Matched TLC stages for turbojet
{\label{428185}}%
}}
\end{center}
\end{figure}

\subsection{Identification of significant patent indicator
groups}

{\label{249496}}

Having defined the time periods corresponding to each Technology Life
Cycle stage for the technologies considered, it is now possible to
segment the bibliometric time series into comparable phases of
development. Significant predictors of substitution modes in each
Technology Life Cycle stage are then identified using the procedure
outlined in Fig. {\ref{978289}}:\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Identification-of-significant-patent-indicator-groups-process/Identification-of-significant-patent-indicator-groups-process}
\caption{{Overview of the process used to identify and rank significant patent
indicator groups
{\label{978289}}%
}}
\end{center}
\end{figure}

As discussed in sections~{\ref{218955}}
and~{\ref{814495}}~an unsupervised learning approach
has been employed here based on applying Dynamic Time Warping and the
`PAM' variant of K-Medoids clustering on the relative distance measures
calculated between time series. This is again implemented as a MATLAB
script based on the DTW and K-Medoid functions made available by
MathsWorks \hyperref[csl:32]{(MathWorks, 2016}; \hyperref[csl:33]{\textit{{Dynamic Time Warping Clustering}}, 2015)}, \textbf{\emph{which is provided in
Appendix XX}}. The first step of this process involves generating a list
of all the unique subsets that can be created from the ten patent
indicator metrics considered in this study. Consequently, this produces
1,023 (i.e.~\(2^{10}-1\)) possible combinations of the ten patent
indicators to be tested, as illustrated by
Fig.~{\ref{488951}}:\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.98\columnwidth]{figures/Build-list-of-all-possible-patent-indicator-groupings2/Build-list-of-all-possible-patent-indicator-groupings2}
\caption{{Generating list of all possible patent indicator groupings from time
series dimensions considered
{\label{488951}}%
}}
\end{center}
\end{figure}

Next, the raw patent data time series are transformed by using an
inverse hyperbolic sine function and normalised to convert the data into
a suitable format for long-term comparisons (see discussion in
section~{\ref{697753}}). Once in this format, the data
points are filtered based on the current Technology Life Cycle stage
being considered, as illustrated by
Fig.~{\ref{418410}}, ensuring comparable curve features
are considered:\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Transform-the-data-into-suitable-format-for-long-term-comparisons/Transform-the-data-into-suitable-format-for-long-term-comparisons}
\caption{{Transforming extracted patent data time series into a suitable format
for long-term comparisons
{\label{418410}}%
}}
\end{center}
\end{figure}

After the datasets have been transformed and filtered based on the
current Technology Life Cycle stage, Dynamic Time Warping is then used
to calculate the Euclidean distance between each pair of technology time
series when compared using the time series dimensions specified by each
patent indicator grouping in turn. This process is depicted visually in
Fig.~{\ref{497938}}, illustrating the successive layers
of filtering that are applied for each technology pairing and each
patent indicator grouping considered. The output from this process is an
\emph{i} x \emph{j} x 1023 distance matrix, where \emph{i} and
\emph{j}~specify the current technology pairing being considered, and
the value quoted is the measured distance between multi-dimensional time
series based on the current patent indicator subset being used. In
parallel to this the corresponding warping paths required to measure the
distance between the~\emph{N}-dimensional curves in each condition are
stored in two separate matrices for later use.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.98\columnwidth]{figures/Calculate-distance-between-each-pair-of-technology-time-series-for-each-indicator-grouping/Calculate-distance-between-each-pair-of-technology-time-series-for-each-indicator-grouping}
\caption{{Calculating the~distance between each pair of technology time series for
each indicator grouping
{\label{497938}}%
}}
\end{center}
\end{figure}

Using this distance matrix it is now possible to apply K-Medoids
clustering to determine the technology groupings predicted when each
specific patent indicator subset is used. By comparing the predicted
technology groupings to those expected from the earlier literature
classifications (see section~{\ref{771448}}), a
confusion matrix is created for each patent indicator subset that shows
the alignment between predicted and target groupings as shown in
Fig.~{\ref{450923}}. Fisher's exact test is then
applied to each confusion matrix to calculate the probability of
obtaining the observed clusters. In doing so, significant patent
indicator subsets are identified based on those that have less than a
5\% chance of natural occurrence.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.98\columnwidth]{figures/Identifying-patent-indicator-groups-of-interest/Identifying-patent-indicator-groups-of-interest}
\caption{{Identifying patent indicator groups of interest
{\label{450923}}%
}}
\end{center}
\end{figure}

\subsection{Ranking of grouped patent indicator
dimensions}

{\label{307966}}

As discussed in
sections~{\ref{697753}},~{\ref{387734}},
and~{\ref{814495}}~leave-p-out cross-validation
techniques provide a means to rank those bibliometric indicator subsets
that have been identified as producing a significant match to the
expected technology groupings. The first stage of this process consists
of generating lists of all possible training technology combinations and
corresponding test technology combinations based on leaving
one\textbf{\emph{}} technology out at a time. The procedure then
progresses in a similar format to the initial calculation of distances
between each pair of technology time series as shown in
Fig.~{\ref{497938}}, except that this time distance
measures are only calculated between pairs of training technologies, and
that this process is repeated for every possible combination of training
technologies that are available. As such, the output from this process
is now an~\emph{i} x~\emph{j} x 1023 x~\emph{n} distance matrix,
where~\emph{i} and~\emph{j}~now specify the current~\textbf{training}
technology pairing being considered, and \emph{n} represents the number
of training combinations that can be used. This is illustrated in
Fig.~{\ref{256478}}~and Fig.
{\ref{361537}}:\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.98\columnwidth]{figures/Build-lists-of-possible-training-technology-subsets-and-corresponding-test-technology-subsets/Build-lists-of-possible-training-technology-subsets-and-corresponding-test-technology-subsets}
\caption{{Building lists of possible training technology subsets and corresponding
test technology subsets
{\label{256478}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.98\columnwidth]{figures/Calculate-distance-between-each-pair-of-training-technologies-for-each-indicator-grouping/Calculate-distance-between-each-pair-of-training-technologies-for-each-indicator-grouping}
\caption{{Calculating the distance between each pair of training technologies for
each indicator grouping
{\label{361537}}%
}}
\end{center}
\end{figure}

K-Medoids clustering is once again applied to the resulting training
technology distance matrices, from which two medoid technologies are
identified for each patent indicator subset, in each training condition.
At this point the test technologies can now be evaluated individually
against the two medoid curves identified in each training condition, in
order to determine the closest medoid to the current test technology.
This provides a classification for the test technologies based on each
training condition and each patent indicator subset. From this the
number of test technologies misclassified based on the current training
condition can be determined. This in turn is then used to calculate the
average number of test technologies misclassified for each patent
indicator grouping across all of the training conditions considered.
Finally, the results are sorted in terms of the minimum average number
of misclassifications in order to rank the robustness of each patent
indicator grouping. This procedure is illustrated in Fig.
{\ref{428246}}:\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.98\columnwidth]{figures/Ranking-of-grouped-patent-indicator-dimensions/Ranking-of-grouped-patent-indicator-dimensions}
\caption{{Ranking of grouped patent indicator dimensions
{\label{428246}}%
}}
\end{center}
\end{figure}

\subsection{Functional model building
process}

{\label{311620}}

The ranking of different bibliometric indicator subsets provides a means
to identify the time series dimensions that, when combined, are most
likely to provide robust out-of-sample predictions of the observed
technological modes of substitution. As a result, a technology
classification model is now developed using functional data analysis
(see sections~{\ref{875755}}
and~{\ref{260337}}) that is based on indicators 4 and 6
(i.e. the number of non-corporates and the number of cited references by
priority year). Besides being present in all of the highest scoring sets
of top ranked predictors, these particular dimensions can potentially be
associated with the rate of development in technology and science
respectively. This is in the sense that cited references shows a clear
link to scientific production directly influencing technological
development efforts, whilst the number of non-corporates by priority
year (which counts the number of universities, academies, non-profit
labs and technology research centres) is associated with the amount of
lab work required to commercialise a technology. Considering the measure
of non-corporates by priority year specifically, a large volume of lab
work could indicate a lack of technological maturity, or the presence of
considerable complexity in the technology being developed. By contrast,
those technologies with reduced non-corporates by priority year activity
may represent simpler technologies that mature more rapidly or
intuitively. Non-corporates by priority year could therefore equate to a
measure of technological complexity, or effort required to mature.

However, it's also worth noting that there are other indicator subset
couples/triples that perform~nearly as well. It is possible that these
other high-performing subsets may be in some way related to the chosen
indicators (i.e. perfect orthogonality can not necessarily be assumed
between these metrics), and so at this point the choice has been taken
to use the indicators specified as these have been seen to be the most
statistically robust, whilst also being in good agreement with previous
literature conclusions.

Following on from the initial introduction to functional data analysis
provided in section~{\ref{875755}}, and more detailed
methods presented in~\hyperref[csl:21]{(Ramsay, Hooker, \& Graves, 2009)}, the method outlined in
Fig.~{\ref{529107}} has been implemented in MATLAB for
building a functional linear regression model for the purposes of
technology classification (\textbf{\emph{the MATLAB script is available
in Appendix XX for further details}}).\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Functional-model-building-process/Functional-model-building-process}
\caption{{Functional model building process based on methods outlined
in~\protect\hyperref[csl:21]{(Ramsay, Hooker, \& Graves, 2009)}
{\label{529107}}%
}}
\end{center}
\end{figure}

Taking the chosen time series dimensions as a starting point, a
functional data object must first be created for each of the patent
indicators (or model components) included in the chosen subset. However,
as the Technology Life Cycle stages being considered will have a
different number of observations for each case study technology, it is
first necessary to resample the segmented time series based on a common
number of resampling points. This ensures that even if one Technology
Life Cycle stage spans 20 years in one time series, and spans 50 years
in another, both time series will have 50 observations, which enables
the two curves to be aligned relative to each other for the current
Technology Life Cycle stage. Next a B-spline basis system is created for
each model component based on the common number of resampling points
defined, and at the same time for the beta coefficients
(\(\beta_i\)) to be estimated by the functional linear regression
analysis (see Eq.~{\ref{eq:basis_function_1}} and
Eq.~{\ref{eq:sedov}} in
section~{\ref{875755}}, as well as sections 3.4.1,
3.4.2, 9.4.1 and 9.4.2 of
\href{https://www.authorea.com/users/161287/articles/182044-identifying-the-mode-and-impact-of-disruptive-innovations-journal-paper\#Ramsay_2009}{(Ramsay
2009)}), as illustrated in Fig.~{\ref{416597}}:\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.98\columnwidth]{figures/Building-functional-models-of-selected-patent-indicator-groupings/Building-functional-models-of-selected-patent-indicator-groupings}
\caption{{Building functional models of selected patent indicator groupings
{\label{416597}}%
}}
\end{center}
\end{figure}

\subsubsection{Identification of smoothing parameter values for
functional data
objects}

{\label{219671}}

Before functional data objects can be generated from the B-spline basis
systems the degree of curve smoothing to be applied has to be
determined. Following the process outlined in~\hyperref[csl:21]{(Ramsay, Hooker, \& Graves, 2009)} a
`functional parameter object' that allows smoothness to be imposed on
estimated functional parameters is now created (see section 5.2.4 of
\hyperref[csl:21]{(Ramsay, Hooker, \& Graves, 2009)}). A functional data object is then created for the
current model component using the new functional parameter object, along
with an initial value of the smoothing parameter (\(\lambda\)).
The degrees of freedom and generalised cross-validation criterion
coefficient (see section 5.3 of~\hyperref[csl:21]{(Ramsay, Hooker, \& Graves, 2009)}) can then be
calculated for the current functional data object. By repeating this
process for a range of~\(\lambda\) values and plotting the
results (see Fig.~{\ref{781086}}~to
Fig.~{\ref{224241}}) a suitable smoothing parameter can
be identified that will be used in the final functional data object for
each model component. An example of a smoothed functional data object
generated for the number of corporations associated with different
technologies in a given priority year is illustrated in Fig.
{\ref{405071}} and Fig.
{\ref{135605}}.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Degrees-of-freedom-for-functional-parameter-object-smoothing-parameters-to-fit-cited-patents-by-priority-year---emergence/Degrees-of-freedom-for-functional-parameter-object-smoothing-parameters-to-fit-non-corporates-by-priority-year---emergence}
\caption{{Degrees of freedom for functional parameter object smoothing parameters
to fit non-corporates by priority year - emergence
{\label{781086}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Degrees-of-freedom-for-functional-parameter-object-smoothing-parameters-to-fit-cited-references-by-priority-year---emergence/Degrees-of-freedom-for-functional-parameter-object-smoothing-parameters-to-fit-cited-references-by-priority-year---emergence}
\caption{{Degrees of freedom for functional parameter object smoothing parameters
to fit cited references by priority year - emergence
{\label{631566}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Generalised-cross-validation-scores-for-cited-patents-by-priority-year-functional-parameter-object-smoothing-parameter---emergence/Generalised-cross-validation-scores-for-non-corporates-by-priority-year-functional-parameter-object-smoothing-parameter---emergence}
\caption{{Generalised cross-validation scores for non-corporates by priority year
functional parameter object smoothing parameter - emergence
{\label{681098}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Generalised-cross-validation-scores-for-cited-references-by-priority-year-functional-parameter-object-smoothing-parameter---emergence/Generalised-cross-validation-scores-for-cited-references-by-priority-year-functional-parameter-object-smoothing-parameter---emergence}
\caption{{Generalised cross-validation scores for cited references by priority
year functional parameter object smoothing parameter - emergence
{\label{224241}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Technology-profiles-for-corporates-by-priority-year-(normalised-time)/Technology-profiles-for-corporates-by-priority-year-(normalised-time)}
\caption{{Technology profiles for corporates by priority year (normalised time)
{\label{405071}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Functional-Data-Object-for-all-technology-profiles-based-on-corporates-by-priority-year-(edited)/Functional-Data-Object-for-all-technology-profiles-based-on-corporates-by-priority-year-(edited)}
\caption{{Functional Data Object for all technology profiles based on corporates
by priority year
{\label{135605}}%
}}
\end{center}
\end{figure}

\subsubsection{Assessing the fit of generated functional data
objects}

{\label{841850}}

Having created a functional data object representation of each model
component from the selected bibliometric subset, the MATLAB script then
assesses the fit of each functional data object to the trend data. This
is accomplished by calculating the residuals, variance, and standard
deviations between the real and modelled values across the different
technology curves included, but also across the time span of the
Technology Life Cycle stage considered (see section 5.5
of~\hyperref[csl:21]{(Ramsay, Hooker, \& Graves, 2009)}). The results of this assessment of fit are
presented in Fig.~{\ref{650648}} to
Fig.~{\ref{257292}} for the number of non-corporates
and the number of cited references by priority year:\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Standard-deviations-of-the-residuals-within-technologies-from-the-functional-data-object-for-corporates-by-priority-year-(edited)/Standard-deviations-of-the-residuals-within-technologies-from-the-functional-data-object-for-non-corporates-by-priority-year---emergence}
\caption{{Standard deviations of the residuals within technologies from the
functional data object for non-corporates by priority year
{\label{650648}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Standard-deviations-of-residuals-within-technologies-from-functional-data-object-for-top-5-IPC-subclass-patents-by-priority-year-(edited)/Standard-deviations-of-the-residuals-within-technologies-from-the-functional-data-object-for-cited-references-by-priority-year---emergence}
\caption{{Standard deviations of residuals within technologies from functional
data object for cited references by priority year
{\label{281118}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Standard-deviations-of-the-residuals-within-time-from-the-functional-data-object-for-corporates-by-priority-year-(edited)/Standard-deviations-of-the-residuals-within-time-from-the-functional-data-object-for-non-corporates-by-priority-year---emergence}
\caption{{Standard deviations of the residuals within time from the functional
data object for non-corporates by priority year
{\label{952058}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Standard-deviations-of-the-residuals-within-time-from-the-functional-data-object-for-top-5-IPC-subclass-patents-by-priority-year-(edited)/Standard-deviations-of-the-residuals-within-time-from-the-functional-data-object-for-cited-references-by-priority-year---emergence}
\caption{{Standard deviations of the residuals within time from the functional
data object for cited references by priority year
{\label{257292}}%
}}
\end{center}
\end{figure}

\subsubsection{Functional descriptive statistics for generated
functional data
objects}

{\label{152701}}

A related sanity check for the functional data objects generated for
each model component (before they are used in the functional linear
regression analysis) is the plotting of functional descriptive
statistics (see section 6.1.1 of~\hyperref[csl:21]{(Ramsay, Hooker, \& Graves, 2009)}). The functional
mean and standard deviation of the functional data objects for the
number of non-corporates and the number of cited references by priority
year are shown in Fig.~{\ref{413726}} and
Fig.~{\ref{437199}} respectively, and show that for
both model components variability increases as time progresses (as would
be expected with most forecasts). In addition the mean functional data
object values show that there is a notable early surge in non-corporates
by priority year during the emergence phase before a technology achieves
mainstream adoption. This corresponds well to the hype cycle associated
with new technologies during early development when significant levels
of R\&D are first launched in a race to achieve commercialisation, which
can often prove~premature or short-lived. By contrast, the mean cited
references by priority year measure shows that a steadily accelerating
growth is observed during the emergence phase, without significant
undulation, potentially implying that scientific development efforts are
less phased by disturbances as they begin to accumulate.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Mean-functional-data-object-values-for-chosen-patent-indicator-subset-(edited)/Mean-functional-data-object-values-for-chosen-patent-indicator-subset---emergence}
\caption{{Mean functional data object values for chosen patent indicator subset
{\label{413726}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Standard-deviation-of-functional-data-objects-created-for-chosen-patent-indicator-subset-(edited)/Standard-deviation-of-functional-data-objects-created-for-chosen-patent-indicator-subset---emergence}
\caption{{Standard deviation of functional data objects created for chosen patent
indicator subset
{\label{437199}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Estimated-bivariate-variance-covariance-surface-and-contours-for-corporates-by-priority-year-(unaligned)/Estimated-bivariate-variance-covariance-surface-and-contours-for-non-corporates-by-priority-year---emergence}
\caption{{Estimated bivariate variance-covariance surface and contours for
non-corporates by priority year
{\label{283475}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Estimated-bivariate-cross-covariance-surface-and-contours-for-patent-indicators-3-and-9-(unaligned)/Estimated-bivariate-cross-covariance-surface-and-contours-for-patent-indicators-4-and-6---emergence}
\caption{{Estimated bivariate cross-covariance surface and contours for patent
indicators 4 and 6
{\label{900952}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Estimated-bivariate-cross-correlation-surface-and-contours-for-patent-indicators-3-and-9-(unaligned)/Estimated-bivariate-cross-correlation-surface-and-contours-for-patent-indicators-4-and-6---emergence}
\caption{{Estimated bivariate cross-correlation surface and contours for patent
indicators 4 and 6
{\label{954614}}%
}}
\end{center}
\end{figure}

\textbf{\emph{Interpretation of functional descriptive statistics
{[}TBD{]}}}

\subsubsection{Canonical correlation
analysis}

{\label{938623}}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/The-first-pair-of-canonical-weight-functions-for-patent-indicators-3-and-9-(unaligned)/The-first-pair-of-canonical-weight-functions-for-patent-indicators-4-and-6---emergence}
\caption{{The first pair of canonical weight functions for patent indicators 4 and
6
{\label{977022}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/The-scores-for-the-first-pair-of-canonical-variables-plotted-against-each-other-for-patent-indicators-3-and-9-(unaligned)/The-scores-for-the-first-pair-of-canonical-variables-plotted-against-each-other-for-patent-indicators-4-and-6---emergence}
\caption{{The scores for the first pair of canonical variables plotted against
each other for patent indicators 4 and 6
{\label{589224}}%
}}
\end{center}
\end{figure}

\subsubsection{Identification of smoothing parameter values for
regression
coefficients}

{\label{945131}}

With the functional data objects for each model component now ready, a
cell array containing each model component along with a constant
predictor term is generated for use in the functional liner regression.
Before the final regression analysis can be run, a smoothing parameter
for the regression coefficient beta basis system has to be selected.
This is achieved by calculating leave-one-out cross-validation scores
(i.e. error sum of squares values) for functional responses using a
range of different smoothing parameter values, as per section 9.4.3 and
10.6.2 of~\hyperref[csl:21]{(Ramsay, Hooker, \& Graves, 2009)}. The results of this cross-validation
exercise are shown in Fig. {\ref{650456}}~and Fig.
{\ref{342847}}~for the number of non-corporates~by
priority year beta basis system smoothing parameter during the emergence
phase of the Technology Life Cycle:\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Cross-validation-scores-for-the-cited-patents-by-priority-year-beta-basis-system-smoothing-parameter---emergence/Cross-validation-scores-for-the-non-corporates-by-priority-year-beta-basis-system-smoothing-parameter---emergence}
\caption{{Cross-validation scores for the non corporates by priority year beta
basis system smoothing parameter - emergence
{\label{650456}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Cross-validation-scores-for-the-cited-patents-by-priority-year-beta-basis-system-smoothing-parameter---emergence-(refined)/Cross-validation-scores-for-the-non-corporates-by-priority-year-beta-basis-system-smoothing-parameter---emergence-(refined)}
\caption{{Cross-validation scores for the non corporates by priority year beta
basis system smoothing parameter - emergence (refined)
{\label{342847}}%
}}
\end{center}
\end{figure}

The functional parameter object used in the beta basis system is then
redefined based on the refined smoothing parameter identified in order
to ensure that the functional linear regression analysis converges on a
model that has the best chances of performing well out-of-sample.

\subsubsection{Functional linear regression
analysis}

{\label{511602}}

The functional linear regression analysis can now be run with the
identified smoothing parameters and scalar response variables to
identify the~\(\beta_i\) coefficients and the corresponding
variance used to define the 95\% confidence bounds (see sections 9.4.3
and 9.4.4 of~\hyperref[csl:21]{(Ramsay, Hooker, \& Graves, 2009)} respectively).
Fig.~{\ref{820059}} to
Fig.~{\ref{942889}} show the
resulting~\(\beta_i\) coefficients and confidence bounds for the
number of non-corporates and the~number of cited references by priority
year, when considering the emergence phase of development and using a
high-dimensional regression fit (i.e. when the beta basis system for
each regression coefficient is made of a large number of B-splines).
This regression fit successfully identifies the correct mode of
substitution from patent data available in the emergence stage for 19 of
the 20 technologies considered.

From the confidence bounds on these plots it can be seen that for both
the number of non-corporates and the number of cited references by
priority year the variance is highest at the start of the emergence
phase: this is often when the least amount of data is available for
comparing each technology, so this is not entirely surprising as this
represents the point of greatest uncertainty. However,
Fig.~{\ref{822351}}~and
Fig.~{\ref{942889}} also illustrate how the influence
these two patent dimensions have on the predicted mode of substitution
varies with time during the emergence phase. More specifically,
deviations away from zero in these coefficient functions equate to an
increased positive or negative weighting for the associated patent
indicator count at that moment in time, within the determination of the
predicted mode of substitution. As such it can be seen that any patent
indicator counts at~\emph{t = 0}~for the number of non-corporates by
priority year (assuming these are present) will have a more significant
influence on the final mode of substitution predicted. Equally, these
particular regression results would suggest that the impact of
non-corporates activity next peaks around 40\% of the way through the
emergence phase (potentially corresponding to the hype effect suggested
previously), and again at the end of the emergence phase. For the number
of cited references by priority year, this regression model suggests
that the times of greatest impact on the mode of substitution are at the
very beginning and at the very end of the emergence stage
respectively.~Whilst these coefficient plots gives some indication of
the relative weighting applied to patent indicator counts as time
progresses, the cumulative nature of the inner products used in
Eq.~{\ref{eq:sedov}} makes it difficult to visually
infer from these plots alone which mode the technology under evaluation
is currently converging towards.~For this it is also necessary to
include the corresponding patent indicator count values that these
coefficient terms are multiplied by for the specific technology being
assessed.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Estimated-regression-coefficient-for-the-constant-functional-basis-system---emergence/Estimated-regression-coefficient-for-the-constant-functional-basis-system---emergence}
\caption{{Estimated regression coefficient for the constant functional basis
system - emergence
{\label{820059}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Estimated-regression-coefficient-for-predicting-technology-cluster-from-cited-patents-by-priority-year---emergence/Estimated-regression-coefficient-for-predicting-technology-cluster-from-non-corporates-by-priority-year---emergence}
\caption{{Estimated regression coefficient for predicting technology cluster from
non-corporates by priority year - emergence
{\label{822351}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Estimated-regression-coefficient-for-predicting-technology-cluster-from-cited-references-by-priority-year---emergence/Estimated-regression-coefficient-for-predicting-technology-cluster-from-cited-references-by-priority-year---emergence}
\caption{{Estimated regression coefficient for predicting technology cluster from
cited references by priority year - emergence
{\label{942889}}%
}}
\end{center}
\end{figure}

Whilst the regression coefficient plots help to provide a possible
interpretation of the relationship between the different model
components and the predicted technology substitution classifications
(\textbf{\emph{discussed in section XX}}), it is also necessary to check
the `goodness-of-fit' measures associated with these results. As such,
R-Squared, adjusted R-Squared, and F-ratio statistics are calculated
(see section 9.4.1 and 9.4.2 of~\hyperref[csl:21]{(Ramsay, Hooker, \& Graves, 2009)}) to assess~the~
overall fit of the high-dimensional functional linear regression model,
as summarised in
Table~{\ref{table:results_high_dimensional_model}}:\selectlanguage{english}
\begin{table*}
\centering
\begin{tabular}{p{1.5cm}p{1.5cm}p{1.5cm}p{1.7cm}p{1.7cm}p{1.5cm}}
    {Correct mode type} & {R-squared} & {Adjusted R-squared} & {Degrees of freedom 1} & {Degrees of freedom 2} & {F-ratio} \\ \midrule
    19/20 & 0.7954 & 0.7713 & 7.7837 & 11.2163 & 5.6024 \\
\end{tabular}
\caption{{Results of high dimensional model fit}}
\label{table:results_high_dimensional_model}
\end{table*}

The R-squared and adjusted R-squared values shown in
Table~{\ref{table:results_high_dimensional_model}}
would suggest~that a reasonable fit has been achieved with this
model,~with a good level of accuracy, whilst the F-ratio of 5.60 with
degrees of freedom 7.78 and 11.22 respectively implies that the
relationship established has a p-value somewhere between 0.0041 and
0.0060. As such this result appears to be~significant at the 1\% level.

\subsubsection{Benchmarking functional regression
model}

{\label{197409}}

However, to ensure that this is the most appropriate fit to the data
presented, the high-dimensional model initially developed was
subsequently benchmarked against a low-dimensional model (i.e. when the
beta basis system for each regression coefficient is made of a small
number of B-splines), as well as a constant and a monomial based model.
The corresponding~\(\beta_i\) coefficients from the benchmarking
analysis for the low-dimensional model are presented in Fig.
{\ref{934895}} to Fig. {\ref{232642}},
whilst the `goodness-of-fit' measures for all the alternative functional
linear regression models are compiled in Table
{\ref{table:results_benchmarking}}:\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Low-dimensional-estimate-of-the-regression-coefficient-for-the-constant-functional-basis-system---emergence/Low-dimensional-estimate-of-the-regression-coefficient-for-the-constant-functional-basis-system---emergence}
\caption{{Low-dimensional estimate of the regression coefficient for the constant
functional basis system - emergence
{\label{934895}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Low-dimensional-estimate-of-the-regression-coefficient-for-predicting-group-from-cited-patents-by-priority-year---emergence/Low-dimensional-estimate-of-the-regression-coefficient-for-predicting-group-from-non-corporates-by-priority-year---emergence}
\caption{{Low-dimensional estimate of the regression coefficient for predicting
group from non-corporates by priority year - emergence
{\label{220308}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Low-dimensional-estimate-of-the-regression-coefficient-for-predicting-group-from-cited-references-by-priority-year---emergence/Low-dimensional-estimate-of-the-regression-coefficient-for-predicting-group-from-cited-references-by-priority-year---emergence}
\caption{{Low-dimensional estimate of the regression coefficient for predicting
group from cited references by priority year - emergence
{\label{232642}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{table*}
%\resizebox{\textwidth}{!}{%
\begin{tabular}{p{2.5cm}p{1.5cm}p{1.5cm}p{1.5cm}p{1.7cm}p{1.7cm}p{1.5cm}p{1.5cm}}
    {Model basis} & {Correct mode type} & {R-squared} & {Adjusted R-squared} & {Degrees of freedom 1} & {Degrees of freedom 2} & {F-ratio} & {p-value} \\ \midrule
    Low dimension & 19/20 & 0.8514 & 0.8340 & 10 & 9 & 5.1584 & 0.0107 \\
    Constant & 18/20 & 0.6200 & 0.5753 & 2 & 17 & 13.8684 & 0.0003 \\
    Monomial & 19/20 & 0.8139 & 0.7920 & 8 & 11 & 6.0139 & 0.0040 \\
\end{tabular}%}
\caption{{Benchmarking results}}
\label{table:results_benchmarking}
\end{table*}

Whilst the R-squared and adjusted R-squared~measures observed in
Table~{\ref{table:results_benchmarking}}~would suggest
that the low-dimensional model provides a better fit, the associated
F-ratio score and corresponding p-value suggests a lower significance
than those values observed for the high-dimensional model. Conversely,
the constant basis model does not appear to provide as good a fit to the
expected scalar responses from the R-squared and adjusted R-squared
values, but this is not surprising considering the more limited nature
of models built on constant terms. Finally, the monomial basis system
performs fractionally better on both the R-squared and adjusted
R-squared measures whilst also achieving a comparable level of
significance to the high-dimensional~model. Consequently, from this
benchmarking analysis it would appear that the high-dimensional and
monomial basis system models are the most suitable candidates, but it is
possible that the overall performance of the high-dimensional model
could be further improved by sensitivity studies into the optimum number
of B-splines to use in the regression fit.

\subsubsection{Permutation testing of functional regression
models}

{\label{805104}}

To further validate the statistical significance of the four models
considered here permutation testing is applied to count the proportion
of generated F values that are larger than the F-statistic for each
model (see section 9.5 of~\hyperref[csl:21]{(Ramsay, Hooker, \& Graves, 2009)}). This involves repeatedly
shuffling the expected mode classification labels versus the technology
profiles being read (maintaining their original order) to see if it is
still possible to fit the regression model to these reordered responses.
In so doing, this test also creates a null distribution versus
the~\emph{q}\textsuperscript{th} quantile and observed F-statistic
generated from the models themselves. The results of this analysis are
shown in Fig.~{\ref{586585}} to
Fig.~{\ref{647363}}:\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Permutation-F-Test-and-null-distribution-for-the-high-dimensional-functional-regression-model---emergence/Permutation-F-Test-and-null-distribution-for-the-high-dimensional-functional-regression-model---emergence}
\caption{{Permutation F-Test and null distribution for the high-dimensional
functional regression model - emergence
{\label{586585}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Permutation-F-Test-and-null-distribution-for-the-low-dimensional-functional-regression-model---emergence/Permutation-F-Test-and-null-distribution-for-the-low-dimensional-functional-regression-model---emergence}
\caption{{Permutation F-Test and null distribution for the low-dimensional
functional regression model - emergence
{\label{925388}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Permutation-F-Test-and-null-distribution-for-the-constant-basis-system-functional-regression-model---emergence/Permutation-F-Test-and-null-distribution-for-the-constant-basis-system-functional-regression-model---emergence}
\caption{{Permutation F-Test and null distribution for the constant basis system
functional regression model - emergence
{\label{810269}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Permutation-F-Test-and-null-distribution-for-the-monomial-basis-system-functional-regression-model---emergence/Permutation-F-Test-and-null-distribution-for-the-monomial-basis-system-functional-regression-model---emergence}
\caption{{Permutation F-Test and null distribution for the monomial basis system
functional regression model - emergence
{\label{647363}}%
}}
\end{center}
\end{figure}

For statistical significance it is necessary that the observed test
statistic is found in the tail of the distribution generated. As such,
in this stage of the analysis the high and low-dimensional models
perform best as the observed F-statistics are furthest along each
distribution's right tail in relative terms in comparison to the other
distributions generated for the constant and monomial based models.
These distributions also suggest that a similar level of statistical
significance is observed between the high and low-dimensional models,
although as this permutation testing was only based on 1,000
permutations, the distributions could still evolve further with a
greater number of permutations. However, the constant basis system model
is more clearly seen here not to perform as well, with the observed
F-statistic closest to the main body of the distribution. This, in
combination with the other `goodness-of-fit' measures, would therefore
suggest that the high-dimensional functional linear regression model
provides the best basis for a technology substitution classification
model from those tested in this analysis.

\subsubsection{Functional principal components
analysis}

{\label{157221}}

Run functional Principal Components Analysis to build alternative
classification model for each TLC stage

Set~ smoothing parameter value to apply to principal components analysis
functional~ parameter objects and the number of principal components to
retain

Create~ functional parameter object for each model component using
selected fPCA~ smoothing parameter value and run fPCA

Compile~ matrix of harmonic functions for the retained components of
each patent~ indicator data object

Compile~ matrix of principal components analysis scores for the retained
components of~ each patent indicator data object\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/PCA-function-1---cited-references-by-priority-year---emergence/PCA-function-1---cited-references-by-priority-year---emergence}
\caption{{\textbf{\emph{PCA function 1 - cited references by priority year -
emergence}}
{\label{730634}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/PCA-function-2---cited-references-by-priority-year---emergence/PCA-function-2---cited-references-by-priority-year---emergence}
\caption{{\emph{\textbf{PCA function 2 - cited references by priority year -
emergence}}
{\label{732583}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/PCA-function-3---cited-references-by-priority-year---emergence/PCA-function-3---cited-references-by-priority-year---emergence}
\caption{{\emph{\textbf{PCA function 3 - cited references by priority year -
emergence}}
{\label{269208}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/PCA-function-4---cited-references-by-priority-year---emergence/PCA-function-4---cited-references-by-priority-year---emergence}
\caption{{\emph{\textbf{PCA function 4 - cited references by priority year -
emergence}}
{\label{601958}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/PCA-function-1---cited-patents-by-priority-year---emergence/PCA-function-1---cited-patents-by-priority-year---emergence}
\caption{{\emph{\textbf{PCA function 1 - cited patents by priority year -
emergence}}
{\label{854253}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/PCA-function-2---cited-patents-by-priority-year---emergence/PCA-function-2---cited-patents-by-priority-year---emergence}
\caption{{\emph{\textbf{PCA function 2 - cited patents by priority year -
emergence}}
{\label{753074}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/PCA-function-3---cited-patents-by-priority-year---emergence/PCA-function-3---cited-patents-by-priority-year---emergence}
\caption{{\emph{\textbf{PCA function 3 - cited patents by priority year -
emergence}}
{\label{455443}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/PCA-function-4---cited-patents-by-priority-year---emergence/PCA-function-4---cited-patents-by-priority-year---emergence}
\caption{{\emph{\textbf{PCA function 4 - cited patents by priority year -
emergence}}
{\label{955682}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Harmonics-for-cited-references-by-priority-year---emergence/Harmonics-for-non-corporates-by-priority-year---emergence}
\caption{{Harmonics for non-corporates by priority year - emergence
{\label{123713}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Harmonics-for-cited-patents-by-priority-year---emergence/Harmonics-for-cited-references-by-priority-year---emergence}
\caption{{Harmonics for cited references by priority year - emergence
{\label{425083}}%
}}
\end{center}
\end{figure}

Fit~ linear regression model based on the principal components scores
for each~ patent indicator data object

Extract~ functional principal component analysis coefficients for
building linear~ regression model and determine the variance of
coefficients

Combine~ the fPCA~ coefficients with the corresponding fPCA~ harmonic
function for each patent indicator data object

Combine~ the fPCA~ coefficient variances with the square of the
corresponding fPCA~ harmonic function for~each~ patent indicator data
object\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Estimated-fPCA-regression-coefficient-for-predicting-technology-cluster-from-cited-references-by-priority-year---emergence/Estimated-fPCA-regression-coefficient-for-predicting-technology-cluster-from-non-corporates-by-priority-year---emergence}
\caption{{Estimated fPCA regression coefficient for predicting technology cluster
from non-corporates by priority year - emergence
{\label{377481}}%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Estimated-fPCA-regression-coefficient-for-predicting-technology-cluster-from-cited-patents-by-priority-year---emergence/Estimated-fPCA-regression-coefficient-for-predicting-technology-cluster-from-cited-references-by-priority-year---emergence}
\caption{{Estimated fPCA regression coefficient for predicting technology cluster
from cited references by priority year - emergence
{\label{408846}}%
}}
\end{center}
\end{figure}

\section{Conclusions from statistical ranking and functional data
analysis}

{\label{617444}}

\begin{itemize}
\tightlist
\item
  Preliminary adoption data appears~ to show a distinction between those
  technologies arising as a result of~ technological failure, and those
  arising based on a presumptive technological~ leap (to be confirmed)
\item
  Comparison of functional linear~ regression vs. functional principal
  components analysis: conclusions?
\end{itemize}

Expanding on previous historical accounts of technological substitutions
this study has examined the premise that two principal modes are often
observed when considering transitions between successive commercially
prevalent technologies: reactive and presumptive technological
substitutions. These two modes are believed to correspond to
significantly different technology adoption characteristics (not
discussed in this paper), with scientific foresight believed to play a
crucial role in the identification of presumptive innovations, and
performance stagnation leading to reactive transitions. In both cases,
technological anomalies are believed to arise, either as a result of
scientific or technological crisis, that subsequently trigger the
eventual shift to the next technological paradigm. As such, this paper
has considered 23 example technologies where literature evidence of
performance development trends has been found in order to test the
ability to correctly identify associated adoption modes using
bibliometric, pattern recognition, and statistical analysis techniques.
The results obtained from this analysis suggest that statistical
analysis of patent indicator time series,~segmented based on identified
Technology Life Cycle features, provides a possible means for
classification of technological substitutions. Specifically, for the
datasets considered~measures of the number of cited references and the
involvement of non-corporate entities by year during the emergence phase
were found to provide a good indication of the expected mode of
substitution when used as a basis for functional linear regression
(correctly classifying 19 out of 20 technologies included in this
stage), and performed consistently well in statistical ranking of
predictive capability. These selected patent data dimensions can be
associated with perceptions of scientific and technological production
respectively, consistent with the basic prerequisites listed in
section~{\ref{771448}} for a classification scheme that
can identify presumptive technological substitutions. Whilst these two
patent dimensions occur in all of the most robust predictor subsets
(i.e. in terms of out-of-sample reliability) when basing analysis on the
emergence stage, this does not prove that these are the only indicators
capable of predicting modes of technological substitution. As discussed
in section~{\ref{311620}}, the possibility of
orthogonality has not been ruled out with regards to the other patent
indicators shown in
Table~{\ref{table:bibliometric_indicators}}. However,
these two dimensions are in good agreement with the technological
anomaly arguments put forward by Constant in
sections~{\ref{978136}}
and~{\ref{771448}}, and so were felt to be reasonable
for forming the basis of the technology classification model that has
been developed using functional linear regression. In particular, a
regression fit made up of beta coefficient functions with many B-spline
elements was found to provide a viable means of correctly matching the
mode of substitution to the technology profile being evaluated when
considering multiple `goodness of fit' measures. Permutation testing of
the derived technology classification model further suggests that the
regression fit is sensitive to the ordering of the expected mode labels
relative to the technology time series being considered, so this
relationship would appear to be based on the specifics of the individual
technology curves considered, and does not appear to be occurring by
chance. This implies that it may be possible to predict modes of
substitution from limited bibliometric data during the earliest stages
of technology development, providing some evaluation of the progress
through the early stages of Technology Life Cycle is made (this can be
obtained using a nearest neighbour matching process, not discussed in
this paper). Equally this shows that the functional data approach
employed corroborates well the earlier statistical rankings produced
using Dynamic Time Warping, K-Medoids clustering, and leave-one-out
cross-validation of the selected patent indicators, suggesting that
these two methods are compatible for this type of analysis.

It is also important to remember the potential limitations of this study
that would need to be addressed for further confidence in the
methodology used. Firstly, only a relatively small number of
technologies have been evaluated in this study due to the time-consuming
process required for data extraction, preparation, and identification of
supporting evidence from literature for the assignment of expected
classification labels. Consequently, whilst precautions have been taken
to minimise the risk of model over-fitting, the cross-validation
procedures employed would benefit from further verification with a more
diverse spread of technologies to ensure that out-of-sample errors are
accurately captured here. Regression models based on small sample sizes
can be very fickle to the datasets they are calibrated to, so it cannot
be ruled out that the results presented here are a better fit to the
industries included in this analysis, rather than a model that can be
necessarily generalised to all technologies. However, perhaps the most
important note of caution regarding this work relates to the
quantitative approaches used here. Whilst statistical approaches are
well-suited to detecting underlying correlations in historical and
experimental datasets, this on it's own does not provide a detailed
understanding of the causation behind associated events, particularly in
this case when considering the breadth of reasons for technological
stagnations, `failures', or presumptive leaps to occur. Equally,
statistical methods are not generally well suited to predicting
disruptive events and complex interactions, with other simulation
techniques such as System Dynamics and Agent Based Modelling performing
better in these areas. Accordingly, to identify causation effects and
test the sensitivity of technological substitution patterns to
variability arising from real-world socio-technical behaviours not
captured in simple bibliometric indicators (such as the influence of
competition, organisational, and economic effects), the fitted
regression model presented here also needs to be evaluated in a causal
environment. Similarly, in order to demonstrate practical applicability
the mode of substitutions considered here need to be related to observed
adoption characteristics (not discussed in this paper). Consequently, a
System Dynamics model built on the regression functions identified in
this study is proposed (although not discussed here) in order to
calibrate these extracted technology profiles and mode predictions to
empirical adoption data. This aims to more thoroughly explore the causal
mechanisms relating early indicators of technological substitution to
the eventual adoption patterns observed and provide a means of applying
greater reasoning to the relationships identified here.

\section{Acknowledgements}

{\label{687807}}

TBD

\selectlanguage{english}
\FloatBarrier
\section*{References}\sloppy
\phantomsection
\label{csl:22} Retrieved from \url{https://reference.wolfram.com/applications/eda/SmoothingDataFillingMissingDataAndNonparametricFitting.html}

\phantomsection
\label{csl:23} (2013). Retrieved from \url{https://www.researchgate.net/post/When_and_why_do_we_need_data_normalization}

\phantomsection
\label{csl:24} (2015). Retrieved from \url{https://stats.stackexchange.com/questions/144013/smoothing-when-to-use-it-and-when-not-to}

\phantomsection
\label{csl:28} (2014). Retrieved from \url{https://www.researchgate.net/post/Log_transformation_of_values_that_include_0_zero_for_statistical_analyses2}

\phantomsection
\label{csl:33} (2015). Retrieved from \url{https://stats.stackexchange.com/questions/131281/dynamic-time-warping-clustering}

\phantomsection
\label{csl:34} (2012). Retrieved from \url{https://stats.stackexchange.com/questions/26048/when-where-to-use-functional-data-analysis}

\phantomsection
\label{csl:46} (2016). {IEA}.

\phantomsection
\label{csl:48} (2016). In \textit{Communication Networks Economy} (pp. 253–253). John Wiley {\&} Sons Inc.

\phantomsection
\label{csl:52} (2016). The World Bank.

\phantomsection
\label{csl:53} (2000). The World Bank.

\phantomsection
\label{csl:47}4E, I. E. A. (2014). \textit{{IEA 4E Mapping and Benchmarking}}. Retrieved from \url{http://mappingandbenchmarking.iea-4e.org/}

\phantomsection
\label{csl:58}Albino, V., Ardito, L., Dangelico, R. M., \& Petruzzelli, A. M. (2014). {Understanding the development trends of low-carbon energy technologies: A patent analysis}. \textit{Applied Energy}, \textit{135}, 836–854.

\phantomsection
\label{csl:39}Analytics, C. (2017). \textit{{Web of Science - Get the Facts!}}. Retrieved from \url{http://clarivate.com/scientific-and-academic-research/research-discovery/web-of-science/#regional}

\phantomsection
\label{csl:43}Association, T. G. A. M. (2016). \textit{{Statistical Databook and Industry Outlook}}. Retrieved from \url{https://gama.aero/facts-and-statistics/statistical-databook-and-industry-outlook/}

\phantomsection
\label{csl:30}Bagnall, A., Lines, J., Bostrom, A., Large, J., \& Keogh, E. (2016). {The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances}. \textit{Data Mining and Knowledge Discovery}, \textit{31}, 606–660.

\phantomsection
\label{csl:54}Bank, W. (2017). \textit{{World Bank Open Data}}. Retrieved from \url{http://data.worldbank.org/}

\phantomsection
\label{csl:3}Bass, F. M. (2004). {Comments on {\textquotedblleft}A New Product Growth for Model Consumer Durables The Bass Model{\textquotedblright}}. \textit{Management Science}, \textit{50}, 1833–1840.

\phantomsection
\label{csl:0}Boyer, P. D. (1998). {Energy Life, and {ATP} (Nobel Lecture)}. \textit{Angewandte Chemie International Edition}, \textit{37}, 2296–2307.

\phantomsection
\label{csl:50}Candor, I. T. (2016). \textit{{Printer sales fell 14.5\%, peripherals by 12.1\% in 2015}}. Retrieved from \url{http://www.itcandor.com/printer-2015/}

\phantomsection
\label{csl:51}Candor, I. T. \textit{{Methodology}}. Retrieved from \url{http://www.itcandor.com/methodology/}

\phantomsection
\label{csl:11}Chang, Y. S., \& Baek, S. J. (2010). {Limit to improvement: Myth or reality?}. \textit{Technological Forecasting and Social Change}, \textit{77}, 712–729.

\phantomsection
\label{csl:13}Christensen, C. M., \& Rosenbloom, R. S. (1995). {Explaining the attacker{\textquotesingle}s advantage: Technological paradigms organizational dynamics, and the value network}. \textit{Research Policy}, \textit{24}, 233–257.

\phantomsection
\label{csl:45}Corporation, I. D. \textit{{IDC tracker \& Data Products}}. Retrieved from \url{http://www.idc.com/tracker/showtrackerhome.jsp}

\phantomsection
\label{csl:61}Dong, B., Xu, G., Luo, X., Cai, Y., \& Gao, W. (2012). {A bibliometric analysis of solar power research from 1991 to 2010}. \textit{Scientometrics}, \textit{93}, 1101–1117.

\phantomsection
\label{csl:42}Eurostat. (2017). \textit{{Eurostat database}}. Retrieved from \url{http://ec.europa.eu/eurostat/data/database}

\phantomsection
\label{csl:5}Foster, R. N. (1986). {Innovation: The Attacker's Advantage}. In \textit{McKinsey and Company, New York}.

\phantomsection
\label{csl:14}Foster, R. N. (1985). {Timing technological transitions}. \textit{Technology in Society}, \textit{7}, 127–141.

\phantomsection
\label{csl:36}Gao, L., Porter, A. L., Wang, J., Fang, S., Zhang, X., Ma, T., … Huang, L. (2013). {Technology life cycle analysis method based on patent documents}. \textit{Technological Forecasting and Social Change}, \textit{80}, 398–407.

\phantomsection
\label{csl:41}Global, F. (2017). \textit{{Ascend Fleets}}. Retrieved from \url{http://www.ascendworldwide.com/what-we-do/ascend-data/aircraft-airline-data/ascend-online-fleets.html}

\phantomsection
\label{csl:7}Gooday, G. (1998). {Re-writing the `book of blots': Critical reflections on histories of technological `failure'}. \textit{History and Technology}, \textit{14}, 265–291.

\phantomsection
\label{csl:65}Hanna, R., Gross, R., Speirs, J., Heptonstall, P., \& Gambhir, A. (2015). \textit{{Innovation Timelines from Invention to Maturity. A Rapid Review of the Evidence on the Time Taken for New Technologies to Reach Widespread Commercialisation}}. Draft Working Paper. UKERC Technology and Policy Assessment. UK Energy Research Centre, London, United Kingdom.

\phantomsection
\label{csl:66}Hanna, R., Gross, R., Speirs, J., Heptonstall, P., \& Gambhir, A. (2015). {Innovation timelines from invention to maturity}. \textit{UK Energy Research Centre}.

\phantomsection
\label{csl:63}Helm, S., Tannock, Q., \& Iliev, I. (2014). {Renewable energy technology: Evolution and policy implications—Evidence from patent literature}. \textit{Global Challenges Report. WIPO, Gen{\`e}Ve}.

\phantomsection
\label{csl:27}Hyndman, R. J. (2010). \textit{{Transforming data with zeros}}. Retrieved from \url{https://robjhyndman.com/hyndsight/transformations/}

\phantomsection
\label{csl:12}II, E. W. C. (1973). {A Model for Technological Change Applied to the Turbojet Revolution}. \textit{Technology and Culture}, \textit{14}, 553.

\phantomsection
\label{csl:37}Lambert, N. (2000). \textit{{Orbit and Questel-Orbit: Farewell and Hail}}. Retrieved from \url{http://www.infotoday.com/searcher/feb00/lambert.htm}

\phantomsection
\label{csl:19}Lin, J., Williamson, S., Borne, K., \& DeBarr, D. (2012). {Pattern recognition in time series}. \textit{Advances in Machine Learning and Data Mining for Astronomy}, \textit{1}, 617–645.

\phantomsection
\label{csl:18}Little, A. D. (1981). \textit{{The strategic management of technology}}. Arthur D. Little.

\phantomsection
\label{csl:20}Lucero, J. C., \& Koenig, L. L. (2000). {Time normalization of voice signals using functional data analysis}. \textit{The Journal of the Acoustical Society of America}, \textit{108}, 1408–1420.

\phantomsection
\label{csl:60}Mao, G., Liu, X., Du, H., Zuo, J., \& Wang, L. (2015). {Way forward for alternative energy research: A bibliometric analysis during 1994{\textendash}2013}. \textit{Renewable and Sustainable Energy Reviews}, \textit{48}, 276–286.

\phantomsection
\label{csl:15}Martin, B. R. (1996). {The use of multiple indicators in the assessment of basic research}. \textit{Scientometrics}, \textit{36}, 343–362.

\phantomsection
\label{csl:31}MathWorks. (2016). \textit{{Distance between signals using dynamic time warping}}. Retrieved from \url{https://uk.mathworks.com/help/signal/ref/dtw.html?s_tid=gn_loc_drop}

\phantomsection
\label{csl:32}MathWorks. (2016). \textit{{k-medoids clustering}}. Retrieved from \url{https://uk.mathworks.com/help/stats/kmedoids.html}

\phantomsection
\label{csl:38}Mingers, J., \& Leydesdorff, L. (2015). {A review of theory and practice in scientometrics}. \textit{European Journal of Operational Research}, \textit{246}, 1–19.

\phantomsection
\label{csl:16}Narin, F., \& Hamilton, K. S. (1996). {Bibliometric performance measures}. \textit{Scientometrics}, \textit{36}, 293–310.

\phantomsection
\label{csl:26}Nau, R. \textit{{The logarithm transformation}}. Retrieved from \url{https://people.duke.edu/~rnau/411log.htm}

\phantomsection
\label{csl:64}Organization, W. I. P. (2015). \textit{{IPC IT Support - master files and by-products}}. Retrieved from \url{http://www.wipo.int/classifications/ipc/en/ITsupport/Version20150101/}

\phantomsection
\label{csl:8}Pye, D. (1978). \textit{{Nature and aesthetics of design}}. 1978.

\phantomsection
\label{csl:25}Ramsay, J. (2013). \textit{{Dissecting the U.S. Nondurable Goods Index}}. Retrieved from \url{http://www.psych.mcgill.ca/misc/fda/ex-goods-a2.html}

\phantomsection
\label{csl:35}Ramsay, J. (2013). \textit{{Smoothing the Nondurable Goods Index}}. Retrieved from \url{http://www.psych.mcgill.ca/misc/fda/ex-goods-b1.html}

\phantomsection
\label{csl:21}Ramsay, J., Hooker, G., \& Graves, S. (2009). \textit{{Functional Data Analysis with R and {MATLAB}}}. Springer New York.

\phantomsection
\label{csl:59}Rizzi, F., van Eck, N. J., \& Frey, M. (2014). {The production of scientific knowledge on renewable energies: Worldwide trends dynamics and challenges and implications for management}. \textit{Renewable Energy}, \textit{62}, 657–671.

\phantomsection
\label{csl:9}Schilling, M. A., \& Esmundo, M. (2009). {Technology S-curves in renewable energy alternatives: Analysis and implications for industry and government}. \textit{Energy Policy}, \textit{37}, 1767–1781.

\phantomsection
\label{csl:57}Schmoch, U. (1997). {Indicators and the relations between science and technology}. \textit{Scientometrics}, \textit{38}, 103–116.

\phantomsection
\label{csl:40}Service, U. K. D. (2017). \textit{{UK Data Service Statistics}}. Retrieved from \url{http://stats.ukdataservice.ac.uk/}

\phantomsection
\label{csl:10}Sood, A., \& Tellis, G. J. (2005). {Technological Evolution and Radical Innovation}. \textit{Journal of Marketing}, \textit{69}, 152–168.

\phantomsection
\label{csl:29}Twomey, B. \textit{{Simple Vs. Exponential Moving Averages}}. Retrieved from \url{http://www.investopedia.com/articles/trading/10/simple-exponential-moving-averages-compare.asp}

\phantomsection
\label{csl:49}Union, T. I. T. \textit{{ICT STATISTICS Home Page}}. Retrieved from \url{http://www.itu.int/en/ITU-D/Statistics/Pages/default.aspx}

\phantomsection
\label{csl:6}Utterback, J. M. (1994). {Mastering the dynamics of innovation: how companies can seize opportunities in the face of technological change Harvard Business School Press}. \textit{Boston, MA}.

\phantomsection
\label{csl:17}Verbeek, A., Debackere, K., Luwel, M., \& Zimmermann, E. (2002). {Measuring progress and evolution in science and technology--I: The multiple uses of bibliometric indicators}. \textit{International Journal of Management Reviews}, \textit{4}, 179–211.

\phantomsection
\label{csl:56}Verbeek, A., Debackere, K., Luwel, M., \& Zimmermann, E. (2002). {Measuring progress and evolution in science and technology - I: The multiple uses of bibliometric indicators}. \textit{International Journal of Management Reviews}, \textit{4}, 179–211.

\phantomsection
\label{csl:62}WIPO. (2009). \textit{{Patent-based Technology Analysis Report - Alternative Energy Technology}} (W. I. P. Organization, Ed.).

\phantomsection
\label{csl:0}Zhou, J., Xue, Z., Du, Z., Melese, T., \& Boyer, P. D. (1988). {Relationship of tightly bound {ADP} and {ATP} to control and catalysis by chloroplast {ATP} synthase}. \textit{Biochemistry}, \textit{27}, 5129–5135.

\phantomsection
\label{csl:0}Zinszer, K., Morrison, K., Verma, A., \& Brownstein, J. S. (2017). {Spatial Determinants of Ebola Virus Disease Risk for the West African Epidemic.}. \textit{PLoS Curr}, \textit{9}.

\phantomsection
\label{csl:44}on Clean Transportation, T. I. C. (2016). \textit{{European Vehicle Market Statistics Pocketbook}}. Retrieved from \url{http://eupocketbook.org/}

\end{document}