ABM depends as much on the interactions between agents as on the behaviours that individual agents exhibit, leading to analogies with social and dynamic network analysis (topology) techniques [14]. Recent developments in Dynamic Network Analysis (DNA) enable the growth and reshaping of networks to be modelled on the basis of agent interaction processes [14]. However, the behavioural rules for this agent-based simulation have so far only been formulated as a System Dynamics model, as presented in Section VII, in order to first better understand all the influences acting upon agents contemplating a shift in paradigms.
In terms of the core methodology itself, the diverse range of applications for ABM means that, whilst there was initially considerable scepticism about the value of results published using these techniques, publications based on ABM are now considered methodologically sound. Nevertheless, verification of results remains a crucial test of ABM publications, and in many cases such verification has proved difficult to establish conclusively, especially when simulating hypothetical conditions or recreating conditions that may be subject to abductive fallacies [29]. In most current applications, the verification of ABM approaches rests on the technology underpinning the modelling (as this is where it can be most easily challenged) and on demonstrating that individual agent behaviours are plausible in one-to-one interactions. From a methodological and epistemological point of view, provided the technology is well built (i.e. well coded), ABM is increasingly well respected as a basis for scientific research results. However, a considerable amount of time still needs to be devoted to the documentation, programmatic testing, and evaluation of case studies and scenarios before formal results based on ABM can be published.
APPLICATION OF SYSTEM DYNAMICS AND CAUSAL
LOOP DIAGRAMS TO THE RESEARCH PROJECT
Causal Loop Diagrams (CLDs) and System Dynamics form a well-established method for exploring dynamic behaviours in evolving systems when it is not necessary to incorporate emergent influences, and provide a powerful tool for recognising structures (feedback loops and influences) in nonlinear systems from simple examination of the structure of the models. CLDs in particular provide a means of translating qualitative statements (as may be found in historical descriptions or policy statements) into conceptual models that can be related to mathematical expressions of quantitative attributes, and so sit on the border between softer interpretivist methodologies and harder functionalist approaches. Additionally, the visual construction of CLDs provides a much more intuitive way to generate mathematical descriptions of complex phenomena through the use of stocks, flows, and rates as partial derivatives in the system, without the need for detailed mathematical understanding (making it more accessible to project stakeholders of all disciplines). Consequently, the System Dynamics modelling process described by Sterman [30] has been applied here (as elaborated in Tables VII and VIII) to formulate an initial dynamic hypothesis relating the transmission of the presumption of the need for technological revolution across multi-level hierarchies to the influences identified for paradigm shifts based on technological failures (as visualised in Section VI). In this example (shown in Figure 14), CLDs and System Dynamics enable numerous soft stocks, such as the 'Agent's confidence in its existing paradigm', to be incorporated directly into the formulation of the dynamic hypothesis, assisting with determining the susceptibility of a given agent's presumptions to a range of connected influences not always associated with quantitative attributes. Soft stocks such as 'confidence' cannot easily be built into other modelling constructs without the assistance of CLDs and System Dynamics [31], which implies a possible need to use System Dynamics in conjunction with ABM in future modelling activities to ensure that any behavioural rules assigned to agents fully capture the behavioural dynamics shaping the causes of presumption. Additionally, the implicit link between System Dynamics and partial derivatives provides a means to dimensionally verify the units and measurements deployed in conceptual models, offering an additional check of the rationality of the proposed model and a capability to identify the dimensionality of new parameters not conventionally modelled. In practical terms, this approach could therefore act as a means of triangulation for any conceptual models generated by other methodologies, improving the robustness of the research analysis. As always, limitations apply to this methodology; in this case they principally relate to the modelling of emergent properties, along with more general restrictions to applications at a macroscopic level due to the deterministic nature of the underlying calculations (see [32]). Care also has to be taken to ensure that data is used to build System Dynamics models, as opposed to just the application of 'judgement', otherwise the rigour behind the method is lost and the conclusions generated are open to challenge [1]. In this regard the formulation of the CLD shown in Figure 14 was based on the historical models described in the works of Constant, Kuhn, and Hughes [9, 10, 12], along with key findings identified from the literature reviews conducted by the author.
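To make the stock-and-flow formulation concrete, the sketch below shows how a soft stock such as an agent's confidence in its existing paradigm might be integrated numerically. The stock name, influence parameters, and functional forms are illustrative assumptions for this sketch only, not the formulation used in Figure 14.

\begin{verbatim}
import numpy as np

# Minimal stock-and-flow sketch (Euler integration) for a single soft stock.
# All parameter names and values below are illustrative assumptions, not the
# calibrated model behind Figure 14.

dt = 0.25                      # time step (years)
t = np.arange(0, 20, dt)       # simulation horizon

confidence = np.empty_like(t)  # soft stock: agent's confidence in its paradigm
confidence[0] = 1.0            # dimensionless index, starts at full confidence

failure_rate = 0.08            # assumed rate of observed technological failures
recovery_rate = 0.03           # assumed rate at which confidence is rebuilt

for i in range(1, len(t)):
    # Flows: confidence is eroded by failures and restored towards its maximum.
    erosion = failure_rate * confidence[i - 1]
    recovery = recovery_rate * (1.0 - confidence[i - 1])
    # Stock accumulates the net flow (d(confidence)/dt = recovery - erosion).
    confidence[i] = confidence[i - 1] + dt * (recovery - erosion)

print(f"Confidence after {t[-1]:.1f} years: {confidence[-1]:.3f}")
\end{verbatim}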
Goodness-of-fit measures for comparing simulated and observed time series
The summary statistics adopted here for evaluating the historical fit of simulated time series against observed data follow the guidance given in 'Appropriate summary statistics for evaluating the historical fit of system dynamics models' (Sterman) and 'A VenSim model to calculate summary statistics for historical fit' (Oliva). The principal measures considered are summarised below.
R-Squared (R²)
The correlation coefficient between two paired variables $X$ and $Y$ is given by
\[ r = \frac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{N\sum X^{2} - \left(\sum X\right)^{2}}\;\sqrt{N\sum Y^{2} - \left(\sum Y\right)^{2}}} \]
and the coefficient of determination is $r^{2} = r \times r$. In statistics, the coefficient of determination (denoted $R^{2}$ or $r^{2}$) is the proportion of the variance in the dependent variable that is predictable from the independent variable(s). It is used in the context of statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses on the basis of other related information, and it provides a measure of how well observed outcomes are replicated by the model, based on the proportion of the total variation in outcomes that is explained by the model. Expressed as explained variance, a correlation of $r = 0.7$ gives $r^{2} = 0.49$, implying that 49% of the variability between the two variables has been accounted for, with the remaining 51% still unaccounted for.
$R^{2}$ therefore gives an indication of the goodness of fit of a model: in regression, it is a statistical measure of how well the regression line approximates the real data points, with $R^{2} = 1$ indicating a perfect fit. When $R^{2}$ is used to measure the agreement between observed and modelled values that are not obtained by linear regression, values outside the range 0 to 1 can occur depending on which formulation of $R^{2}$ is used, and $R^{2}$ is undefined when all observed values are identical. Where the predictors are calculated by ordinary least-squares regression (i.e. by minimising the residual sum of squares), $R^{2}$ is monotonically increasing with the number of variables included in the model and will never decrease as variables are added. This represents a drawback of the measure, since irrelevant explanatory variables can be added simply to inflate the $R^{2}$ value ('kitchen sink' regression), and it motivates the use of the adjusted $R^{2}$ described below, which penalises the statistic as extra variables are included. For fitting approaches other than ordinary least squares (such as weighted or generalised least squares) alternative versions of $R^{2}$ can be calculated that are appropriate to those frameworks, although the 'raw' $R^{2}$ may still be useful if it is more easily interpreted; values of $R^{2}$ can in fact be calculated for any type of predictive model, which need not have a statistical basis.
Adjusted R-Squared (R²adj)
The adjusted $R^{2}$ (denoted $\bar{R}^{2}$ or $R^{2}_{\text{adj}}$) is an attempt to take account of the phenomenon of $R^{2}$ automatically and spuriously increasing when extra explanatory variables are added to the model. It is a modification of $R^{2}$, due to Henri Theil, that adjusts for the number of explanatory terms in a model relative to the number of data points. The adjusted $R^{2}$ can be negative, and its value will always be less than or equal to that of $R^{2}$. Unlike $R^{2}$, the adjusted $R^{2}$ increases only when the increase in $R^{2}$ (due to the inclusion of a new explanatory variable) is more than would be expected by chance. If a set of explanatory variables with a predetermined hierarchy of importance is introduced into a regression one at a time, with the adjusted $R^{2}$ computed each time, the level at which the adjusted $R^{2}$ reaches a maximum and decreases afterwards identifies the regression with the ideal combination of best fit without excess or unnecessary terms. The adjusted $R^{2}$ is defined as
\[ \bar{R}^{2} = 1 - (1 - R^{2})\,\frac{n-1}{n-p-1} = R^{2} - (1 - R^{2})\,\frac{p}{n-p-1} \]
where $p$ is the total number of explanatory variables in the model (not including the constant term), and $n$ is the sample size.
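As an illustrative check of these definitions, the following sketch computes $R^{2}$ and the adjusted $R^{2}$ for a simulated series against an observed series; the series values and the assumed number of explanatory terms are placeholders.

\begin{verbatim}
import numpy as np

def r_squared(observed, simulated):
    """Coefficient of determination: 1 - SSres/SStot."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    ss_res = np.sum((observed - simulated) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n samples and p explanatory variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Placeholder observed vs simulated adoption series (illustrative only).
observed = np.array([0.02, 0.05, 0.11, 0.22, 0.40, 0.61, 0.78, 0.90])
simulated = np.array([0.03, 0.06, 0.10, 0.20, 0.38, 0.63, 0.80, 0.88])

r2 = r_squared(observed, simulated)
print(f"R^2 = {r2:.3f}, adjusted R^2 = "
      f"{adjusted_r_squared(r2, len(observed), p=2):.3f}")
\end{verbatim}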
Mean Squared Error (MSE)
The MSE is a measure of the quality of an estimator: it is always non-negative, and values closer to zero are better. The MSE is the second moment (about the origin) of the error, and thus incorporates both the variance of the estimator and its bias; for an unbiased estimator, the MSE is equal to the variance of the estimator. Like the variance, the MSE has the same units of measurement as the square of the quantity being estimated. In an analogy to the standard deviation, taking the square root of the MSE yields the root-mean-square error or root-mean-square deviation (RMSE or RMSD), which has the same units as the quantity being estimated; for an unbiased estimator, the RMSE is the square root of the variance, i.e. the standard deviation.
Root-Mean-Square Error (RMSE)
The root-mean-square deviation (RMSD) or root-mean-square error (RMSE) is a frequently used measure of the differences between values (sample and population values) predicted by a model or an estimator and the values actually observed. The RMSD represents the sample standard deviation of the differences between predicted and observed values. These individual differences are called residuals when the calculations are performed over the data sample that was used for estimation, and prediction errors when computed out-of-sample. The RMSD serves to aggregate the magnitudes of the errors in predictions for various times into a single measure of predictive power. RMSD is a measure of accuracy used to compare the forecasting errors of different models for a particular dataset, and not between datasets, as it is scale-dependent. Although the RMSE is one of the most commonly reported measures of disagreement, it is sometimes misinterpreted as the average error, which it is not: the RMSD is the square root of the average of the squared errors, and thus confounds information concerning the average error with information concerning the variation in the errors. The effect of each error on the RMSD is proportional to the size of the squared error, so larger errors have a disproportionately large effect, making the RMSD sensitive to outliers.
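The sketch below computes the MSE and RMSE for a pair of placeholder observed and simulated series (illustrative values only):

\begin{verbatim}
import numpy as np

def mse(observed, simulated):
    """Mean squared error between observed and simulated series."""
    err = np.asarray(observed, dtype=float) - np.asarray(simulated, dtype=float)
    return np.mean(err ** 2)

def rmse(observed, simulated):
    """Root-mean-square error: square root of the MSE."""
    return np.sqrt(mse(observed, simulated))

# Placeholder series (illustrative only).
observed = np.array([0.02, 0.05, 0.11, 0.22, 0.40, 0.61, 0.78, 0.90])
simulated = np.array([0.03, 0.06, 0.10, 0.20, 0.38, 0.63, 0.80, 0.88])

print(f"MSE  = {mse(observed, simulated):.5f}")
print(f"RMSE = {rmse(observed, simulated):.5f}")
\end{verbatim}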
Mean Absolute Error (MAE)
In statistics, the mean absolute error (MAE) is a measure of the difference between two continuous variables. Assume $X$ and $Y$ are variables of paired observations that express the same phenomenon. Examples of $Y$ versus $X$ include comparisons of predicted versus observed values, a subsequent time versus an initial time, and one technique of measurement versus an alternative technique of measurement.
Mean Absolute Percentage Error (MAPE)
The mean absolute percentage error (MAPE), also known as the mean absolute percentage deviation (MAPD), is a measure of the prediction accuracy of a forecasting method in statistics, for example in trend estimation. It usually expresses accuracy as a percentage: the difference between the actual value $A_{t}$ and the forecast value $F_{t}$ is divided by the actual value $A_{t}$, the absolute value of this ratio is summed over every forecasted point in time and divided by the number of fitted points $n$, and multiplying by 100 turns the result into a percentage error:
\[ \mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{A_{t}-F_{t}}{A_{t}}\right| \]
Although the concept of MAPE sounds very simple and convincing, it has major drawbacks in practical application:
- It cannot be used if there are zero values (which sometimes happens, for example, in demand data), because there would be a division by zero.
- For forecasts which are too low the percentage error cannot exceed 100%, but for forecasts which are too high there is no upper limit to the percentage error.
- When MAPE is used to compare the accuracy of prediction methods it is biased, in that it will systematically select a method whose forecasts are too low. This little-known but serious issue can be overcome by using an accuracy measure based on the ratio of the predicted to actual value (the Accuracy Ratio), which has superior statistical properties and leads to predictions that can be interpreted in terms of the geometric mean.
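The following sketch computes the MAE and MAPE for a pair of placeholder actual and forecast series, skipping zero actual values to avoid the division-by-zero issue noted above (illustrative values only):

\begin{verbatim}
import numpy as np

def mae(actual, forecast):
    """Mean absolute error between actual and forecast series."""
    return np.mean(np.abs(np.asarray(actual, float) - np.asarray(forecast, float)))

def mape(actual, forecast):
    """Mean absolute percentage error; undefined where actual values are zero."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    nonzero = actual != 0                      # guard against division by zero
    ratio = np.abs((actual[nonzero] - forecast[nonzero]) / actual[nonzero])
    return 100.0 * np.mean(ratio)

# Placeholder series (illustrative only).
actual = np.array([0.02, 0.05, 0.11, 0.22, 0.40, 0.61, 0.78, 0.90])
forecast = np.array([0.03, 0.06, 0.10, 0.20, 0.38, 0.63, 0.80, 0.88])

print(f"MAE  = {mae(actual, forecast):.4f}")
print(f"MAPE = {mape(actual, forecast):.2f}%")
\end{verbatim}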
Mean Absolute Scaled Error (MASE)
The mean absolute scaled error has the following desirable properties:
- Scale invariance: the mean absolute scaled error is independent of the scale of the data, so it can be used to compare forecasts across datasets with different scales.
- Predictable behaviour as $y_{t} \rightarrow 0$: percentage forecast accuracy measures such as the MAPE rely on division by $y_{t}$, skewing the distribution of the MAPE for values of $y_{t}$ near or equal to zero. This is especially problematic for datasets whose scales do not have a meaningful zero, such as temperature in Celsius or Fahrenheit, and for intermittent demand datasets, where $y_{t} = 0$ occurs frequently.
- Symmetry: the mean absolute scaled error penalises positive and negative forecast errors equally, and penalises errors in large and small forecasts equally. In contrast, the MAPE and the median absolute percentage error (MdAPE) fail both of these criteria, while the 'symmetric' sMAPE and sMdAPE fail the second criterion.
- Interpretability: the mean absolute scaled error can be easily interpreted, as values greater than one indicate that in-sample one-step forecasts from the naïve method perform better than the forecast values under consideration.
- Asymptotic normality of the MASE: the Diebold-Mariano test for one-step forecasts is used to test the statistical significance of the difference between two sets of forecasts; to perform hypothesis testing with this test it is desirable for the test statistic to satisfy $DM \sim N(0,1)$. The DM statistic for the MASE has been shown empirically to approximate this distribution, while those for the mean relative absolute error (MRAE), MAPE, and sMAPE do not.
For a non-seasonal time series, the mean absolute scaled error is estimated by
\[ \mathrm{MASE} = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{\left|e_{t}\right|}{\frac{1}{T-1}\sum_{t=2}^{T}\left|Y_{t}-Y_{t-1}\right|}\right) = \frac{\sum_{t=1}^{T}\left|e_{t}\right|}{\frac{T}{T-1}\sum_{t=2}^{T}\left|Y_{t}-Y_{t-1}\right|} \]
where the numerator $e_{t}$ is the forecast error for a given period, defined as the actual value ($Y_{t}$) minus the forecast value ($F_{t}$) for that period, $e_{t} = Y_{t} - F_{t}$, and the denominator is the mean absolute error of the one-step 'naïve forecast method' on the training set, which uses the actual value from the prior period as the forecast: $F_{t} = Y_{t-1}$.
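A sketch of the non-seasonal MASE calculation for placeholder actual and forecast series (illustrative values only):

\begin{verbatim}
import numpy as np

def mase(actual, forecast):
    """Mean absolute scaled error for a non-seasonal series.

    Scales the mean absolute forecast error by the in-sample MAE of the
    one-step naive forecast (using the previous actual value as the forecast).
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    errors = np.abs(actual - forecast)
    naive_errors = np.abs(np.diff(actual))      # |Y_t - Y_{t-1}| for t = 2..T
    return errors.mean() / naive_errors.mean()

# Placeholder series (illustrative only).
actual = np.array([0.02, 0.05, 0.11, 0.22, 0.40, 0.61, 0.78, 0.90])
forecast = np.array([0.03, 0.06, 0.10, 0.20, 0.38, 0.63, 0.80, 0.88])

print(f"MASE = {mase(actual, forecast):.3f}  (< 1 beats the naive forecast)")
\end{verbatim}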
Defining payoff functions based on the Integral Squared Error (ISE) means that large errors tend to be eliminated quickly, as they are heavily penalised (the square of a large error being much larger), whereas smaller errors often persist for a long period of time, as they are not penalised as heavily. This may partially explain why some degree of offset is fairly often tolerated in the results presented in Chapter 6, while take-off points are more closely respected.
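To illustrate this asymmetry, the short sketch below compares the ISE penalty contributed by a persistent small offset with that contributed by a brief large error around a take-off point; the error magnitudes and durations are arbitrary assumptions.

\begin{verbatim}
import numpy as np

dt = 0.1  # integration time step (years), arbitrary assumption

# A small constant offset that persists for 10 years.
small_persistent = np.full(int(10 / dt), 0.05)
# A large error that lasts only 1 year (e.g. a mistimed take-off point).
large_brief = np.full(int(1 / dt), 0.50)

ise_small = np.sum(small_persistent ** 2) * dt
ise_large = np.sum(large_brief ** 2) * dt

# The brief large error dominates the ISE despite lasting a tenth of the time,
# so an ISE-based payoff corrects take-off timing before removing small offsets.
print(f"ISE of persistent small offset: {ise_small:.3f}")
print(f"ISE of brief large error:       {ise_large:.3f}")
\end{verbatim}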
Method selection
Based on the technology classification problem considered, the bibliometric data available, and the methods discussed in sections \ref{650363} to \ref{875755}, the following methods have been selected for use in this analysis:
Technology Life Cycle stage matching process
For those technologies where evidence for determining the transitions between the different stages of the Technology Life Cycle has either not been found or is incomplete, a nearest neighbour pattern recognition approach based on the work of Gao \cite{Gao_2013} can be employed to locate the points where shifts between cycle stages occur. In this instance a supervised learning approach is appropriate, as the well-established nature of the Technology Life Cycle model is widely recognised to form a sensible basis for classifying technological maturity, so there is no need to establish the validity of the categories being assigned; equally, the nearest neighbour approach is an industry standard, so no further development of it is proposed for this study. However, for the specific technologies considered here, literature evidence has been identified for the transitions between stages, and so the nearest neighbour methodology is not discussed further.
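If the nearest neighbour fallback were required, a minimal sketch might look like the following, where each candidate year is assigned the life cycle stage of its closest labelled example in a simple indicator feature space; the feature values and stage labels are hypothetical.

\begin{verbatim}
import numpy as np

# Hypothetical labelled examples: (annual patent count, annual citation count)
# for years whose Technology Life Cycle stage is known from the literature.
labelled_features = np.array([
    [5, 2],     # emerging
    [40, 30],   # growth
    [120, 90],  # maturity
    [60, 20],   # saturation/decline
], dtype=float)
labelled_stages = np.array(["emerging", "growth", "maturity", "decline"])

def nearest_stage(features):
    """Assign the stage of the nearest labelled example (1-NN, Euclidean)."""
    distances = np.linalg.norm(labelled_features - np.asarray(features, float),
                               axis=1)
    return labelled_stages[np.argmin(distances)]

# Hypothetical unlabelled year to classify.
print(nearest_stage([35, 25]))   # expected to match the 'growth' example
\end{verbatim}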
Identification of significant patent indicator groups
In order to identify those bibliometric indicator groupings that could form the basis of a data-driven technology classification model, a combination of Dynamic Time Warping and the 'PAM' variant of K-Medoids clustering has been applied in this study. For the initial feature alignment and distance measurement stages of this process, Dynamic Time Warping is still widely recognised as the classification benchmark to beat (see section \ref{446824}), and so this study does not look to advance the feature alignment processes used beyond this. Unlike the Technology Life Cycle stage matching process, which is based on a well-established technology maturity model, this study does not assume that a classification system based on the modes of substitution outlined in section \ref{771448} is intrinsically valid. For this reason an unsupervised learning approach has been adopted here, eliminating human biases in determining whether a classification system based on presumptive technological substitution is valid or not, before a classification rule system is subsequently defined. This additionally means that labelling of predicted clusters can be carried out even if labels are only available for a small number of observed samples representative of the desired classes, or potentially even if none of the observed samples is definitively labelled.
This is of particular use if this technique is to be expanded to a wider population of technologies, as obtaining evidence of the applicable mode of substitution that gave rise to the current technology can be a time-consuming process, and in some cases the necessary evidence may not be publicly available (e.g. if dealing with commercially sensitive performance data). As such, clustering can provide an indication of the likely substitution mode of a given technology without the need for prior training on technologies that belong to any given class. Under such circumstances this approach could be applied without the need for collecting performance data, provided that the groupings produced by the analysis are broadly identifiable from inspection as being associated with the suspected modes of substitution (this is of course made easier if a handful of examples are known, but it means that this is no longer a hard requirement).
The 'PAM' variant of K-Medoids is selected here over hierarchical clustering since the expected number of clusters is known from the literature, and keeping the number of clusters fixed allows for easier testing of how frequently predicted clusters align with expected groupings. Additionally, only a small sample of technologies is evaluated in this study, so the computational expense of the 'PAM' variant of K-Medoids relative to hierarchical clustering approaches is not likely to be significant. The Euclidean distance metric is subsequently selected for the K-Medoids clustering in order to ensure consistency with the Dynamic Time Warping measures available (see \cite{dtw}), and amplitude normalisation of the time series considered ensures that Euclidean distance measures are not inadvertently biased by observations of high or low values (see section \ref{396009}). It is also worth noting that by evaluating the predictive performance of each subset of patent indicator groupings independently it is possible to spot and rank commonly recurring patterns of subsets, which is not possible when using approaches such as Linear Discriminant Analysis, which can assess the impact of individual predictors but cannot rank the most suitable combinations of indicators.
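As an illustration of how these two steps fit together, the sketch below computes pairwise DTW distances between a few hypothetical amplitude-normalised indicator series and clusters them with a simplified K-Medoids update (a stand-in for the full PAM swap search used in practice); the series, the cluster count, and the library-free implementations are assumptions for this sketch only.

\begin{verbatim}
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D series."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def k_medoids(dist, k, n_iter=100, seed=0):
    """Simplified K-Medoids (Voronoi iteration), a stand-in for full PAM."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members):
                # Choose the member minimising total distance within the cluster.
                costs = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return labels, medoids

# Hypothetical amplitude-normalised patent indicator series (one per technology).
series = [np.array(s, dtype=float) for s in (
    [0.0, 0.1, 0.3, 0.7, 1.0], [0.0, 0.2, 0.4, 0.8, 1.0],   # rapid take-off
    [0.0, 0.4, 0.6, 0.7, 0.8], [0.1, 0.5, 0.6, 0.7, 0.75],  # gradual growth
)]

# Pairwise DTW distance matrix, then cluster with k = 2 (known from literature).
n = len(series)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = dtw_distance(series[i], series[j])

labels, medoids = k_medoids(dist, k=2)
print("cluster labels:", labels, "medoid indices:", medoids)
\end{verbatim}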
Ranking of significant patent indicator groups
As the number of technologies considered in this study is relatively small, exhaustive cross-validation approaches provide a feasible means of ranking the out-of-sample predictive capabilities of those bibliometric indicator subsets identified as producing significant correlations with the expected in-sample technology groupings. Leave-p-out cross-validation is therefore applied for this purpose, which also reduces the risk of over-fitting in the subsequent model-building phases.
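Since the number of technologies is small, exhaustive leave-p-out cross-validation can be enumerated directly. The sketch below scores a placeholder nearest neighbour classifier over every held-out pair; the feature vectors and labels are hypothetical stand-ins for indicator subsets and substitution modes.

\begin{verbatim}
import numpy as np
from itertools import combinations

def leave_p_out_accuracy(features, labels, p, classify):
    """Exhaustive leave-p-out cross-validation of a simple classifier.

    `classify(train_X, train_y, test_X)` returns predicted labels for test_X.
    """
    n = len(labels)
    scores = []
    for held_out in combinations(range(n), p):
        test_idx = np.array(held_out)
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        preds = classify(features[train_idx], labels[train_idx], features[test_idx])
        scores.append(np.mean(preds == labels[test_idx]))
    return float(np.mean(scores))

def nearest_neighbour(train_X, train_y, test_X):
    """1-NN classifier used purely as a placeholder scoring model."""
    d = np.linalg.norm(test_X[:, None, :] - train_X[None, :, :], axis=2)
    return train_y[np.argmin(d, axis=1)]

# Hypothetical indicator-subset feature vectors (rows) and substitution-mode labels.
features = np.array([[0.1, 0.9], [0.2, 0.8], [0.7, 0.3], [0.8, 0.2], [0.75, 0.25]])
labels = np.array([0, 0, 1, 1, 1])

print("leave-2-out accuracy:",
      leave_p_out_accuracy(features, labels, p=2, classify=nearest_neighbour))
\end{verbatim}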
Model building
The misalignment in time of life cycle stages between different technologies can make it difficult to identify common features in their time series. This is primarily because such phase variance risks artificially inflating data variance, skewing the driving principal components and often disguising underlying data structures \cite{Marron_2015}. Consequently, due to the importance of phase variance when comparing historical trends for different technologies, and the coupling that exists between adjacent points in growth and adoption curves, functional linear regression is selected here to build the technology classification model developed in this study (see section \ref{875755}).
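The following sketch illustrates the basic idea of scalar-on-function regression via a basis expansion: each resampled curve is reduced to a few basis coefficients, which are then regressed against a class score. The curves, labels, basis choice, and least-squares fit are simplifying assumptions for illustration, not the functional regression model actually fitted in this study.

\begin{verbatim}
import numpy as np

# Scalar-on-function regression sketch: each technology's indicator curve is
# projected onto a small Legendre polynomial basis, and the basis coefficients
# are used as predictors of a substitution-mode score. Curves, labels, and the
# basis size are illustrative assumptions.

t = np.linspace(0.0, 1.0, 25)                 # common (resampled) time grid

def basis_coefficients(curve, degree=2):
    """Project a curve onto a Legendre polynomial basis of the given degree."""
    return np.polynomial.legendre.legfit(t, curve, degree)

# Hypothetical normalised adoption curves: two fast take-offs, two gradual ones.
curves = np.array([
    1 / (1 + np.exp(-20 * (t - 0.5))),
    1 / (1 + np.exp(-18 * (t - 0.55))),
    t ** 1.2,
    0.9 * t,
])
labels = np.array([1.0, 1.0, 0.0, 0.0])       # 1 = presumptive substitution (assumed)

X = np.array([basis_coefficients(c) for c in curves])
X = np.hstack([np.ones((len(X), 1)), X])      # add an intercept column

beta, *_ = np.linalg.lstsq(X, labels, rcond=None)

# Score an unseen hypothetical curve; > 0.5 suggests the first class.
new_curve = 1 / (1 + np.exp(-15 * (t - 0.45)))
score = np.hstack([1.0, basis_coefficients(new_curve)]) @ beta
print(f"predicted class score: {score:.2f}")
\end{verbatim}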
Sensitivity of technology adoption to chosen modelling parameters
Whilst statistical approaches are well suited to detecting underlying correlations in historical and experimental datasets, this on its own does not provide a detailed understanding of the causation behind associated events. Equally, statistical methods are not generally well suited to predicting disruptive events and complex interactions, with other simulation techniques such as System Dynamics and Agent Based Modelling performing better in these areas (see Chapter 2). Accordingly, in order to identify causation effects and test the sensitivity of technological substitution patterns to variability arising from real-world socio-technical features not captured in simple bibliometric indicators (such as the influence of competition and economic effects), the fitted regression model is evaluated in a real-time System Dynamics environment.
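As a flavour of the kind of sensitivity test intended, the sketch below sweeps the imitation parameter of a simple Bass-style diffusion model and reports how the take-off timing shifts; the model form, parameter values, and adoption threshold are placeholder assumptions rather than the System Dynamics environment used in this study.

\begin{verbatim}
import numpy as np

def bass_adoption(p, q, horizon=30, dt=0.1):
    """Cumulative adoption fraction from a simple Bass-style diffusion model."""
    steps = int(horizon / dt)
    f = np.zeros(steps)                     # adopted fraction of the market
    for i in range(1, steps):
        rate = (p + q * f[i - 1]) * (1.0 - f[i - 1])
        f[i] = f[i - 1] + dt * rate
    return f

def take_off_year(f, threshold=0.1, dt=0.1):
    """First year at which cumulative adoption exceeds the threshold."""
    return float(np.argmax(f >= threshold) * dt)

# Sweep the imitation coefficient q while holding the innovation coefficient fixed.
p = 0.01                                    # assumed innovation coefficient
for q in (0.2, 0.4, 0.6, 0.8):
    f = bass_adoption(p, q)
    print(f"q = {q:.1f}: take-off (10% adoption) at ~{take_off_year(f):.1f} years")
\end{verbatim}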
Method limitations
Although precautions have been taken wherever possible to ensure that the methods selected for this study address the problem of building a generalised technology classification model from bibliometric data in as rigorous a fashion as possible, there are some known limitations to the methods used in this work that must be recognised. Many of the current limitations stem from the fact that technologies have been selected for this analysis based on where evidence is obtainable to indicate the mode of adoption followed. As such, the technologies considered here do not come from a truly representative cross-section of all industries, so it is possible that the models generated will represent the industries considered better than they generalise beyond them. This evidence-based approach also means that it remains a time-consuming process to locate the literature needed to support classifying technology examples as arising from one mode of substitution or another, and then to compile the relevant cleaned patent datasets for analysis. As a result, only a relatively limited number of technologies have been considered in this study, and this set should be expanded to increase confidence in the findings produced from this work; the small number of technologies also raises the risk that clustering techniques may struggle to produce consistent results. Furthermore, any statistical or quantitative methods used for modelling are unlikely to provide real depth of knowledge beyond the detection of correlations behind patent trends when used in isolation. Ultimately some degree of causal exploration, whether through case study descriptions, System Dynamics modelling, or expert elicitation, will be required to shed more light on the underlying influences shaping technology substitution behaviours.
Other data-specific issues that could arise relate to the use of patent searches in this analysis and the need to resample data based on variable-length time series. The former relates to the fact that patent search results and records can vary considerably depending on the database and exact search terms used; however, overall trends, once normalised, should remain consistent with other studies of this nature (this point is addressed in more detail in section XX). The latter refers to the fact that functional linear regression requires all technology case studies to be based on the same number of time samples. As such, as discussed in section \ref{875755}, linear interpolation is used where required to ensure consistency in the number of observations, whilst possibly introducing some small errors that are not felt to be significant.
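A minimal sketch of the resampling step described above, assuming linear interpolation onto a fixed number of evenly spaced points (the series and target length are illustrative):

\begin{verbatim}
import numpy as np

def resample_linear(years, values, n_points):
    """Linearly interpolate an annual series onto n_points evenly spaced times."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.linspace(years.min(), years.max(), n_points)
    return target, np.interp(target, years, values)

# Illustrative variable-length patent count series for one technology.
years = [1990, 1992, 1993, 1997, 2000, 2005]
counts = [3, 8, 15, 60, 110, 140]

grid, resampled = resample_linear(years, counts, n_points=25)
print(resampled.round(1))
\end{verbatim}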
Selected data sources
Three types of data source are considered in this study: patent data and publication data (i.e. bibliometric sources), which are subsequently coupled with technology adoption data to enable the impact of different modes of substitution to be investigated:
Patent data
Patent data has been sourced from the Questel-Orbit patent search platform in this analysis. More specifically, the full FamPat database was queried in this study, which groups related invention-based patents filed in multiple international jurisdictions into families of patents. This platform is accessed by subscribers via an online search engine that allows complex patent record searches to be structured, saved, and exported in a variety of formats. A selection of keywords, dates, or classification categories are used in this search engine to build relevant queries for a given technology (this process is discussed in more detail in section \ref{335937}). The provided search terms are then matched in the title, abstract, and key content of all family members included in a FamPat record, although unlike title and abstract searches, key contents searches (which include independent claims, advantages, drawbacks, and the main patent object) are limited to only English language publications. Some of the core functionalities behind this search engine are outlined in \cite{Questel_Orbit_2000}.
Publication data
Journal article and publication records used in this analysis are based on extracted search results from the Web of Science (WoS) citation indexing service provided by Clarivate Analytics (previously Thomson Reuters). Web of Science was originally established based on the work of Eugene Garfield, who identified the relevance of citations and subsequently developed the idea of the Science Citation Index (SCI) in the 1950s as a database for storing these records, along with the Institute for Scientific Information (ISI) as an organisation set up to maintain this information. Whilst not originally intended for research evaluation, but rather for helping researchers find relevant work more effectively, the SCI was later joined by the Social Sciences Citation Index (SSCI), and subsequently by the Arts & Humanities Citation Index (A&HCI) in the 1970s. After being acquired by the Thomson Corporation, this collection of indexes was converted into the present-day Web of Science, which is currently reported to hold details of over 100 million records dating from 1900 onwards, covering more than 33,000 journals, 50,000 books, and 160,000 conference proceedings. As such, this comprises the largest collection of scholarly articles globally \cite{Mingers_2015,WoS_facts}.
In a very similar fashion to the Questel-Orbit platform, the online Web of Science search engine relies on a series of keywords and Boolean operators to define search terms that are then matched in the title, abstract, and key content of the records in the database.
Technology adoption data
Adoption data for the technologies investigated is taken from a wide variety of sources due to the broad scope of the technology domains considered. Where possible, global technology sales and shipment values have been used to determine the overall market share of each technology at a given time, although in some cases data values have been imputed to fill gaps in time series (this is stated where this has been applied, as well as the method of deriving imputed values). Furthermore, the preference has been to extract statistical data directly from international agencies such as the UN, World Bank, International Energy Agency, International Council on Clean Transportation, International Telecommunication Union, and Eurostat when available, as these organisations generally present the most consistent representation of the technologies considered when taking into consideration regional development trends. In many cases, this information was accessed via the UK Data Service \cite{UKDS_stat}.
A brief description of each data source used for technology adoption data is given in Table \ref{table:data_sources_for_technology_adoption_data}: