The study of social networks stretches back into the sociological tradition of the early \(20^{th}\) century. Much of this early work was focused around very small networks, and was merely descriptive in nature. Many sociological researchers focused their efforts on simple network visualisations and analysis of key influencers, and did not venture far into the mathematical study of networks. However, they offered insights into how the shape or structure of a social network can affect the exchange of information.
A social network can be described as mathematical graph consisting of a set of nodes connected by a set of edges. Many of the developments in the modern, data driven study of social networks have progressed in line with developments in graph theory. The origins of graph theory date back to Euler in the \(18^{th}\) century, who’s solution to the Konigsberg bridge problem is considered to be the first theorem in graph theory. However, it wasn’t until Erdős et al. (1957) introduced the mathematics of random graphs that the field started to be applied to social networks. In the random graph model, nodes are connected with uniform probability to other nodes. As the connection probability increases, a connectivity phasetransition is observed. This model introduced the application of mathematics to the study of social networks, and allowed for analytical solutions to questions around the connection structure of the network to be answered.
While the random graph model allows for mathematical theory to be derived, it does not explain many properties observed in real world social networks. Perhaps the most famous such property is Stanley Milgram’s ‘six degrees of separation’, also known as the ‘smallworld’ effect (Travers et al., 1967). The concept is that realworld social networks have a surprisingly short diameter, i.e. you can reach any other person in the world in a small number of steps. Further to the small world property, in many social networks the distribution for the number of connections per node follows a power law, rather than the uniform distribution assumed by random graph theory. This latter property is also observed in certain physical systems. Physicists addressed these properties by developing what is now known as the field of complex systems.
The similarities between complex physical systems and social networks sparked the interest of a group of physicists who had previously worked on complex systems. Firstly, Watts et al. (1998) published their smallworld model which produces the observed small network diameters. Soon after, Barabasi et al. (1999) addressed the common powerlaw degree distribution property by introducing their scalefree network model. Both of these models can be considered to be multiagent representations of social networks, with simple rule sets. Following from these models, a wide variety of research has been published regarding the properties and applications of social networks. However, there is still an open research gap regarding the set of rules governing these models.
The study of network topology deals with aspects of the network connection structure, and has seen a great deal of interest since the work of Watts et al. (1998) and Barabasi et al. (1999). The topological analysis of social networks has formed around the broad categories of macro network properties, node and edge influence, and complex community structure.
Firstly, there are a variety of properties that can be measured and classified for a whole network. These include measures such as a network’s diamater, density, clustering, and degree distribution. Analysis of these properties is important with respect to creating realistic models of social networks, and certainly directed the development of the smallworld and scalefree models.
Much of the early research in social networks focused around understanding the influence of nodes, and particularly nodes in positions of information brokerage. With the tools of graph theory, structural influence can be defined using a variety of measures known as centrality measures. Centrality measures attempt to quantify the relative influence of a node over a network. One of the simplest of these is the degree of a node, defined as the number of edges that a node has. There are many common centrality measures used in network science, and they carry useful information regarding network structure.
Soon after the smallworld and scalefree models were published, a multidisciplinary group of researchers started to approach social networks with the aid of computational algorithms. One of the first applications of this approach was to partition a network into a set of densely connected subsets. This is known as the problem of community detection, and a large body of research has been published to address it. Some of the most common algorithms are based on centrality measures, and seek to either divide communities at edges with low influence, or center communities at nodes with high influence. For example, Girvan et al. (2002) proposed an algorithm that defines communities by iteratively removing nodes with the lowest betweenness centrality scores. Further to the development of community detection algorithms, the analysis of hierarchical community structure inherent within many social networks has generated a lot of interest. An example of this is the classification of scaling laws at different levels of the community structure, and the development of theory as to why this occurs.
Understanding and desribing the full topology of a network is still a challenging and unresolved task, due to the large number of dimensions and interconnectivity of networks. Methods to describe and express the information contained in a network’s topology are therefore critical to modelling processes taking place.
The concept of diffusion on a social network is important due to its many practical applications. Network diffusion models describe processes taking place on the network, such as the flow of information or spread of an infectious disease. Some well established methods for modelling diffusion involve dynamical systems models, which describe the interaction between groups within a population. Examples include the SusceptibleInfectedSusceptible (SIS) model for the spread of infectious diseases, and the Bass diffusion model of innovation. These dynamical systems techniques work well in cases with limited data as they do not require explicit knowledge of the network topology.
The other key approach is to model each individual node as an agent which acts subject to predefined rules. This multiagent system approach generally requires more information about the underlying connection structure, but can model complex and interdependent behaviour. Similarly to multiagent models of evolving social networks, such as the smallworld and scalefree models, defining the rule sets is still an open research area. Interesting global behaviours emerge out of the multiagent approach, such as information rapidly diffusing through large parts of the network at critical parameter values and conditions. This phenomena is generally well known by users of OSNs as the viral spread of content and memes. Research into critical diffusion within a social network is an active area of study given the recent availability of OSN data.
An understanding of the diffusion dynamics of a social network is particularly pertinent in the modern age of online social media. We are constantly immersed in content distributed through our online networks. At some level, we would all have experienced the phenomena of viral content. The related concept of a meme, and particularly their selfpropagation, can be traced back to Dawkins (1976). Dawkins defined a meme as a selfreplicating unit of cultural information or thought, passed from person to person by means of imitation. In a social network, this can be as simple as sharing or retweeting a piece of content, but can also involve more complex ideas communicated through natural language.
Recently, the question of whether we can predict the emergence of memes in a social network has received a good deal of interest. One may consider properties of the content itself, and the context within which it arises, as key predictors of it’s viral diffusion. Also, how the content diffuses through a network has been shown to be crucial. The structure of the network itself can therefore be critical in determining the velocity and extent of diffusion. For example, Weng et al. (2014) proposes using a supervised learning approach to predict which memes spread extensively with features derived from the network and community structure.
Viral diffusion and the emergence of memes have been approached from both a dynamical systems perspective, where criteria for viral spreading are derived, and also from a multiagent representation. In this latter approach, we attempt to set the agent rules in line with empirically observed behaviour in social networks. Simulation is used to determine results for diffusion velocity and extent. However, with both approaches to modelling the emergence of memes, realistic network topologies must be assumed, as well as initial conditions for diffusion.
In this dissertation, we adopt a multiagent approach to modelling the emergence of memes, taking into account realistic network topologies. To achieve this, we empirically derive the agent rule set from real world online social network data, incorporating both heterogeneous content and users. We validate our model by comparing the simulated viral diffusion dynamics and meme emergence with cases from real world data.
The thesis outlined in this dissertation is that we can simulate the emergence of memes within a social network. We arrive at this by first developing a framework to understand the relationships between network topology, growth and diffusion. We then develop a set of methods to infer additional information about both nodes and edges with data from real world social networks Facebook and Twitter. This leads to a definition of OSN users heterogeneous across content preferences and network position. We adopt a multiagent system approach to simulating real world social network activity using empirically derived agent rule sets. Finally, we provide analysis of our simulated meme emergence, and compare to real world activity.
research questions are as follows:
How does the topology of a social network influence future link formation and content diffusion?
What factors are predictive of heterogeneous online social network users engaging with different types of content?
To what extent can we simulate the emergence of memes in a social network?
This research is of interest to a wide range of stakeholders due to the large number of users of OSNs and the multidisciplinary nature of complex networks research. A better understanding of the interplay between network topology, link formation and content diffusion has a wide variety of applications. For instance, better link prediction models can improve the accuracy of connection recommendations in online social networks. More accurate content diffusion models, incorporating network topological effects, can benefit a large tranche of digital media and advertising companies as they can predict more accurately which content is more likely to go viral. Finally, an accurate methodology to predict the emergence of memes in a social network can benefit people studying human social behaviour, as well as online social network users, platform providers, and digital marketers and advertisers. \parThekey stakeholders of this dissertation belong to the following groups:
Digital Marketers and Advertisers
Online Social Network Providers
Online Social Network Users
Social Network Researchers
Sociologists and Anthropologists
The key aim of this research is to show how we can simulate the extent with which different types of memes will diffuse in different networks. Throughout this thesis, we show how this can aim can be achieved through constructing a multiagent system which represents both the topology and activity dynamics of a real world social network. The aims of this thesis are as follows:
Show how the topology of a social network affects future link formation and content diffusion.
Show how we can model heterogeneous online social network users engaging with different types of content.
Show how the viral diffusion of memes can be simulated in a social network.
The key objective of this research is to deliver a computational framework that can simulate the emergence of memes in a social network. This objective is achieved by first delivering modelling approaches to predict future link formation and content diffusion using measures derived from the network topology. We then deliver a statistical framework to model the likelihood of different types of OSN users engaging with heterogeneous content. Finally, we deliver a modelling approach to predict the virality of heterogeneous content in a social network, using a multiagent system incorporating results from the first two objectives. The objectives of this thesis are as follows:
Deliver modelling approaches to predict future link formation and content diffusion using features derived from the network topology.
Deliver a statistical framework to predict which users will engage with different content, using a variety of graph topological measures and other features.
Deliver a multiagent system to simulate and predict the emergence of memes in a social network.
This dissertation offers several contributions to knowledge. Firstly, we explore the influence of network topology on the formation of future links and simple content diffusion. We extend this method by incorporating link strength, sign and further natures. Next, we propose a novel social network model based on the topological influences in real world networks, and provide an extensive review of the literature in this area.
Working with natural language captured from Facebook and Twitter activity, we then deliver a framework to infer key information regarding nodes and edges in the respective networks, and provide a model to predict which users engage with which topics and content. Finally, we use all of this research to construct a multiagent system to simulate the diffusion of heterogeneous content across heterogeneous networks, and analyse the emergence of memes. \parTheknowledge contributions of this research are categorised as below:
A novel approach to link prediction in a social network using features derived from the topology of the network.
A novel approach to classify the properties of a network using link prediction importance measures.
A novel approach to extend the link prediction problem to incorporate link strength and sign, using real world data.
A novel multiagent diffusion model for simple content sharing incorporating the influence of network topology, simulated on artificial and real world Twitter network data.
A review of the research regarding social network models, and a novel multiagent model of a growing social network.
A novel modelling approach to estimate the likelihood for nodes in a social network to engage with heterogeneous content, using topological measures and inferred measures from real world Twitter data.
A novel multiagent methodology to simulate and quantify the emergence of viral content, or memes, in a social network.
Research into the interdependencies regarding social network structure and diffusion dynamics is important and of use to a wide range of stakeholders. Particularly, the emergence of memes is a topic important to many stakeholders, as the phenomena has the potential to engage large volumes of people. Social networks are deeply embedded in many aspects of modern society, from the interactions of agents in financial markets to the structure and dynamics of high school cliques, large organisations and even nations themselves. Some applications of this research include more effective targeting and content delivery for digital advertising, as well as a better user experience on online social media platforms. Sociologists and complex systems scientists may also apply this research to other data sets and networks. Further ahead, we believe that this research may be used to inform the design of organisational and social structures which optimise the flow of information, and may lead to more effective social policies at a government level.
The research questions and associated knowledge contributions benefit the stakeholders listed in many ways. The first research question asks how a network’s topology can be used to predict links and influence diffusion dynamics. We propose using a link prediction supervised learning model with a graph topological feature set. This has the advantaget that it can be applied to any network where the connection structure is known, so can be used as a benchmark in further research. This benefits complex and social network researchers by providing a comparision model. With regards to network diffusion, it is well known that a network’s structure can dramatically affect the diffusion dynamics (Newman, 2003). However, the exact nature of this dependency is still an open question. We aim to answer this question by quantitatively showing how the results of diffusion simulations using multiagent systems vary with different network topologies. Answering this research question extends the existing literature on this topic, so social network researchers largely benefit.
The second question asks which factors are predictive of heterogeneous online social network users engaging with different types of content. This question is particularly significant to online social media platfoms, content creators, and digital advertisers. An answer to this question will provide these stakeholders with an increased understanding of the different types of OSN users, and how their preferences for media and content differ. This can be used to refine the targeting for advertisements, friend recommendations, and the type of content used in marketing material. The statistical framework proposed answers this question by learning a function which expresses the influence and importance of the factors on OSN users engaging content. This function can also be used to partition OSN users into heterogeneous groups. This will benefit complex system scientists, data scientists and any other researchers interested in apply the techniques to real world data. Also, sociologists and anthropologists will benefit from an increased understanting of the types of people using online social networks, and their content preferences.
The third question asks how accurately we can simulate the emergence of memes in a social network. We answer this question by modelling viral diffusion with a multiagent system, where the agent rules are empirically derived from real world networks. Simulated results regarding viral diffusion are then compared to real world cases. This work benefits all of the stakeholders listed. Digital marketers and advertisers will be provided with an approach to test which types of content will go viral, controlling for how they are spread in a network. Online social network providers will benefit from a framework to compare activity taking place in their network with a simulation, and can use this to create more accurate content recommendation systems. Users of online social networks will benefit from having better content recommended to them. Social network researchers will have a novel and comprehensive study simulating real world activity in social networks, and sociologists and anthropologists benefit from an empirical study of the crowd behaviour using realistic agents representing individuals in OSNs.
The literature review presented here aims to provide a comprehensive survey of the key areas of research into social networks. Throughout this review, it is shown that the research question of how we can simulate or predict the emergence of memes in social networks is still open. Furthermore, there are many research gaps regarding the interdependencies between network topology and diffusion. Further research is also required to better understand network topology, particularly of large, real world online social networks.
The body of literature in this field is divided into four key areas of active research. Section \ref{lit_topology} delivers a review of measures and methods to classify the topological structure of a network, including techniques to identify influential nodes and approaches to detect community clustering structure. Some results on hierarchical community structure are also discussed. Section \ref{lit_SNM} provides a review of the literature concerning evolving models of network topology, and section \ref{lit_linkprediction} introduces the problem of link prediction and the main methods available. Aection \ref{lit_diffusion} presents two common approaches to modelling diffusion on social networks, followed by a review of the research regarding the prediction and simulation of memes.
A network or graph, labelled by \(\mathcal{G}\), is a set nodes (also known as vertices, agents, actors), \(\mathcal{V}\) and a set of dyadic connections (edges, links, ties), \(\mathcal{E}\) which join the nodes. The connections can be directed or undirected, and a variety of connection types can be chosen to give different graphs. In the language of graph theory, the connection structure between nodes of a graph can be represented by the adjacency matrix. For a finite graph \(\mathcal{G}\) with \(n\) nodes, the adjacency matrix is the \(n\times n\) matrix with elements \(a_{ij}\), which are nonzero if there is an edge between nodes \(i\) and \(j\). We then define properties of graphs such as vertex degree, or the number of edges for a given vertex, and the degree distribution of all vertices in the network. In a directed network, nodes can have both indegree and outdegree in line with the edge direction. There are also many other properties of networks including distance (the number of edges traversed between nodes), network diameter (the maximum distance of the network), clustering and centrality measures, and other concepts such as paths. In particular, there are many definitions of centrality measures, such as betweenness, closeness and eigenvector centrality. Network models can be applied to a wide range of phenomena across multiple disciplines including physics, biology and computer science, however the focus for this review is on social networks.
The analysis of social networks began within the sociological discipline. Much of the work was descriptive and involved social science methodologies such as carefully designed behavioural surveys and the visual analysis of small network structures (Freeman, 2011). Most notable was the work of Travers et al. (1967), whose study produced empirical evidence of the ‘smallworld’ effect: the popular concept of ‘six degrees of separation’, where the the network can be traversed in a surprisingly short number of steps. However, a new model of how social networks form was required to explain this result. Taking methods from the analysis of the physics of critical phenomena, Watts et al. (1998) proposed a network model where all nodes start with limited local edges to neighbouring nodes, then additional edges are assigned in a random fashion. The resultant network is highly clustered, meaning that nodes are densely connected together, and the network exhibits small characteristic path lengths; the ‘smallworld’ effect.
In addition to the smallworld effect, many real world social networks follow a power law degree distribution which the model of Watts et al. (1998) does not explain. Such a degree distribution is evident where the probability of a node having degree \(k\) follows a power law, given by \(P(k)\sim k^{\gamma}\), where \(\gamma\) is a constant. To address this, Barabasi et al. (1999) introduced the concept of preferential attachment, where nodes form edges preferentially to nodes that are already well connected, rather than randomly or locally to their neighbours. The resultant network, which they coined ‘scalefree’, follows a power law degree distribution and very closely matches the degree distribution of documents connected via hyperlinks on the World Wide Web. Although this model explains many real world networks, many others exhibit degree distributions other than power law and can be characterised by complex topological structure and community clustering (Newman, 2003).
Much of the more recent work relating to the analysis of network structure and topology can be categorised into two key areas, namely the idenfitication of influencial nodes according to a range of measures, and the detection of community structures within a network. Some interesting results have also come from analysing scaling properties of hierarchical networks.
A key area of research with regard to the topology of social networks is the identification of influencer nodes. Influencers in a social network can be defined as nodes that contribute significantly to the diffusion properties of the network, or are situated in positions of brokerage. For example, a node that has high degree is more likely to spread a virus, if infected, to a greater number of people than a node with low degree. This area is important to a range of applications that aim to better understand spreading behaviour, including innovation diffusion, viral marketing and epidemic control. In practice, commonly applied methods to identify influencers each convey distinct information for nodes in social networks. However, there is no consensus in the research community, so consideration must be given regarding which method to use (Pei et al., 2013). In particular, graph centrality measures provide a simple approach to score the influence of nodes over a network, and they form the basis for many methods. Furthermore, centrality measures have been shown to be useful as predictors in the link prediction problem due to the information they carry regarding the network topology and position of each node.
One of the earliest methods proposed for influencer identification is based on betweenness centrality, which measures the influence a node has over the spread of information in the network. It is calculated for a node as the fraction of shortest paths between other node pairs that pass through the chosen node. Newman (2005) proposed an influencer identification method that relaxes the shortest path restriction, instead using random walks to give how often the chosen node will fall on alternate paths between nodes. However, nodes that lie on shorter paths are effectively weighted up as there will be fewer alternative paths. Tested on a range of networks, this method generally gives more accurate solutions than shortest path betweenness measures, although the algorithm has a high computational cost. This cost is proportional to the number of nodes cubed, but can be overcome through utilising more optimal algorithms (Brandes, 2001; Brandes, 2008). As an alternative to the random walk calculation, Mantrach et al. (2010) propose a more sophisticated approach based on efficiently estimating a family of covariance measures between nodes on a directed, weighted graph. The key idea is that two nodes are correlated if they frequently both lie on the same trajectory. However, more work is required here to integrate these covariance measures into an efficient algorithm for influencer identification.
Another popular approach to identifying influencers is to use eigenvector centrality, which was originally applied in sociology as a measure of the influence of an individual in a social network (Pei et al., 2013). The eigenvector centrality for a node takes into account the centrality of its neighbours, as well as itself. This method is what Google’s well known search algorithm for the World Wide Web, PageRank, utilises to rank web pages according to the hyperlink network. However, Lu et al. (2011) apply PageRank to a dynamic online community in an attempt to identify community leaders and find it is not as accurate in a quickly evolving social network since the full model requires frequent recalibration. The authors develop their own eigenvector centrality method to identify influential nodes, LeaderRank, which selfadjusts to new information and outperforms PageRank in terms of accuracy. Another application of eigenvector centrality was by Weng et al. (2010) who develep and apply a method called TwitterRank to estimate the influence of users on Twitter. TwitterRank takes both the topic similiarity and connection structure betweeen users into account, and on a sample of data it outperformed the algorithm that Twitter uses.
The last approach worth mentioning here is the \(k\)shell decomposition of the network, which differs from those mentioned in that it identifies sets of influencer nodes. It is been observed that under many circumstances, the most influential nodes may not necessarily be identified as those most connected or central to the network. This approach groups nodes into sets of \(k\)shells, where each \(k\)shell is a subgraph containing nodes with degree of at least \(k\) (Pei et al., 2013; Probst et al., 2013; Basaras et al., 2013). In these cases, the best spreaders can be effectively identified as those situated in the core of the network, as given by \(k\)shell decomposition (Kitsak et al., 2010). However, this approach has a number of drawbacks, including a high computational cost. The method also often fails to establish a monotonic relationship between \(k_{s}\) and the total infected area, which can be given by degree based methods. Basaras et al. (2013) address this by creating a method that combines betweenness centrality and \(k\)shell decomposition, and show that this hybrid method is more effective at identifying influential nodes and has less computational overhead than either of the component methods.
Another key area for investigation in social network analysis is the study of how communities are structured. The term community is commonly used to refer to natural divisions of a network into densely connected subgroups, which are heterogeneous globally but homogeneous locally, according to some shared properties or roles in the graph. Fortunato (2010) provides a concise goal of this area when he comments that “the aim of community detection is to identify the modules and, possibly, their hierarchical organization, by only using information encoded in the graph topology” \citeyearpar{[}p. 78]Fortunato2010. Although community structure has been the topic of much research within the social sciences, the computerization of network analysis has allowed for the development of algorithmic approaches to community detection.
Amongst the first seminal works in this area was an algorithm for community classification by Girvan et al. (2002), often referred to as the GN algorithm. The algorithm is divisive, so starts with one wholly connected community and iteratively divides the community by removing vertices with the lowest edge betweenness measures, which are recalculated at each step. Edge betweenness is similar to the betweenness centrality measures used for influencer identification, and represents how central an edge is. It is defined for edge \(i\) by the number of shortest paths between vertices that run along \(i\). The concept is that edges connecting communities will have high edge betweenness, so removing them will reveal the community structure. The authors also propose a measure for how good the community split is, the ‘modularity’ of the community partitions. This measure is related to the fraction of edges that fall within the same community. One can easily select the best community partitions by taking the split at the iteration that maximizes the modularity. Figure \ref{modularity} provides an example of the dendrogram of hierarchical community partitions, where the dotted red line represents the point of maximum modularity. The method produced reliable results when applied to real world network data, including social networks and the World Wide Web, as well as synthetically generated data where the community structure was known. However, the major detriment of the GN algorithm is that it has a high computational cost, which limits its application to large networks. The algorithm is also only applicable to undirected networks with unweighted edges.
To address the limitations in terms of computational cost of the GN algorithm, a number of methods have been proposed (Fortunato, 2010). One such algorithm is based on a hierarchical agglomeration (Clauset et al., 2004). The concept behind this approach is to gather vertices into groups such that there is a higher density of edges within groups than between groups. Optimisation of the modularity concept as defined by Girvan et al. (2002) will then identify the best community split. This algorithm runs in approximately linear time, relative to the size of the network, so allows for large scale community detection to be run efficiently. Indeed, the authors apply the method to a large network of product copurchasing associations from the online retailer Amazon, and find good results. Despite this success, it is unclear how the results compare to other approaches as the algorithm is not tested and benchmarked on synthetically generated data where the community structure is known. As a side note, the authors comment that when a network is partitioned to the point of maximum modularity, the distribution of community sizes follows a power law. The reason behind this is an area for further research, but it is posited that the result could be a consequence of the sociology of the network, i.e. interest in various topics follows a power law. We will return to this point in Section \ref{hierarchy} where the research regarding hierarchical community structures is reviewed.
Following from these earlier community detection algorithms, a variety of methods have been developed, each with their advantages and disadvantages, however many give noticeably similar results. Newman (2013) addresses this by showing that under transformations using the eigenvectors of the adjacency matrix, termed spectral method formulations, with careful choice of free parameters, the communities classified by three different algorithms are identical. The theoretical results are verified over both real world data and computer generated data, however, the methods proposed are limited to discerning a dichotomous community structure only. Nevertheless, the analysis is an important step towards consolidating the myriad methods that have been developed for community detection by proposing equivalence between approaches.
All of the papers reviewed above are limited in that they only consider static networks, i.e. networks where the topology is fixed in time. Clearly, for these approaches to be of practical use in social networks the dynamics of communities must be considered. Pallo et al. (2007) outline the magnitude of this issue by stating that “our knowledge of the mechanisms governing the underlying community dynamics is limited, but is essential for a deeper understanding of the development and selfoptimization of society as a whole” (Pallo et al., 2007). They open this research area by proposing a method that characterises community evolution by analysing the time dependence of overlapping communities identified using a ‘clique percolation’ algorithm. The authors apply this method to two real world data sets, one related to mobile phone calls, and the other to the collaboration network of scientists. They find that larger communities persist longer if membership is dynamic, while smaller communities, or cliques, are the opposite; their persistence is dependent on static membership. This study is an important one as it addresses a rather obvious question related to the utility of social network analysis, viz. how communities evolve over time.
It was mentioned in the previous section that an analysis of the hierarchical structure of communities, as evident in the dendrograms produced by many community detection algorithms, may reveal interesting and as yet unexplained scaling laws. Guimera et al. (2003) explored this question by analyzing the hierarchical community structure produced by the GN algorithm applied to an email communication network from a university with around \(1,700\) employees, as well as two computergenerated networks for validity. Their main approach involved a quantitative characterisation of the binary tree using a method originally introduced for the study of bifurcating river networks, the HortonStahler (HS) index. This showed that only the email communication network had a near constant bifurcation ratio, which implies that a selfsimilar community structure was only evident in the real world network. Further to this, given that selfsimilarity is evident in a similar way in both community structure and river networks, there may be a common principle of optimisation, either of water flow in rivers or information within communities acting as the underlying driving force for the emergence of the observed structure. This work is deeply interesting as it proposes that an underlying principle of selforganisation across hierarchical levels in social networks may evolve to optimise diffusion throughout the network.
Following the open research question left by Guimera et al. (2003) in terms of additional applications and verification of the presence and function of selfsimilarity in social networks, Hamilton et al. (2007) published an analysis of the complex structure of huntergatherer networks. They used a rare compilation of ethnographic data containing \(1,189\) estimates of group size at multiple levels of organisation from a sample of 339 diverse huntergatherer societies. Similarly to Guimera et al. (2003), they utilised the HS index to construct bifurcation ratios between levels of aggregation. The results of this analysis show that the frequency of groups at each level of aggregation are statistically selfsimilar across the whole dataset. Furthermore, there was variation in the branching ratios by continent, suggesting that heterogeneous factors like geographical location have significant influence on the branching ratios. The main drawback from this publication is that there is potential bias in the data collection process that could have influenced the results. Nevertheless, the article has contributed an additional realworld application with interesting results to the existing body of research in this area.
Although there has been a lot of research produced around the analysis of hierarchical community structure within social networks, much of the work has proceeded along independent lines. For example, Song et al. (2005) produced a research article that was very similar to the investigation of Guimera et al. (2003), however it is clear from their short list of references to mostly seminal works, they did not incorporate much of the research already published. Nevertheless, their results demonstrate that four diverse, real world complex networks, including the social network of actor collaborations and the World Wide Web, consist of selfsimilar patterns on all scales. However, their methodology falls short in that they introduced a new method and did not validate the accuracy over a range of synthetically generated networks with known community structures and reference a benchmark model.
Due to their complex and interconnected nature, social networks are difficult to study with traditional analysis. Many mathematical methods with analytical solutions fail to account for complex, heterogeneous network structure. To address this, multiagent models are increasingly being applied to produce realistic social networks by simulating the actions of a large number of connected agents following specific sets of rules. The aim of social network models is to generate a network which represents a real world social network topologically. The first key examples of this approach were the models of Watts et al. (1998) and Barabasi et al. (1999), however a great deal of work has been published since. This work has primarily been carried out within the physics community, and many of the models are simple enough for analytical results to be derived regarding the key topological properties of the resulting network. Numerical simulations are also commonly utilised to confirm the analytical results, or derive results which cannot be determined analytically. However, the models published to date utilise use rule sets not empirically validated on social networks, and the full range of network topologies, including community structure, have not always been evaluated.
There are two main types of social network models. Growth models aim to generate a social network by adding nodes connected with some probability at each time step, and rewiring models generally start with a fixed number of nodes and edges, and proceed to rewire edges to generate the target structure. For both these model types, there are a common set of agent rules and mechanisms deployed, as well as target network properties that are taken to represent a social network.
The scalefree network model of Barabasi et al. (1999) has formed the basis of many growth models since. The model algorithm is defined as follows:
Initial Condition: Start with a network of \(m_{0}\) nodes and \(e_{0}\) edges
Growth: One node \(i\) with \(m\) edges is added at each time step
Preferential attachment: Each edge of \(i\) is then connected to an existing node with probability proportional to its degree, i.e. the probability for the new node \(i\) to be connected to an existing node \(j\) is given by:
\begin{equation} \Pi(k_{j})=\frac{k_{j}}{\sum_{i\in V}k_{i}},\\ \end{equation}where \(k_{i}\) and \(k_{j}\) represent the degree of nodes \(i\) and \(j\), respectively. \(V\) represents the set of nodes, or vertices, of the network.
The linear preferential attachment mechanism as defined above by Barabasi et al. (1999) leads to a scalefree network with a powerlaw degree distribution, given by \(P(k)\sim k^{\gamma}\). The exponent \(\gamma\) depends on the parameter \(m\), and also multiple nodes are sometimes added to the network at each time step.
Due to the the linear preferential attachment mechanism being known to produce scalefree networks, a volume of research has been published with variations of it. These variations are usually introduced to produce additional topological features, such as high clustering and community structure. Kim (2002) extend the above model by including a triad formation step, aimed at producing higher clustering coefficients. After the preferential step is run, if an edge was created between nodes \(i\) and \(j\), a new edge is added between \(i\) and a randomly selected neighbour of \(j\), thus forming a connected triad. If there are no unconnected neighbours of \(j\), the algorithm moves on to the next time step. The rationale behind this mechanism is a concept known as ‘sibling bias’, where an individual is more likely to know a friend of a friend. In this case, node \(i\) is more likely to be connected to a friend of its new friend, node \(j\). Guo et al. (2006) apply a similar concept, but replace the random attachment to a neighbour of \(j\) with a preferential attachment mechanism based on the degree of the neighbour. In this modification, the neighbour with highest degree is most likely to be selected.
Another variation is to define a ‘localworld’, or ‘neighbourhood’ within the network which limits the connection range of new nodes. For instance, Li et al. (2003) define a localworld evolving network model, where \(M\) nodes are selected randomly as the localworld for a newly added node. The effect of this modification is to truncate the powerlaw degree distribution to an exponential for large degrees, which is commonly observed. The linear preferential attachment mechanism is modified to only connect to the localworld, becoming:
\begin{equation} \Pi_{local}(k_{j})=\Pi^{\prime}\left(j\in local\right)\frac{k_{j}}{\sum_{i,local}k_{i}},\\ \end{equation}where \(\Pi^{\prime}\left(j\in local\right)=M/(m_{0}+t)\). Cao et al. (2006) adopt a similar approach, but with a more formal definition of a neighbourhood \(N_{p}(i)\) with depth \(p\),
\begin{equation} N_{p}(i)=\{j\mid d_{ij}\leq p;j\in V,j\neq i\},\\ \end{equation}where \(d_{ij}\) is the geodesic distance between nodes \(i\) and \(j\). The preferential attachment mechanism then becomes
\begin{equation} \Pi_{N_{p}}(k_{j})=\Pi\left(j\in N_{p}\right)\frac{k_{j}}{\sum_{i,local}k_{i}},\\ \end{equation}where \(\Pi(j\in N_{p})=size(N_{p})/(m_{0}+t)\). A further variation of this approach is from Guan et al. (2008), who introduce the concept of physical position as a node attribute. Each node is assigned a physical state vector, \(x_{i}\in\mathbb{R}\), and the distance between two nodes is defined as \(d_{ij}=\left{x_{i}x_{j}}\right\). The physical neighbourhood of node \(i\) with depth \(M\) is then given by
\begin{equation} N_{M}(i)=\{j\mid d_{ij}\leq d_{ik}\ ;\ \forall\ k\notin N_{M}(i),\ k\neq i,\ j\neq i,\ i,j,k\in V\}\\ \end{equation}All three of these models provide a tunable transition between powerlaw and exponential degree distributions. Li et al. (2003) show the degree distribution is modified by tuning the size of the neighbourhood between the two extremes of \(m\leq size(N_{p})\leq m_{0}+t\). When \(size(N_{p})=2m+1\) at the lower end, \(P(k)\sim\exp{k/m}\). When \(size(N_{p})=t+m_{0}\), i.e. the neighbourhood is the whole network, we get a powerlaw distribution, \(P(k)\sim 2m^{2}/k^{3}\). The results are similar for all three papers.
Several alternative connection mechanisms have also been proposed which give rise to scalefree networks. Saramaki et al. (2004) show that a network growth model where newly added nodes connect to the node which lies at the end of a random walk of length \(l\). However, it is noted that this process resembles linear preferential attachment, as nodes with high degree are more likely to in the random path. Tanaka et al. (2008) use a weight driven preferential attachment mechanism, where edge weights are established for new nodes, or incremented for existing nodes. Powerlaw distributions are observed for degree, strength, and weight distributions, with exponents which can be controlled by modifying the model parameters.
Toivonen et al. (2006) demonstrate that a scalefree network can be constructed by a two step process of first randomly selecting a node, then randomly selected a neighbour of the first node. Again, a random neighbour of a random node is still more likely to be selected if it has higher degree, as it will have more neighbours. Nevertheless, Toivonen et al. (2006) show that their mechanism produces several of the target topological properties for social networks, including high clustering, powerlaw distributed degree distribution, and community structure. The latter property is evaluated by using a \(k\)clique community detection method, which allows for overlapping communities. The authors also note that the community size distribution also follows a powerlaw. More recently, Yang et al. (2013) published a similar method which involves a mechanism connecting to random neightbours of arbitrarily chosen nodes. A scalefree network results from this model as well.
Social network models involving rewiring of edges have received less attention than growth models, and research has appeared later. With rewiring models, additional consideration must be given to the existing connection structure of the network. The exact rewiring schemes, however, tend largely to follow preferential attachment, similarly to growth models, although alternative models have also been proposed. While some research has been published on rewiring models alone, the outlook for social network models appears to be towards models incorporating both growth and rewiring.
Lindquist et al. (2009) proposed a network evolution model through different rewiring schemes. Starting with an undirected and unweighted network with a fixed number of nodes and edges, two rewiring schemes with two mechanisms are proposed, yielding four separate cases. Firstly, the rewired edge can be detached from either end nodes, either the selected node or the neighbouring node. The authors utilise both random connection preferences, and linear preferential attachment. Contrary to growth models utilising preferential attachment, they find that powerlaw degree distributions are only achieved for specific average degree values in the case with linear preferential attachment, and the neighbouring node is detached.
Johnson et al. (2009) published a rewiring mechanism of preferential detachment and reattachment, however, nonlinear preferential functions are used. In this study on a network with a fixed number of nodes and edges, a node \(i\) is first chosen with probability \(P(k_{i})\sim k_{i}^{\beta}\). An edge of the selected node \(i\) is randomly chosen and reconnected to node \(j\) with probability \(\Pi(k_{j})\sim k_{j}^{\alpha}\). The authors find that scalefree distributions emerge for a range of exponents only when \(\alpha=\beta\), i.e. the probability of gaining or losing edges is the same.
A more recent study from Guang et al. (2012) involves a weighted preferential rewiring mechanism. Starting from a connected network with a fixed number of nodes and edges, and constant edge weights, the weight strength for node \(i\) is defined as the sum of weights for all its edges, \(s_{i}=\sum_{j}w_{ij}\). In this rewiring model, at each time step a random edge is selected and one connected node is fixed. The edge is then reconnected to another node by node strength preferential attachment,
\begin{equation} \Pi_{i}=\frac{s_{i}}{\sum_{j}s_{j}}\\ \end{equation}The results of this study show that node strength is scalefree with exponent \(\gamma=1\), the edge weight distribution is exponential, and degree distribution is scalefree again with \(\gamma=1\).
The three studies of rewiring models previously reviewed focus strongly on generating powerlaw distributed social networks, and largely ignore many of the other topological properties. Furthermore, these models fail to account for network growth. The network growth models in turn ignore changes in the network once new nodes have been added.
By combining growth and rewiring into one model, a more accurate representation of a social network may be achieved. Fan et al. (2009) published a model which incorporates five separate mechanisms that modify the node and edge structure of the network. Of all the network models reviewed, it is worth enumerating the full rule set for this model as the authors utilise a variety of mechanisms which modify the topology of the network. These mechanisms have been classified in italics. Starting with \(m\) isolated localworlds \(\Omega_{i}\), \(i=1\ldots m\), each with \(m_{0}\) isolated nodes, at each time step one of the following steps is performed:
Community Growth: With probability \(p\), a new localworld is created containing \(m_{o}\) nodes and \(e_{0}\) links.
Network Growth: With probability \(q\), a new node is added to an existing localworld \(\Omega_{i}\) with \(m_{1}\) new links according to adjusted linear preferential attachment
\begin{equation} \label{FanPi} \label{FanPi}\Pi(k_{i})=\frac{k_{i}+\alpha}{\sum_{j\in\Omega_{i}}(k_{j}+\alpha)},\\ \end{equation}where the parameter \(\alpha>0\) is a constant which increases the likelihood of the new node \(i\) to attract new links.
Community Structure: With probability \(r\), \(m_{2}\) new links are added to an existing localworld by randomly selecting a localworld and a node in the localworld, then connecting a new link according to equation (\ref{FanPi}).
Network Detraction: With probability \(s\), \(m_{3}\) links are deleted. The links are selected by randomly selecting an \(\Omega_{i}\), then randomly choosing a node and deleting the link connecting to another node selected with probability
\begin{equation} \Pi^{\prime}(k_{i})=\frac{1}{N_{\Omega_{i}}(t)1}\left(1\Pi(k_{i})\right),\\ \end{equation}where \(N_{\Omega_{i}}(t)\) represents the number of nodes within \(\Omega_{i}\) at time step \(t\).
Community Bridges: With probability \(u\), \(m_{4}\) links are added between different local worlds by randomly choosing an \(\Omega_{i}\), then randomly selecting a node in \(\Omega_{i}\) with (\ref{FanPi}). We then choose a different \(\Omega_{j}\) and connect the first node to a node in \(\Omega_{j}\) chosen again with (\ref{FanPi}).
It is noted that \(0\leq p,q,r,s,u\leq 1\), and \(p+q+r+s+u=1\). The resulting complex network generated from this model has a degree distribution that is a mix of random, exponential and powerlaw, which the authors analytically derive this result using meanfield theory. The model is compared to real world data for the internet, and is shown to provide a closer degree distribution than the comparison models. However, no other topological features are evaluated, despite the formation of localworld communities being a primary motivation for this study. In addition, numerical simulations could provide some insight into additional measures, but are not conducted.
Judging from the research reviewed here, there is clearly a gap in the literature regarding multiagent models of social networks. To address this gap, a consensus is required regarding the topological features that a social network model needs to achieve. A testing framework to evaluate how well a model succeeds in generating these features is also required. Given that OSNs provide empirical data sets which can be used to evaluate a model against, future studies can utilise this data to instead aid in defining the agent rules. Developing a prediction model for link formation is one approach which can accomplish this aim.
Link prediction describes how the likelihood for a link existing between two nodes in a complex network can be estimated. Many approaches have been proposed, but most require nontopological, node specific information to achieve high accuracy. A key goal of link prediction is the development of an accurate model that can be applied universally to any social network dataset (Shibata et al., 2012).
There are many applications of link prediction. For instance, these methods can be applied for link recommendation to users in online social networks. Another application identified for link prediction models is the evaluation of evolving social network models (Lu et al., 2011). Link prediction algorithms estimate the influence of a set of features on the likelihood of link formation. A wide range of topological features can provide information regarding emergent properties of a social network through their predictive importance. This information may provide an empirical basis for the derivation of the rule set of an evolving social network mode. The research question of using a link prediction model to discover information about the deep structure of a social network is still open in the literature.
There are three common frameworks to link prediction: the similarity based approach, methods based on maximum likelihood estimation, and probabilistic modelling approaches.
The simplest framework for link prediction is the similaritybased approach. In this method, each pair of nodes \(i\) and \(j\) is assigned a score \(s_{ij}\), defined as the similarity between \(i\) and \(j\). All unobserved edges are then ranked according to their scores, with higher ranks having a greater link likelihood. Much of the early literature on link prediction focussed on the use of singular similarity features, or small sets of features. For example, Adamic et al. (2003) developed a similarity measure based on common neighbours to predict connections amongst web pages. LibenNowell et al. (2007) experimented with a wider range of similarity based measures, but still used each in isolation to rank node pairs with the highest scores. Many similarity indices have been proposed, see Lu et al. (2011)a for more details. As these methods are relatively simple, each similarity index only considers a limited amount of information regarding the graph topology. As such, the accuracy of these indices is generally quite low.
A more recent framework is to use algorithms based on maximum likelihood estimation. This approach assumes some topological form of the network, e.g. an exponential random graph (Zaccarin et al., 2010). An algorithm then estimates the model parameters against the dataset. Another method was proposed by Clauset et al. (2008) and starts by inferring the hierarchical organisation of a network. The hierarchical structure is assumed to extend across the network, and is applied to predict missing links. A modelling approach known as a stochastic block model has also been developed. These models partition nodes into groups, which strongly influence link probabilty. However, this method is known to ignore heterogeneity in node degree. Zhang et al. (2014) recently extended the stochastic block model by correcting for variable node degree, with improved results. Heterogeneity across other network properties may still be unaccounted for in this approach, limiting its performance in real world datasets. For these methods to work well in practice, the network structure should be first understood and matched to the closest topological form before a model is be built.
The third common framework for link prediction uses probabilistic modelling. This methodology aims to abstract the underlying link structure of the network through training a probabilistic model, commonly a supervised learning model. For instance, Backstrom et al. (2011) developed a supervised random walk approach. This method combines node attributes as a supervised learning problem to guide a random walk to nodes which are more likely to be connected. There are many types of probabilistic link prediction models which can be applied. Wang et al. (2011) tested a range of supervised learning models on a social network derived from mobile phone data. They found that a decisiontree model performed most effectively when trained on a mixture of both network and data specific node attribute measures.
Recent publications in this area have focussed on wider sets of features used in a supervised learning framework to predict link formation. More diverse topological features can capture different types of complex structure. Shibata et al. (2012) applied a support vector machine learning model to a variety of features on citation networks, including similarity scores, some centrality measures, community classification and node attributes. This approach achieved high performance, and the model weights for each feature were provided as importance measures. The nontopological node attribute features strongly influenced most of their models, and it was found that different models were required for different citation networks. Bliss et al. (2014) analysed link prediction on a large Twitter social network using a wide variety of topological and node attribute similarity features. An evolutionary algorithm was used to estimate the coefficients for a linear combination of features, with good results. Both of these publications, however, utilised nontopological attributes. This limits the network datasets to which their application can be applied to, and are difficult to compare across other networks. There is a great deal of interest in this area, so for a detailed review see Lu et al. (2011)a and, more recently, Wang et al. (2014).
One application identified for link prediction models is in the evaluation of evolving social network models (Lu et al., 2011). Link prediction algorithms effectively estimate the influence of factors on the likelihood of link formation, which is exactly what an evolving social network model requires as a rule set to generate a realistic model. The research question of using a link prediction model to generate a rule set for a multiagent model of a social network hasn’t yet been addressed conclusively in the literature. Further to this, link prediction models can also be used to estimate the likelihood that two nodes will share information, which in turn can be used to inform the rule set for a multiagent model of diffusion. This application may open a new branch of research in social networks.
The concept of diffusion on a social network is important due to its many practical applications. For instance, one could apply models of diffusion or propagation processes to social networks in order to better understand the dynamics of the spread of infectious diseases, the adoption of innovations and new products, or the spread of information. In recent years, the breadth of data now available regarding product transactions and social networks have opened opportunities to better understand diffusion, particularly from a marketing perspective (Luu et al., 2012). Despite this collection of data, it can still be very difficult to acquire information regarding the full topology of a social network due to privacy limitations. The research around diffusion modelling can be divided into two separate branches.
The first branch of research is referred to in this review as dynamical systems models, which relate to the modelling of interactions between groups of interest, typically by means of systems of differential equations. Examples of this include models of the spread of epidemics and the Bass model for the diffusion of innovations (Newman, 2003). These models do not require explicit knowledge of the network topology, but rather model the interactions between groups according to known diffusion parameters and statistical properties on a macro scale. Commonly, this approach will utilise nonlinear differential equations to capture the core dynamics. The other category of models are referred to as agent based diffusion models. This type of modelling investigates the behaviour of individual nodes, or agents, and requires much richer data regarding the network topology.
Motivated by the work of Barabasi et al. (1999) on defining the class of scalefree networks, PastorSatorras et al. (2001) observed that this network model not only favours efficient communication, but also the spread of viruses. Traditionally, the spread of viruses have been modelled by the standard susceptibleinfectedsusceptible (SIS) epidemiolgical model. In this model, each node exists in one of two states: ‘susceptible’ or ‘infected’. Each suceptible node is infected with probability \(\nu\) if it is connected to at least one infected node. Once infected, the node will recover with probability \(\delta\) and move back to the susceptible state. From these parameters, a spreading rate is defined by \(\lambda=\nu/\delta\). Commonly in epidemiological modelling, a critical threshold \(\lambda_{c}\) can be identified such that if \(\lambda\geq\lambda_{c}\), the virus will spread critically and persist throughout the network. Below the threshold, where \(\lambda<\lambda_{c}\), the epidemic dies out exponentially fast. Through analysing data on real computer virus epidemics across the internet, which has been found to be scalefree (Newman, 2003), the authors find the absence of an epidemic threshold for critical spreading. This result indicates that the diffusion dynamics of computer viruses on scalefree networks are independent of spreading rate \(\lambda\). However, the level of persistent infection throughout the network was found to be low.
One potential limitation of the previous work is that a power law degree distribtuion has been assumed across the network. Noting that this is not always the case in real world networks, Luu et al. (2012) define a multistage diffusion model that accounts for dynamic degree distributions. In particular, the authors develop separate diffusion models for networks with both power law and exponential degree distributions. To reiterate, a power law degree distribution is where the probability of a node having degree \(k\) is distributed as \(P(k)\sim k^{\gamma}\), where \(\gamma\) is a constant. In contrast, an exponential degree destribution is given by \(P(k)\sim e^{\frac{k}{\lambda}}\), where \(\lambda\) is a constant. The work of Luu et al. (2012) also differs from PastorSatorras et al. (2001) since rather than a SIS diffusion model, the authors apply the Bass diffusion model of innovation in which the number of new adopters is modelled as a function of the total number of adopters and an adoption rate. Luu et al. (2012) develop their multistage diffusion model as a combination of Bass diffusion models on both exponential and power law degree distributions. This model is tested on both synthetically generated network data and a real world dataset, and is found to be more accurate. However, an efficient way to estimate the degree distribution of nonadopters is still required.
Further to the previous paper, Kim et al. (2013) produced a study aimed at modelling diffusion dynamics across multiple online media types. The research question asked relates to the significance of the interaction of different social media types on the diffusion of news stories. Similarly to Luu et al. (2012), the authors adopt a probabilistic dynamical systems model due to the absence of reliable data regarding the network topology. Using a data set of \(386\) million web documents taken from one month in early \(2011\), they take a much smaller sample of \(4.1\) million documents connected via hyperlinks. They then extract key entities representing contextual news topics from these documents, and analyse the diffusion of the top \(63\) news topics across social media types using a proposed dynamic influence model. The authors found that indeed, contextual information spreads at different rates across different social platforms. However, there are a number of potential problems with this methodology. Firstly, the spread of information is defined by the explicit sharing or replication of a hyperlink, which only accounts for one type of digital information transfer. Secondly, the authors chose the top \(63\) news stories to be those that received the most attention, which is a biased sample and offers no information regarding what factors negatively affect diffusion.
Another concept that has gained popularity in social networks relates to the propagation and replication of ‘memes’. The concept of a ‘meme’ has its origins with Dawkins (1976), who defined them as units of mental content or cultural information that reproduce and evolve in a fashion similar to biological genes. With the introduction of online news, blogs and social networking sites, the empirical study of memes, or memetics, has received much attention in the social networks literature. For example, Leskovec et al. (2009) develop a framework for tracking the propagation and evolution of short phrases across online news and blog sites. The authors develop scalable algorithms to cluster textual variants of key phrases in quoted text across multipe articles, showing that this memetracking approach provides a coherent representation of the ‘news cycle’. With the set of memes automatically identified, threads of meme propogation are defined and a dynamical model is constructed to reproduce much of the variation. As a set of minimal requirements, it is found that a process of imitation, where threads that gain significant volume are more likely to persist through growth from other adopters, and favourable weighting for recency are sufficient to reproduce the observed dynamics.
A further application of interaction models to social networks can be found in the literature pertaining to the popularity of online content. Popularity involves competition between alternative items of content, and as such some interesting dynamics can result. For example, Ratkiewicz et al. (2010) provide a large scale analysis of the temporal dynamics of online content from two key data sources. The first is the full set of Wikipedia articles and their associated web traffic from \(2001\) to \(2007\), and the second is a yearly sequence of crawls on the Chilean Web. It is found that the popularity dynamics are characterised by bursts, and the popularity of articles or web pages follow a power law distribution. These characteristics are not readily reproduced by standard network models, however by introducing a reranking probability \(\rho\) that randomly moves any given node to the highest rank, and combining with preferential attachment, the activity bursts and fat tailed popularity distribution are reproduced. Another approach is given by Gleeson et al. (2014), who find that empirically observed power law meme popularity distributions can be reproduced by a dynamical model where memes compete for the limited resource of attention. The authors dub this mechanism ‘Competition Induced Criticality’ since the resulting dynamics resemble those of ‘SelfOrganised Criticality’ models applied to critical phenomena such as avalanches. This concept is intriguing as it implies that the critical spreading behaviour of online content may be akin to other critical phenomena, which exhibit interesting properties including scaling laws and fractality (Schroeder, 1991).
In the previous section, we discussed approaches to modelling diffusion and spreading behaviours on networks where detailed information regarding the topology is not availble, so instead are approximated statistically. However, given richer data sets regarding individual nodes and edges in the network, we can apply multiagent diffusion models. These models distinguish how individual behaviour relates to network structure on a granular level, and in many cases interesting global properties can emerge from the local dynamics.
A good introduction to the topic of multiagent diffusion models is provided by Rahmandad et al. (2008). Due largely to the increase in computational power, many researchers are turning to agent based models to simulate diffusion processes. However, there is little concensus as to whether such models will provide better results than dynamical systems models as each have their strengths and weaknesses. For example, nonlinear differential equations can effectively incorporate a wide range of feedback mechanisms, but are limited to a small number of states that can be modelled. Furthermore, within each state nodes are assumed to be homogeneous and well mixed according to a standard distribution. Alternatively, multiagent models can also include many types of feedback, but can be applied across heterogeneity in a number of network properties and individual attributes. The cost of this advantage is greatly increased computational complexity and a detailed knowledge of network structure and individual propensities. Rahmandad et al. (2008) find that out of the differences between model classes, potentially the greatest is that many interaction models provide a single forecast trajectory, while multiagent models can generate a distribution of outcomes. The authors test the impact of individual heterogeneity and different network topologies on both classes of models in an epidemic context, and find that network structure significantly affects the diffusion dynamics.
Moving on to specific applications of agent models on real world data, Prakash et al. (2012) analyse threshold conditions for a virus to spread critically across a static network. However, in this case the full network topology is analysed to prove a general theorem which decouples the effects of the network topology and the virus propagation model. Prakash et al. (2012) also show that the threshold spreading rate depends only on the first eigenvalue of the connectivity matrix for any graph, and also for all standard propagation models. This theorem is proved both theoretically and also on real world data sets where the full adjacency matrix is calculated. This result has practical significance in controlling the outbreak of epidemics, as targeting a small number of influential nodes can dramatically alter the virus spreading widely. In this application, methods to identify influencer nodes such as eigenvector centrality, as discussed in Section \ref{influencers}, align well with the theorem. This result is significant, however it has only been proven on single virus propagation models over static networks. The theorem may not necessarily generalise to heterogeneity in either the diffusion process or the network structure.
Further to the comments of Rahmandad et al. (2008) regarding modelling large numbers of states, Weng et al. (2012) ask the question of why some memes go viral while others do not, and employ a multiagent approach to reproduce observed diffusion dynamics. Using data from Twitter, Weng et al. (2012) identify a number of empirical regularities, including longtailed distributions for meme lifetime, user activity and meme popularity. This points to very hetergeneous user sharing behaviours. In their diffusion model, a frozen network of agents is assumed where each agent maintains a time ordered list of posts about individual memes. With equal probability, the agents can generate a new post or share an existing post in their list, transmitting either to their neighbours. Limited attention is incorporated by restricting posts to sit in an agent’s list for finite time \(t_{w}\), and competition amongst memes is achieved by tuning \(t_{w}\) below \(1\) for increased competition or greater than \(1\) for more attention and less competition. After simulations of this model, Weng et al. (2012) find that in order to reproduce many characteristics observed empirically, including the longtailed distributions of meme popularity and lifetime, the combination of social network structure and competition amongst memes are sufficient conditions.
Although the work of Rahmandad et al. (2008) included heterogeneity at the individual node level and different network topologies, the effect of community structure was not considered. In previous research, the viral propagation of memes is generally assumed to spread as an infectious disease, or simple contagion. In contrast, Weng et al. (2013) cite evidence that the spreading behaviour of memes is more akin to a complex contagion, where the effects of social reinforcement and homophily reinforce the spreading. In other words, diffusion within highly clustered communities which exhibit homogeneity across significant attributes, such as topical interests, is enhanced. This phenomena is referred to as social reinforcement, and when combined with structural trapping, i.e. where crosscommunity diffusion is dampened, the global diffusion dynamics can be dramatically altered. To test this hypothesis, Weng et al. (2013) develop a model to predict whether or not a meme spreads virally using data from two real world social networks and synthetic data sets. They find that the likelihood of a meme spreading virally is negatively correlated with meme concentration. The rationale behind this result is that if a meme spreads only to a highly concentrated set of nodes, it has likely saturated a community, but received little attention elsewhere in the network. In contrast, a meme with low concentration has likely spread further throughout the network and received attention from a wider range of communities.
The proposed research aims to model the emergence of viral content and memes in social networks. There are many methods, techniques and frameworks which are required to achieve this research aim. Firstly, the adopted modelling approaches are categorised within the fields of statistical machine learning and multiagent systems. For the knowledge contributions around link prediction and the modelling of OSN user content engagement propensities, we are adopting methods from statistical supervised learning. Unsupervised learning methods will also be used to better understand OSN users based on their modelled engagement propensities. Furthermore, to handle data from real world OSNs, we will need to apply natural language processing to extract meaning from user comments and free text.
Our approach to modelling and simulating the dynamics of diffusion within a social network utilise multiagent systems. The rule sets for these systems are derived through machine learning techniques. The networks over which these models are applied will be a combination of artificially generated graphs, and real world online social networks. To understand and classify the emergence of viral content and memes, we will use a variety of analysis, compare measures such as the velocity and extend of the diffusion to real world examples.
Statistical machine learning forms a large part of the modern field of machine or computer learning. For the purposes of this research, we utilise three key areas of machine learning. These are supervised learning methods, used to create prediction models for specific outcomes or targets. Unsupervised learning will be used to identify natural groupings in data. The third method is natural language processing, which is crucial to working with data from online social networks as much of the data is unstructured free text.
Supervised learning describes the machine learning task of inferring a function which maps a set of inputs to a target output, or dependent variable, for a set of training data. The dependent variable or vector can be a binary flag, a categorical variable, or a numeric vector. There are a wide range of methods available for supervised learning, and the applicability of each depends on the type of dependent variable. A varying set of evaluation methods are also utilised, including measure the predictive accuracy, and applying the model over varying partitions and samples of data. This evaluation is used to determine broadly how accurate the model is, how robust it is to variations in data, and the importance of each of the features.
In this research, we utilise several supervised learning techniques. The first is the random forest model (Breiman, 2001), which we use for link prediction with graph topological measures. The dependent variable for this model is a binary flag for whether a link between two nodes exists or not. This technique effectively creates a large number of decision trees over different samples of the data, then combines all the individual results into one model. This is advantageous for the link prediction problem as more compex methods for data sampling are not required. The model is also very accurate and can be trained to effectively distinguish between unbalanced classes.
For our extension of the link prediction problem to include link sign and nature, as well as nontopological features, a range of supervised learning techniques will be applied and tested. The key requirements for these models are that they have high accuracy over unbalanced classes, provide feature importance scores, and have a reasonably low computational cost. This last requirement is generally satisfied for many regression techniques, such as the LASSO. Models such as the random forest and support vector machine can have high computational cost, but can also deliver higher accuracy then regression methods.
Finally, a key research component to this thesis is to create a model to describe and predict different types of OSN users levels of engagement with heterogeneous content. For this application, we propose using a Bayesian belief network. This type of model is suited to this task as we are likely to have a high dimensional data set. Bayesian techniques can handle this sort of data well. However, other models will also be applied to ensure that the most suited technique is used.
Unsupervised learning is a type of machine learning task that involves drawing inferences from sets of data without a specified target variable. Unsupervised learning methods are commonly used in exploratory analysis to find hidden patterns or grouping in data. The process for applying an unsupervised learning technique is simpler than the supervised case as there is no consideration of a target response. However, performance evaluation is still important. This usually take the form of measures which quantity the amount of variation in the data captured by the groupings.
There are many different methods available for unsupervised learning. Perhaps the most common are cluster analysis techniques, such as kmeans or hierarchical clustering. These methods are generally based on iterative algorithms which seek to reallocate observations to a set of clusters so that an evaluation function is minimised. In this research, we propose to use unsupervised learning to find natural groupings amongst OSN users with regard to their content preferences. These content clusters can be used to define preference communities within a social network, which form an input to the multiagent modelling of content diffusion.
Natural language processing (NLP) is concerned with the interaction between human language and computers. NLP consists of a set of techniques to translate natural language into data structures that computers can interpret. There are several key categories of NLP, which range over a variety of use cases. For this research, we will be processing unstructured free text data from online social network platforms. The key types of NLP we will apply belong to the categories of sentiment analysis, named entity recognition, topic recognition and natural language understanding.
Sentiment analysis refers to techniques that aim to understand the sentiment of a piece of text with respect to a topic or named entity. Named entity recognition determines which items in a text to map to names, and what those names relate to. Topic recognition similariy attempts to identify topics that are referenced, and segment the text into pieces that deal with each topic. Natural language understanding is perhaps the most complex of these areas, and attempts to identify the intended semantic, or meaning from the text, from the multiple possible semantics.
The proposed research aims to use features derived from the topology of a network in many different ways. Topological feature sets are used for the link prediction problem, as well as in designing multiagent systems and the analysis of viral diffusion and meme propagation. Novel approaches to the extraction of feature sets in graphs are developed in this thesis.
The literature and body of theory related to graphs defines numerous measures, algorithms and techniques to extract information regarding the network topology. In this thesis, we utilise existing research to formulate a wide range of graph features to be used for modelling. The features used relate to network centrality and measures of influence, community structure and classification, and spectral analysis and projections. We extend the stateoftheart by further evaluating the influence of each feature set on the dynamics of diffusion.
Multiagent systems are computerised models of the interactions between multiple agents and their environment. These models have particularly high utility in social networks, since a social network is a network of interacting agents. Generally, the process of developing a multiagent system starts with a definition or derivation of the agent rules, which govern the interactions between agents. The system is then run in simulation and the results are analysed.
Multiagent systems are particularly useful to model the evolution of social networks, and diffusion within the network. Through simulating the interactions between agents, we can explore global emergent properties in a network.
Multiagent models have been used extensively in social network analysis due to their ability to incorporate complex graph structure and agent behaviour. Their use goes back to the smallworld and scalefree network models of Watts et al. (1998) and Barabasi et al. (1999), respectively. Multiagent models have since been applied extensively to create artificial networks with real world properties.
A key challenge with using multiagent models to simulate a social network’s evolution is in defining the rule set. This rule set can define growing mechanisms for how new nodes connect to the existing network, and rewiring or removal of existing connections. Following from the smallworld and scalefree models, most research in this area has focused on tuning the scalefree model (Kim, 2002; Guo et al., 2006), or producing scalefree networks with other methods such as random walks and alternative growth mechanisms (Saramaki et al., 2004; Yang et al., 2013).
We propose extending the existing literature by firstly providing a review of social network multiagent models, which is currently lacking. Secondly, we develop a multiagent model for specific online social networks using link prediction models to derive the agent rule set empirically.
Much of the early research around diffusion in social networks utilised dynamical systems models. However, most of these models do not take into account the exact topology of the network. Network effects clearly can have a dramatic influence on diffusion (Rahmandad et al., 2008). To address this, multiagent models have been proposed where agents interact across the network topology.
Again, the key challenge with multiagent diffusion models is in defining the rule set. Generally, this rule set is more complex than an evolving social network model. Previous research has shown that agent rules can incorporate limited attention (Weng et al., 2012) and the influence of neighbours in a complex contagion (Barash et al., 2012). We aim to extend this research by creating a multiagent diffusion model which incorporates more topological features, and agent rules for content preference. These rule sets will be derived from real world data using both link prediction and content engagement likelihood models.
The data applications for this research represents a novel contribution to the existing literature. The methodology developed will be applied to artifically generated networks, and also real world data from Twitter. R statistical software is used for all statistical work, and we make extensive use of the iGraph package (Csardi et al., 2006). Multiagent systems will be programmed in combination of R and Python. Methods to identify the spread of memes and viral content in Twitter and Facebook are also developed.
There are three key types of artificial networks used in this thesis; the Erdős and Rényi random graph (Erdős et al., 1957), the smallworld model of Watts et al. (1998) and the scalefree model of Barabasi et al. (1999). All are available within the iGraph package in R.
Data regarding users and activity from the social media platforms Twitter and Facebook will be extracted over the period 1 November 2014 to 31 May 2015. Details regarding volume and available data will be provided once data has been extracted. This data is provided by social media analytics company Digivizer.
Year  Activities  Output 

2014 


2015 


2016 


2017 



This research is proposed to be completed and delivered on the \(31^{st}\) of December, \(2017\), according to the timeline outlined in Table \ref{timeline}.
Although this research relies heavily on data processing algorithms and technology, much of this already exists and is available at UTS, or will be created during the research.
The largest portion of the work required for this research will be in writing and implementing software applied to large data sets. This is to be done primarily with the R software language and Python programming language. R and Python are both open source so there is no additional software cost for this project.
It is anticipated that many of the real world OSN data sets and simulated networks used in this research will be very large. In addition, many of the techniques which will be applied to these data sets have a heavy computational cost. Access to the UTS High Performance Computing Linux Cluster is therefore required to carry out this research.
In agreement with the research timeline presented in Table \ref{timeline}, the following tasks have been completed.
The Technology Research Preparation course, subject number \(32144\), was completed in Semester \(1\), \(2014\) with a mark of \(94\) and grade of High Distinction. An output of this course is a draft literature review for the thesis. In preparation for the candidature assessment, this draft has been extended to include the areas of social network models and link prediction. The literature review will need to be constantly updated until this dissertation is finished, as the research area is very active and new papers are frequently being published. Nevertheless, the literature review to date has identified gaps in the research and has informed the proposed research questions. Furthermore, there is a gap regarding published reviews of the literature for social network models. We aim to address this gap with a review publication, largely based on the literature review which has been completed.
The Technology Research Methods course, subject number \(32931\), was completed in Semester \(2\), \(2014\) with a mark of \(83\) and grade of Distinction. The output of this course was an early draft for the candidature assessment document. This has been modified and extended to cover additional work completed since this course.
A theoretical framework and R software implementation has been developed for the first knowledge contribution regarding link prediction using topological features. A paper entitled ‘Link Prediction and Topological Feature Importance in Social Networks’ was submitted for review to the 2015 Australasian Data Mining Conference, and has been accepted. In this paper, a novel approach to link prediction using topological graph features with a random forest learning model was developed. Random forest variable importance scoring was also used to evaluate properties of each network. The approach was applied to three real world data sets, the citation networks Cora, Citeseer and WebKB, as well as three artificially generated networks. This paper forms the basis for the first thesis chapter on link prediction methods.
Work is currently underway to publish an R package of this implementation, and extend the theoretical framework to include additional link characteristics.
Erdős, Rényi. On random graphs. Publicationes Mathematicae 6, 290–297 (1957).
Jeffrey Travers, Stanley Milgram. An experimental study of the small world problem. Sociometry 32, 425–423 (1967).
Duncan J. Watts, Steven H. Strogatz. Collective dynamics of ’smallworld’ networks. Nature 393, 440–442 (1998).
A. Barabasi, R. Albert. Emergence of scaling in random networks. Science 286, 509–512 (1999).
M. Girvan, M. E. J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99, 7821–7826 (2002).
Richard Dawkins. The Selfish Gene. Oxford University Press, 1976.
Lilian Weng, Filippo Menczer, YongYeol Ahn. Predicting successful memes using network and community structure. In Proceedings of the Eighth International AAAI Conference on Weblogs and Social Media. (2014).
M. E. J. Newman. The structure and function of complex networks. SIAM Review 45, 167–256 (2003).
L. C. Freeman. The sage handbook of social network analysis. 26–39 Sage Publications Ltd, 2011.
Sen Pei, Hernan A. Makse. Spreading dynamics in complex networks. Journal of Statistical Mechanics: Theory and Experiment 12, 1–21 (2013).
M. E. J. Newman. A measure of betweenness centrality based on random walks. Social Networks 27, 39–54 (2005).
Ulrik Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology 25, 163–177 (2001).
Ulrik Brandes. On variants of shortestpath betweenness centrality and their generic computation. Social Networks 30, 136–145 (2008).
Amin Mantrach, Luh Yen, Jerome Callut, Kevin Francoisse, Masashi Shimbo, Marco Saerens. The sumoverpaths covariance kernel: a novel covariance measure between nodes of a directed graph. IEEE Transactions on Pattern Analysis and Machine Intelligence 32, 1112–1126 (2010).
Linyuan Lu, YiCheng Zhang, Chi Ho Yeung, Tao Zhou. Leaders in social networks, the delicious case. PLoS ONE 6, 1–9 (2011).
Jianshu Weng, EePeng Lim, Jing Jiang, Qi He. TwitterRank: finding topicsensitive influential Twitterers. ACM International Conference on Web Search and Data Mining 261–270 In ACM International Conference on Web Search and Data Mining. (2010).
Florian Probst, Laura Grosswiele, Regina Pfleger. Who will lead and who will follow: identifying influential users in online social networks. Business and Information Systems Engineering 3, 179–193 (2013).
Pavlos Basaras, Dimitrios Katsaros, Leandros Tassiulas. Detecting influential spreaders in complex, dynamic networks. Computer 46, 24–49 (2013).
Maksim Kitsak, Lazaros K. Gallos, Shlomo Havlin, Fredrik Liljeros, Lev Muchnik, H. Eugene Stanley, Hernan A. Makse. Identification of influential spreaders in complex networks. Nature Physics 6, 888–893 (2010).
S. Fortunato. Community detection in graphs. Physics Reports 486, 75–174 (2010).
A. Clauset, M. E. J. Newman, C. Moore. Finding community structure in very large networks. Physical Review E 70, 1–6 (2004).
M. E. J. Newman. Spectral methods for community detection and graph partitioning. Physical Review E 88, 1–10 (2013).
G. Pallo, A. L. Barabasi, T. Vicsek. Quantifying social group evolution. Nature 446, 664–667 (2007).
R. Guimera, L. Danon, A. DıazGuilera, F. Giralt, A. Arenas. Selfsimilar community structure in a network of human interactions. Physical Review E 68, 1–4 (2003).
Marcus J. Hamilton, Bruce T. Milne, Robert S. Walker, Oskar Burger, James H. Brown. The complex structure of huntergatherer social networks. Proceedings B 274, 2195–2202 (2007).
Chaoming Song, Shlomo Havlin, Hernan A. Makse. Selfsimilarity of complex networks. Nature 433, 392–395 (2005).
Petter Holme AND Beom Jun Kim. Growing scalefree networks with tunable clustering. Physical Review E 65, 1–4 (2002).
Qiang Guo, Tao Zhou, JianGuo Liu, WenJie Bai, BingHong Wang, Ming Zhao. Growing scalefree smallworld networks with tunable assortative coefficient. Physica A 371, 814–822 (2006).
Xiang Li, Guanrong Chen. A localworld evolving network model. Physica A 328, 274–286 (2003).
Y. J. Cao, G. Z. Wang, Q. Y. Jiang, Z. X. Han. A neighbourhood evolving network model. Physics Letters A 349, 462–466 (2006).
ZhiHong Guan, ZhengPing Wu. The physical position neighbourhood evolving network model. Physica A 387, 314–322 (2008).
Jari Saramaki, Kimmo Kaski. Scalefree networks generated by random walkers. Physica A 341, 80–86 (2004).
Takuma Tanaka, Toshio Aoyagi. Weighted scalefree networks with variable powerlaw exponents. Physica D 237, 898–907 (2008).
Riitta Toivonen, JukkaPekka Onnela, Jari Saramaki, Jorkki Hyvonen, Kimmo Kaski. A model for social networks. Physica A 371, 851–860 (2006).
XuHua Yang, ShunLi Lou, Guang Chen, ShengYong Chen, Wei Huang. Scalefree networks via attaching to random neighbours. Physica A 392, 3531–3536 (2013).
Jennifer Lindquist, Junling Ma, P. van den Driessche, Frederick H. Willeboordse. Network evolution by different rewiring schemes. Physica D 238, 370–378 (2009).
Samuel Johnson, Joaquin J. Torres, Joaquin Marro. Nonlinear preferential rewiring in fixedsize networks as a diffusion process. Physical Review E 79, 1–3 (2009).
Chen Guang, Yang XuHua, Xu XinLi. Weighted scaling in nongrowth random networks. Communications in Theoretical Physics 58, 456–462 (2012).
Zhengping Fan, Guanrong Chen, Yunong Zhang. A comprehensive multilocalworld model for complex networks. Physics Letters A 373, 1601–1605 (2009).
Naoki Shibata, Yuya Kajikawa, Ichiro Sakata. Link prediction in citation networks. Journal of the American Society for Information Science and Technology 63, 78–85 (2012).
Linyuan Lu, Tao Zhou. Link prediction in complex networks: a survey. Physica A 390, 1150–1170 (2011).
Lada A. Adamic, Eyton Adar. Friends and neighbors on the web. Social Networks 25, 211–230 (2003).
David LibenNowell, Jon Kleinberg. The linkprediction problem for social networks.. Journal of the American Society for Information Science and Technology 58, 1019–1031 (2007).
Susanna Zaccarin, Giulia Rivellini. Modelling Network Data: An Introduction to Exponential Random Graph Models. 297–305 In Data Analysis and Classification. Springer Berlin Heidelberg, 2010.
Aaron Clauset, Christopher Moore, M. E. J. Newman. Hierarchical structure and the prediction of missing links in networks. Nature 453, 98–101 (2008).
Xue Zhang, Xiaojie Wang, Chengli Zhao, Dongyuan Yi, Zheng Xie. Degreecorrected stochastic block models and reliability in social networks. Physica A 393, 553–559 (2014).
Lars Backstrom, Jure Leskovec. Supervised random walks: predicting and recommending links in social networks. 635–644 In WSDM Conference. (2011).
Dashun Wang, Dino Pedreschi, Chaoming Song, Fosca Giannotti, AlbertLaszlo Barabási. Human Mobility, Social Ties, and Link Prediction. 1100–1108 In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2011. Link
C.A. Bliss, M.R. Frank, C.M. Danforth, P.S. Dodds. An evolutionary algorithm approach to link prediction in dynamic social networks. Journal of Computational Science 5, 750–764 (2014).
Peng Wang, BaoWen Xu, YuRong Wu, XiaoYu Zhou. Link prediction in social networks: the stateoftheart. Science China 58 (2014).
MinhDuc Luu, EePeng Lim, TuanAnh Hoang, Freddy Chong Tat Chua. Modeling diffusion in social networks using network properties. 218–225 In Sixth International AAAI Conference on Weblogs and Social Media. (2012).
Romualdo PastorSatorras, Alessandro Vespignani. Epidemic spreading in scalefree networks. Physical Review Letters 86, 3200–3203 (2001).
Minkyoung Kim, David Newth, Peter Christen. Modeling dynamics of diffusion across heterogeneous social networks: news diffusion in social media. Entropy 15, 4215–4242 (2013).
Jure Leskovec, Lars Backstrom, Jon Kleinberg. Memetracking and the dynamics of the news cycle. 497–505 In 15th ACM SIGKDD international conference on Knowledge discovery and data mining. (2009).
Jacob Ratkiewicz, Santo Fortunato, Alessandro Flammini, Filippo Menczer, Alessandro Vespignani. Characterizing and modeling the dynamics of online popularity. Physical Review Letters 105, 1–4 (2010).
James P. Gleeson, Jonathan A. Ward, Kevin P. O’Sullivan, William T. Lee. Competitioninduced criticality in a model of meme popularity. Physical Review Letters 112, 1–5 (2014).
Manfred Schroeder. Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise. W. H. Freeman and Company, 1991.
Hazhir Rahmandad, John Sterman. Heterogeneity and network structure in the dynamics of diffusion: comparing agentbased and differential equation models. Management Science 54, 998–1014 (2008).
Aditya Prakash, Deepayan Chakrabarti, Nicholas Valler, Michalis Faloutsos, Christos Faloutsos. Threshold conditions for arbitrary cascade models on arbitrary networks. Knowledge Information Systems 33, 549–575 (2012).
L. Weng, A. Flammini, A. Vespignani, F. Menczer. Competition among memes in a world with limited attention. Scientific Reports 2, 1–7 (2012).
Lilian Weng, Filippo Menczer, YongYeol Ahn. Virality prediction and community structure in social networks. Scientific Reports 3, 1–6 (2013).
Leo Breiman. Random forests. Machine Learning 45, 5–32 (2001).
Vladimir Barash, Christopher Cameron, Michael Macy. Critical phenomena in complex contagions. Social Networks 34, 451–461 (2012).
Gabor Csardi, Tamas Nepusz. The igraph software package for complex network research. InterJournal Complex Systems, 1695 (2006). Link