Visual Network Analysis
In the last few years, a spectre has been haunting our academic and popular culture: the spectre of networks. Throughout the social as well as the natural sciences, more and more phenomena have come to be conceived as networks. Telecommunication networks, neural networks, social networks, epigenetic networks, ecological networks, value networks: the very fabric of our existence seems to be made of lines and points. More recently, the interest in graphs has overflowed into popular culture, and networks have started to appear in art, graphics, advertising, even furniture [ADD OTHER EXAMPLES].
Our growing fascination with networks is not unjustified. Networks are powerful conceptual tools, encapsulating in a single object multiple affordances for the computation (networks as graphs), visualization (networks as maps) and manipulation (networks as interfaces) of data.
In the first place and to a large extent, the success of networks is to be credited to the amazing versatility of graph mathematics. From railways to information routing, from financial to communication flows, from ecosystems to organization management, graphs have found countless applications. The computational formalism of graphs proved so effective that we started seeing networks everywhere and transforming everything into systems of discrete but interconnected items. It would be unfair, however, to reduce networks to their mathematical properties. Graph theory has been around in mathematics since Euler’s walk on Königsberg’s bridges[1], but it was not until the end of the last century that networks acquired multidisciplinary popularity. Graph computation is certainly powerful, but it is also very demanding, and for many years its advantages remained the privilege of scholars with a solid mathematical background.
In the last few decades, however, networks have acquired a new set of affordances and reached a larger audience, thanks to the growing availability of tools to draw them. Drawn on paper or screen, networks become easier to handle and acquire properties that calculation cannot express. Far from being merely aesthetic, the graphical representation of networks has an intrinsic hermeneutic value. Networks become maps and can be read as such.
Finally, the encounter with personal computing has recently turned networks into tools for data manipulation. Not only are network-like visualizations employed in a growing number of digital interfaces, but more and more specialized software has been designed to support the exploration of network data. Tools like Pajek (vlado.fmf.uni-lj.si/pub/networks/pajek), Ucinet (www.analytictech.com/ucinet), Guess (graphexploration.cond.org) and more recently Gephi (gephi.org) have progressively smoothed out the difficulties of graph mathematics, turning a complex mathematical formalism into a simple point-and-click interface[2].
Combining the computational power of graphs with the visual expressivity of maps and the interactivity of computer interfaces, networks accomplish the dream of Exploratory Data Analysis (Tukey, 1977): a navigation through data so fluid that zooming in on a single data point and out to a landscape of a million traces are just a click away[3]. No wonder that networks are popular!
The expansion of networks from graphs to maps and interfaces has been impetuous and has reached distant regions of science and society. Yet the visualization of networks has so far lacked reflexivity and formalization. We have drawn and read networks as if their visual grammar were obvious, but the more we advance, the more we realize that this is not the case. We painfully lack the conceptual tools to think about the projection of graphs in space. The very vocabulary we use has been borrowed from mathematics (e.g. cluster, structural equivalence…) and geography (e.g. centrality, bridging…) and needs to be adapted to the new visual paradigm. This paper means to contribute to such reflection and to propose a tentative framework for the visual analysis of networks.
[1] Solutio problematis ad geometriam situs pertinentis, 1736.
[2] A simple look at the URLs of these tools reveals the efforts deployed to make network-manipulation tools user-friendly and thereby available to a larger public.
[3] By offering a tool for datascape navigation, networks are also fulfilling the dream of Gabriel Tarde, a forgotten father of social thinking, who imagined that the development of statistics would one day make it possible to overcome the distinction between qualitative and quantitative methods and between micro and macro sociology.
Before we move to the enunciation of the visual grammar of networks, however, we would like to briefly discuss the reasons that have so far delayed this type of reflection. These reasons date back to the very foundation of graph mathematics. In solving the problem of Königsberg’s bridges, Euler performed the most classical of mathematical operations. He abstracted the formal structure of the problem from its empirical features: he took a city and turned it into a table of numbers (see figure 1). In doing so, Euler laid the foundation of discrete mathematics at the cost of separating the idea of network from its physical materializations. His operation was so successful that, for the following two and a half centuries, the reflection on networks was dominated by their structural properties, with little interest in practical applications. One consequence of this focus on structures (at the expense of the actual contents of networks) has been that mathematicians never saw the point of representing networks. For them, drawing a network was (and still is) perfectly useless.
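Euler's abstraction can be illustrated with a minimal Python sketch. The labels A to D for the four land masses, the function names and the edge-list encoding are assumptions of this sketch, not Euler's notation. It relies on the standard result that a connected graph can be walked using every edge exactly once only if zero or two of its nodes have an odd number of edges.

```python
from collections import Counter

# The seven bridges as an edge list between four land masses.
# Labels are hypothetical: A = central island, B and C = the two river banks,
# D = the eastern land mass.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"),
    ("B", "D"),
    ("C", "D"),
]


def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def has_euler_walk(edges):
    """Degree-parity test for a walk crossing every edge exactly once.

    Assumes the graph is connected; connectivity is not checked here.
    """
    odd = [node for node, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)


print(degrees(BRIDGES))         # the city reduced to a few numbers
print(has_euler_walk(BRIDGES))  # all four land masses have odd degree
```

Nothing in the edge list records where the bridges are or what the city looks like, which is precisely the separation between structure and materialization discussed above.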
The idea that it could be worth drawing a network to see what it looked like came from a different tradition: the tradition of social network analysis. Jacob Moreno, founder of this approach, was very explicit about the importance of visualization: “A process of charting has been devised by the sociometrists, the sociogram, which is more than merely a method of presentation. It is first of all a method of exploration. It makes possible the exploration of sociometric facts” (1953, pp. 95-96). Since Moreno and his followers were working on real networks constructed through the observation of actual social relations, it made sense to them to draw networks. Identifying visual patterns became the equivalent of looking for social dynamics. In a seminal article published in the New York Times in 1939, Moreno refers to network analysis as “a new geography” (see fig. 2). By drawing his sociograms, Moreno reconnected networks with their most ancient ancestor: the geometric figure, a set of points and lines whose properties are meant to be explored through its visual representation. Graphs became graphic again.
This, of course, was only the beginning of the reflection on the visualization of networks. Once you decide to draw a network as a set of points and lines, you still have to decide which colors to use, which stroke, which style and, most crucially, which composition rules to follow. None of these questions is trivial, and the story of how early analysts of social networks wrestled with their design is long and interesting. Unfortunately, we do not have the space here to tell this story (but see Freeman, 2010 for a remarkable account of it). For this paper, it will be enough to remark that, though crucial for the founders of social network analysis, the reflection on network design progressively lost its interest for their followers. Understandably fascinated by the parallel developments of graph mathematics, later social network analysts focused on statistics and progressively neglected network design. In this paper, we draw on the reflection of Moreno and his early followers to discuss how the visualization of networks can be exploited for the study of social phenomena.
The next paragraph will introduce the three main visual variables (ADD REFERENCE TO BERTIN) mobilized by visual network analysis: the position of nodes, their size and their color. We will briefly discuss the features of each of these variables and we will describe how they are employed to represent different characteristics of networks. After having set the theoretical basis of our approach, we will then propose an example of analysis in order to provide a practical guideline for visual network analysis.
Of these variables, the first one is by far the most important. Like geographical maps, graphs are two-dimensional[1] representations, but unlike maps they cannot rely on a predefined set of projection rules. In a geographical representation, the space is defined a priori by the way the horizontal and vertical axes are constructed. Points are projected on such pre-existing space according to a set of rules that assign them a pair of coordinates and thereby a univocal position. The same is true for any Cartesian coordinate system, but not for network graphs[2].
Nothing in network data predetermines where nodes have to be located in the graph. This has to do with the essentially discrete nature of graphs. Unlike geographical maps, graphs do not represent a continuous phenomenon (such as the distance between two landmarks), but a discrete one: two nodes are either connected or not[3]. Therefore, as long as the edges are correctly drawn and link nodes that are connected in the dataset, nodes can assume any position without affecting the way the graph is read[4].
Of course, this is true in theory, but not in the practice of drawing graphs. As soon as Moreno and his followers started handling large graphs, they discovered that some ways of positioning nodes could make their sociograms easier to read. Notably, Moreno himself enunciated the main rule still in use today to draw clearer graphs: “the fewer the number of lines crossing, the better the sociogram” (1953, p.141).
Easy to follow when working on graphs of a few dozen nodes and edges, Moreno’s precept becomes impossible to implement directly on larger networks. Graphs with hundreds of nodes and edges have thousands of line crossings: how can one even know whether moving a node increases or decreases the crossings? Since direct implementation of Moreno’s precept is impossible, one can try an indirect approach: drawing the connected points closer together shortens the lines and therefore reduces the likelihood of crossings. But even so, since each node is normally connected to several other nodes that are themselves connected to other nodes, minimizing the length of connections is far from trivial.
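To give a sense of what checking Moreno's precept involves, here is a minimal Python sketch that counts line crossings for a given placement of nodes. The function names, the straight-line edges and the toy four-node example are assumptions of this sketch. Every pair of edges has to be compared, so the work grows with the square of the number of edges, and the count must be redone each time a node is moved.

```python
from itertools import combinations


def _orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a, b, c, d):
    """True if segment ab properly crosses segment cd.

    Touching endpoints and collinear overlaps are not counted.
    """
    d1 = _orientation(a, b, c)
    d2 = _orientation(a, b, d)
    d3 = _orientation(c, d, a)
    d4 = _orientation(c, d, b)
    return d1 * d2 < 0 and d3 * d4 < 0


def count_crossings(positions, edges):
    """Count crossing pairs of edges drawn as straight lines."""
    total = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if len({u1, v1, u2, v2}) < 4:
            continue  # edges sharing a node meet at that node, not a crossing
        if segments_cross(positions[u1], positions[v1],
                          positions[u2], positions[v2]):
            total += 1
    return total


# The same four-node cycle drawn in two ways (toy data).
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
SQUARE = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}
TWISTED = {"a": (0, 0), "b": (1, 1), "c": (1, 0), "d": (0, 1)}

print(count_crossings(SQUARE, EDGES), count_crossings(TWISTED, EDGES))
```

The two drawings contain exactly the same nodes and edges; only the positions differ, and with them the number of crossings.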
The solution to this problem is so complex and computationally demanding that it could not be found until network visualization migrated from paper to pixels. The solution is a family of spatialization techniques that came to be known as “force-vector algorithms” (also called force-directed algorithms). A force-vector algorithm works by physical analogy: nodes are given a repulsive force that drives them apart, while edges work as springs binding the nodes that they connect. Once launched, the algorithm changes the disposition of nodes until it reaches an equilibrium that balances these forces. Such an equilibrium tends to reduce the number of line crossings and thereby improves the legibility of the graph[5].
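The physical analogy can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the algorithm implemented by any particular tool: the force formulas (repulsion k²/d between all pairs, attraction d²/k along edges), the cooling schedule, the parameter names and the example graph are all choices made for this sketch.

```python
import math
import random


def force_layout(nodes, edges, iterations=300, k=1.0, start_temp=0.1, seed=0):
    """Toy force-based layout: all node pairs repel, edges pull like springs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}

    for step in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}

        # Repulsion: every pair of nodes pushes apart, more strongly when close.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                force[u][0] += dx / dist * push
                force[u][1] += dy / dist * push
                force[v][0] -= dx / dist * push
                force[v][1] -= dy / dist * push

        # Attraction: each edge pulls its two endpoints together.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            force[u][0] -= dx / dist * pull
            force[u][1] -= dy / dist * pull
            force[v][0] += dx / dist * pull
            force[v][1] += dy / dist * pull

        # Move each node along its net force, capped by a "temperature"
        # that cools linearly so the layout settles down.
        temp = start_temp * (1.0 - step / iterations)
        for n in nodes:
            fx, fy = force[n]
            length = math.hypot(fx, fy) or 1e-9
            move = min(length, temp)
            pos[n][0] += fx / length * move
            pos[n][1] += fy / length * move

    return {n: (p[0], p[1]) for n, p in pos.items()}


def mean_length(pos, pairs):
    """Average Euclidean distance over a list of node pairs."""
    return sum(math.dist(pos[u], pos[v]) for u, v in pairs) / len(pairs)


# Toy graph: two triangles joined by a single edge.
NODES = list("abcdef")
EDGES = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "e"), ("e", "f"), ("f", "d"),
         ("c", "d")]

layout = force_layout(NODES, EDGES)
print(round(mean_length(layout, EDGES), 2))
```

In this sketch the only inputs are the node list and the edge list; the positions are entirely produced by the balance of forces.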
There is, however, a most interesting by-product of such visualization techniques: not only do force-vector algorithms reduce line crossings, but they also give meaning to the disposition of nodes in the space of the graph. Before spatialization, the distance between two nodes has strictly no meaning: two nodes are either connected or not; they cannot be said to be closer or further apart. From a mathematical point of view, the only distance in a graph is the number of edges that have to be ‘walked’ to go from one node to another. Still, this measure does not really resemble what we usually call a ‘distance’. For one thing, it is discrete; for another, it can only take a limited number of values for a given graph[6].
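The mathematical distance mentioned here, the number of edges to be ‘walked’, can be computed with a breadth-first search. The following Python sketch uses a hypothetical five-node graph and function name chosen for illustration; it shows both features of this measure: it only takes whole-number values, and it is undefined (infinite) for nodes that cannot be reached.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Number of edges on a shortest path from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


# Toy graph: a chain a-b-c-d and an isolated node e.
ADJACENCY = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": [],
}

distances = hop_distances(ADJACENCY, "a")
print(distances)         # only integer values
print("e" in distances)  # unreachable nodes get no finite distance
```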
In a spatialized network, on the contrary, spatial distance becomes meaningful: two nodes are close if they are directly connected or connected to the same set of nodes. Because of the very logic that drives them, force-vector algorithms ensure that the distance between nodes roughly reflects their structural equivalence, that is to say the number of neighbors that they have in common (divided by the total number of their neighbors): the more neighbors two nodes share, the closer they tend to be placed. Spatialization delivers an amazing result: it turns the discontinuous mathematics of graphs into a continuous space. The proof is that the knots of nodes and edges formed by the uneven density of nodes in the space of a spatialized graph can be shown to be equivalent to the clusters computed by mathematical calculations.
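The measure of structural equivalence described here, shared neighbors divided by total neighbors, can be written down directly. The Python sketch below reads it as the Jaccard index of the two neighbor sets, which is an interpretation assumed for this sketch; the function name and the small example graph are likewise hypothetical.

```python
def neighbor_overlap(adjacency, u, v):
    """Shared neighbors of u and v divided by all their distinct neighbors."""
    neighbors_u = set(adjacency[u])
    neighbors_v = set(adjacency[v])
    union = neighbors_u | neighbors_v
    if not union:
        return 0.0  # two isolated nodes have no neighbors to compare
    return len(neighbors_u & neighbors_v) / len(union)


# Toy graph: a and b are both linked to c and d; e hangs off d.
ADJACENCY = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}

print(neighbor_overlap(ADJACENCY, "a", "b"))  # identical neighborhoods
print(neighbor_overlap(ADJACENCY, "a", "e"))  # one shared neighbor out of two
print(neighbor_overlap(ADJACENCY, "c", "e"))  # nothing in common
```

Note that a and b are not directly connected, yet they score highest: a force-vector layout would tend to place them near each other because they are pulled toward the same nodes.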
The second visual variable that <TO CONTINUE>
[1] Graphs can also be drawn in a three-dimensional or even in an N-dimensional space. However, as long as N is smaller than the number of edges in the graph, the designer will encounter the same problems we describe in the case of two-dimensional representation.
[2] The best illustration of such difference can be found in the history of the underground map design. ADD THE STORY OF HARRY BECK MAP OF LONDON TUBE AND ADD SOME REFERENCE.
[3] To be sure, weighted graphs exist, for which the strength of the connection between two nodes can be a continuous measure. Yet the very nature of the graph establishes a crucial and binary difference between two nodes that are loosely connected and two nodes that are disconnected.
[4] It is, for example, a common convention in social network analysis to arrange the nodes on a circle without this circle having any meaning in the interpretation of the network.
[5] In fact, most force-vector algorithms add to these basic rules a set of other rules that are supposed to further increase the legibility of the graph and that give each algorithm its own special flavor.
[6] By definition, the mathematical distance in a graph varies from zero to the longest of the shortest paths between two connected nodes (the so-called ‘diameter’ of the graph) and then jumps to infinity for pairs of nodes between which there is no connection path.