Authorea

Mathieu Jacomy edited mapification of networks.tex over 10 years ago

Commit id: f49e1de02241e9492fb1755dbbf7bf724f4f03fe

deletions | additions

\section{Position, size, color and the mappification of networks} Of these variables, the first one is by far the most important. Like geographical maps, graphs are two-dimensional\footnote{Graph can also be drawn in a three-dimensional or even in an N-dimensional space . However, as long as N is smaller than the number of edges in the graph, the designer will encounter the same problems we describe in the case of two-dimensional representation.} representations, but unlike maps they cannot rely on a predefined set of projection rules. In a geographical representation, the space is defined a priori by the way the horizontal and vertical axes are constructed. Points are projected on such pre-existing space according to a set of rules that assign them a pair of coordinates and thereby a univocal position. The same is true for any Cartesian coordinate system, but not for network graphs\footnote{ The best illustration of such difference can be found in the history of the underground map design. \textbf{ADD THE STORY OF HARRY BECK MAP OF LONDON TUBE AND ADD SOME REFERENCE.}}. \textit{Nothing in network data predetermines where nodes have to be located in the graph. This has to do with the essentially discrete nature of graphs. Unlike geographical maps, graphs do not represent a continuous phenomenon (such as the distance between two landmarks), but a discrete one: two nodes are either connected or not\footnote{To be sure, weighted graphs exist, for which the strength of the connection between two nodes can be a continues measure Yet, the very nature of the graph will establish a crucial and binary difference between two nodes that are loosely connected and two nodes that are disconnected .}. Therefore, as long as the edges are correctly drawn and link nodes that are connected in the dataset, nodes can assume whatever position without affecting the way the graph is read\footnote{It is, for example, a common convention in social network analysis to arrange the nodes on a circle without this circle having whatever meaning in the interpretation of the network.}.} Of course, this is true in theory, but not in the practice of drawing graphs. As soon as Moreno and his followers started handling large graphs, they discovered that some ways of positioning nodes could make their sociogram easier to read. Notably, Moreno himself enunciated the main rule still in use today to draw clearer graphs: "the fewer the number of lines crossing, the better the sociogram" (1953, p.141). Easy to follow when working on graphs of a few dozens of nodes and edges, Moreno’s precept becomes impossible to implement directly on larger networks. Graphs with hundreds of nodes and edges have thousands of lines crossing: how to even know if moving a node increase or decrease the crossings? Since direct implementation of Moreno’s precept is impossible, one can try an indirect approach: drawing closer the points that are connected minimizes the length of the lines and therefore the possibility of crossing. But, even so, since each node is normally connected to several other nodes that are themselves connected to other nodes, minimizing the length of connection is far from being trivial. The solution to this problems is so complex and computational demanding, that it could not be found until network visualization migrated from paper to pixels. The solution is a spatialization technique that came to be known as “force-vector algorithms”. A force-vector algorithm works following a physical analogy: nodes are given a repulsive force that drives apart, while edges work as springs bounding the nodes that they connect. Once the algorithm is launched it changes the disposition of nodes until reaching the equilibrium that guarantees the best balance of forces. Such equilibrium minimizes the number of lines crossings and thereby maximizes the legibility of the graph\footnote{ In fact, most force-vector algorithms add to these basic rules a set of other rules that are supposed to increase even more the legibility of the graph and that give to each algorithm its own special flavor .}. There is, however, a most interesting by-product of such visualization techniques: not only do force-vector algorithms minimize lines crossings, but they also give sense to the disposition of nodes in the space of the graph. Before spatialization, the distance between two nodes has strictly no meaning: two nodes are either connected or not, they cannot be said to be closer or further. From a mathematical point of view, the only distance in a graph is the number of edges that have to be ‘walked’ to go from a node to another. Still, this measure does not really resemble what we are used to call a ‘distance’. For one think it is discrete and for another it only has a limited number of values for a given graph\footnote{By definition the mathematical distance varies in a graph from zero to the longest path between two indirectly connected nodes in the graph (the so-called ‘diameter’ of the graph) and then jump to infinity for nodes for which there is no connection path.}. In a spatialized network, on the contrary, spatial distance becomes meaningful: two nodes are close if they are directly connected or connected to the same set of nodes. Because of the very logic that drives them, force-vector algorithms assure that the distance among nodes is roughly proportional to their structural equivalence, that is to say the number of neighbors that they have in common (divided by the total number of their neighbors). Spatialization deliver an amazing result, it turns the discontinuous mathematics of graphs into a continuous space. The prove is that the knots of nodes and edges designed by the uneven density of nodes in the space of a spatialized graph can be demonstrated to be equivalent to the clusters computed by mathematical calculations . The \textbf{The second visual variable that }