Analysis and visualization of emerging zoonoses through temporal networks

C. M. Rivers, PhD MPH

Network Dynamics and Simulation Science Laboratory, Virginia Bioinformatics Institute, Virginia Tech

I present case tree plots and checkerboard plots for visualizing contagions. The visualizations are best suited to diseases like SARS, MERS-CoV and H7N9, for which there are a limited number of cases (fewer than 200) and data are available on human-to-human transmission. They (a) allow for easy estimation of epidemiological parameters such as the basic reproduction number, (b) indicate the frequency of introductory events, e.g. spillovers in the case of zoonoses, and (c) represent patterns of case attributes such as patient sex, both by generation and over time.


Zoonoses represent an estimated 58% of all human infectious disease pathogens, and 73% of emerging infectious pathogens (Woolhouse 2005). Careful tracking of zoonotic disease is a major focus of global public health protection strategy. Recent examples of zoonotic outbreaks include Severe Acute Respiratory Syndrome (SARS), H1N1, and Middle East Respiratory Syndrome (MERS-CoV), which have caused thousands of deaths combined (Christian 2004, Domínguez-Cherit 2009, World Health Organization 2014). Early identification of new outbreaks is critical to successful containment of these diseases.

The current toolkit for visualizing data from these emerging diseases is limited. One popular option is the epidemic curve, which is a histogram of new cases over time. Epidemic curves are limited in that they do not indicate how cases are related to one another, nor can they represent the presence of an animal source. Network diagrams are a useful though less popular option. These diagrams can depict individual human clusters, but they often lack a time component and cannot represent constellations of unconnected clusters. Furthermore, network diagrams typically require complete information about the structure of the transmission tree. Here I introduce case tree plots and checkerboard plots to address those weaknesses and more clearly represent zoonotic outbreaks.


I present two new visualizations, case tree plots and checkerboard plots, for emerging zoonoses. Code for the plots is available in the open source Python package epipy, which is hosted on GitHub; documentation is available online. Epipy relies heavily on the networkx (Hagberg 2008) and pandas (McKinney 2010) packages. In addition to the visualizations introduced here, epipy includes a number of functions for common epidemiology calculations, like odds ratio and relative risk. A function that generates realistic example data is also provided. All plots, data and tables in this manuscript were generated using epipy.
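For readers unfamiliar with the two calculations named above, the following is a minimal sketch of the textbook formulas for a 2x2 table. It is not epipy's code: the function names, argument order and example counts are assumptions chosen for illustration.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: exposed with disease      b: exposed without disease
    c: unexposed with disease    d: unexposed without disease
    (This argument layout is an assumption, not epipy's interface.)
    """
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Risk of disease among the exposed divided by risk among the unexposed."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed fell ill.
example_or = odds_ratio(20, 80, 10, 90)
example_rr = relative_risk(20, 80, 10, 90)
```

With these hypothetical counts the odds ratio is 2.25 and the relative risk is 2.0.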

Case tree plots

Case tree plots depict the emergence and growth of clusters of disease over time. Each case is represented by a colored node. Nodes that share an epidemiological link are connected by an edge. The color of the node varies based on the node attribute chosen by the plot creator; in many cases, color simply signifies membership in a human-to-human cluster. However, it could also represent health status (e.g. alive, dead), the sex of the patient, or any other categorical attribute.

Node placement along the x-axis corresponds to the date of illness onset for the case. When the onset date is not known, diagnosis date may be used instead. The y-axis value represents the case generation. Nodes at generation zero are human cases acquired from an animal source. If a generation-zero case passes the disease to two other humans, those two subsequent cases are plotted at generation one. Cases that do not belong to a cluster are not represented on the plot.
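The placement rule can be sketched as follows, assuming the transmission links are already known. The case records, field names and helper functions here are hypothetical and use only the Python standard library; this is an illustration of the rule, not epipy's plotting code.

```python
from datetime import date

# Hypothetical cases. "source" is None for an infection acquired from an
# animal, otherwise the ID of the human case that passed on the disease.
cases = {
    "A": {"onset": date(2013, 1, 20), "diagnosis": date(2013, 1, 25), "source": None},
    "B": {"onset": date(2013, 1, 29), "diagnosis": date(2013, 2, 1), "source": "A"},
    "C": {"onset": None, "diagnosis": date(2013, 2, 3), "source": "A"},
    "D": {"onset": date(2013, 2, 9), "diagnosis": date(2013, 2, 12), "source": "B"},
}


def generation(case_id):
    """Generation zero is animal-acquired; each human link adds one."""
    source = cases[case_id]["source"]
    return 0 if source is None else generation(source) + 1


def coordinates(case_id):
    """x is the onset date (diagnosis date if onset is unknown); y is the generation."""
    case = cases[case_id]
    x = case["onset"] or case["diagnosis"]
    return x, generation(case_id)


points = {case_id: coordinates(case_id) for case_id in cases}
```

In this example case C has no recorded onset date, so it is placed at its diagnosis date, and case D sits at generation two because it was infected by B, who was infected by A.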

To produce a case tree plot, users provide a line list with, at minimum: unique case identifiers, the date of illness onset (or the date the illness was reported, if onset date is not available), and cluster membership, as shown in the example line list below. Any additional relevant variables, such as patient age and sex, may also be included.

An example line list for case tree plot construction

Case ID   Onset date   Cluster ID
1         2013-01-20   FamilyA
2         2013-01-29   FamilyA
3         2013-02-10   HighSchool
4         2013-02-12
5         2013-02-08   FamilyA
6         2013-02-14   HighSchool
7         2013-02-22   HighSchool
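
As a rough illustration of how such a line list might be read before plotting, the sketch below parses a comma-separated copy of the example using only the Python standard library. The column names, the `read_line_list` helper and the whitespace clean-up are assumptions made for this example, not part of epipy. One label is deliberately typed with a stray space to show why cluster labels should be normalized: cases are grouped by exact label match.

```python
import csv
import io
from datetime import date

# Comma-separated copy of the example line list (hypothetical column names).
# Case 4 has no cluster; case 5 carries a label with a stray space.
LINE_LIST = """case_id,onset_date,cluster_id
1,2013-01-20,FamilyA
2,2013-01-29,FamilyA
3,2013-02-10,HighSchool
4,2013-02-12,
5,2013-02-08,Family A
6,2013-02-14,HighSchool
7,2013-02-22,HighSchool
"""


def read_line_list(text):
    """Parse the line list into dicts with typed fields."""
    cases = []
    for row in csv.DictReader(io.StringIO(text)):
        # Remove all whitespace so "Family A" and "FamilyA" match.
        cluster = "".join(row["cluster_id"].split()) or None
        cases.append({
            "case_id": int(row["case_id"]),
            "onset": date.fromisoformat(row["onset_date"]),
            "cluster": cluster,
        })
    return cases


cases = read_line_list(LINE_LIST)
# Cases without a cluster are not drawn on a case tree plot.
clustered = [case for case in cases if case["cluster"] is not None]
```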


Users must also provide the mean and standard deviation of the generation time between cases. Because generation time is not always known in the early days of an outbreak, the incubation period may be a reasonable proxy. The line list need not specify the chain of transmission; the plot generator estimates it from the onset dates. Two cases labeled as belonging to the same cluster are assumed to be linked when the interval between their onset dates falls within one standard deviation of the mean generation time.
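
One way to read this linking rule is sketched below using only the Python standard library. It is an interpretation, not epipy's implementation: the `link_cases` function and the generation time (mean 9 days, standard deviation 3 days) are hypothetical. The sketch also assumes that when several earlier cases qualify, the one whose interval is closest to the mean is taken as the source, a tie-breaking choice the text does not specify.

```python
from datetime import date


def link_cases(cases, mean_gen, sd_gen):
    """Estimate the source and generation of each clustered case.

    Returns {case_id: (source_id or None, generation)}. A case is linked to
    an earlier case in the same cluster when the onset interval, in days,
    lies within mean_gen +/- sd_gen.
    """
    by_cluster = {}
    for case in sorted(cases, key=lambda c: c["onset"]):
        if case["cluster"] is not None:  # unclustered cases are skipped
            by_cluster.setdefault(case["cluster"], []).append(case)

    result = {}
    for members in by_cluster.values():
        for i, case in enumerate(members):
            candidates = []
            for earlier in members[:i]:
                gap = (case["onset"] - earlier["onset"]).days
                if abs(gap - mean_gen) <= sd_gen:
                    candidates.append((abs(gap - mean_gen), earlier["case_id"]))
            if candidates:
                # Assumed tie-break: interval closest to the mean wins.
                source = min(candidates)[1]
                result[case["case_id"]] = (source, result[source][1] + 1)
            else:
                result[case["case_id"]] = (None, 0)
    return result


example_cases = [
    {"case_id": 1, "onset": date(2013, 1, 20), "cluster": "FamilyA"},
    {"case_id": 2, "onset": date(2013, 1, 29), "cluster": "FamilyA"},
    {"case_id": 3, "onset": date(2013, 2, 10), "cluster": "HighSchool"},
    {"case_id": 4, "onset": date(2013, 2, 12), "cluster": None},
    {"case_id": 5, "onset": date(2013, 2, 8), "cluster": "FamilyA"},
    {"case_id": 6, "onset": date(2013, 2, 14), "cluster": "HighSchool"},
    {"case_id": 7, "onset": date(2013, 2, 22), "cluster": "HighSchool"},
]
links = link_cases(example_cases, mean_gen=9, sd_gen=3)
```

Under these assumed parameters, the FamilyA cluster forms a chain from case 1 to case 2 to case 5. In the HighSchool cluster, case 6 falls only 4 days after case 3, so it is not linked and is treated as a separate generation-zero case. Case 7 is attached to case 6 (an 8-day interval) rather than case 3 (12 days).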