Introduction
Data in its raw form is difficult to interpret at scale. Beyond a handful of numerical values, it becomes hard to draw even simple conclusions from a dataset. Where data run to hundreds of thousands, millions or even billions of datapoints, the process of visualisation becomes both conceptually and technically challenging. The emergence of ubiquitous and accessible 'Big Data' has transformed traditional methods of visualisation and forced the adoption of novel techniques, with mixed success. The primary goal of visualisation is the derivation of meaning from data - data without insight is noise.
The following essay is a written justification for the visual design of the accompanying poster. A brief orientation to the science of visual design will serve to introduce the topic, framed in the context of the poster itself. The sections that follow address the overall layout as well as the specific plots used in the design.
The Data
The STATS19 dataset provides data relating to personal injury accidents on public roads in Great Britain, covering the years 2005 to 2016 (at the time of writing). Data in an altered form can also be obtained dating back to 1979. The statistics relate only to those accidents reported to the police; the data should therefore be interpreted in the knowledge that a considerable number of non-fatal accidents go unreported, so the proportion of fatal accidents will be over-represented. Figures relating to death refer to persons killed immediately or who died within 30 days of the accident. The primary dataset, in tabular form, consists of ~150,000-200,000 rows, with ~30 columns, per year of recording. Each row corresponds to a single accident. Two accompanying datasets, with details relating to the casualties and vehicles involved, are matched on a unique primary key - the Accident Index. The variables are both continuous (Date, Age) and categorical - predominantly ordinal data.
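The structure described above - an accident table joined to casualty and vehicle tables on a shared primary key - can be sketched as follows. This is a minimal illustration using pandas; the miniature tables and the exact column names (Accident_Index, Age_of_Casualty, Vehicle_Type) are assumptions for the sake of the example, not a literal extract of the STATS19 files.

```python
import pandas as pd

# Hypothetical miniature extracts of the three tables; the real primary
# dataset runs to ~150,000-200,000 accident rows per year of recording.
accidents = pd.DataFrame({
    "Accident_Index": ["2016A1", "2016A2"],
    "Date": ["2016-01-03", "2016-02-14"],
    "Accident_Severity": [3, 1],  # ordinal category (illustrative coding)
})
casualties = pd.DataFrame({
    "Accident_Index": ["2016A1", "2016A2", "2016A2"],
    "Age_of_Casualty": [34, 57, 19],  # continuous variable
})
vehicles = pd.DataFrame({
    "Accident_Index": ["2016A1", "2016A2"],
    "Vehicle_Type": [9, 1],  # categorical variable (illustrative coding)
})

# Join the casualty and vehicle tables onto the accident table via the
# unique primary key. One accident may involve several casualties and
# vehicles, so the merged result has one row per matched combination.
merged = (accidents
          .merge(casualties, on="Accident_Index", how="left")
          .merge(vehicles, on="Accident_Index", how="left"))

print(merged.shape)  # (3, 5): one row per casualty in this toy example
```

The left joins preserve every accident record even where a matching casualty or vehicle row is absent, which mirrors how the accident table serves as the primary dataset.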
Principles of good data visualisation
In their classic 1984 paper, Cleveland and McGill [1] attempted to move the field of data visualisation onto a scientific foundation. They proposed that the phenomenological approach to the new science of graphical perception should be based on the elucidation of 'elementary perceptual tasks' that are performed by the consumer of visual data. These tasks should then be studied to establish optimal means of conveying data. While attempts to formalise a theory of graphical methods had been made prior to this, it is since this period that the area of 'data vis' has been studied as an academic discipline.
[Image: the ten elementary perceptual tasks of Cleveland and McGill]
Fig 1. illustrates the ten elementary perceptual tasks outlined in that paper, which humans use to extract quantitative information from graphs. Where possible, I have adhered to the principles developed over the past thirty years of perceptual science. Where I have deviated from them, I have attempted to justify my reasoning for doing so.