Principles of good data visualisation

Why do we visualise data?

Statistical methods can elucidate relationships, provide new insights and confirm or disprove hypotheses without the need for 'data vis'. So why is this process an essential part of the analytics workflow?
The primary dataset, in tabular form, consists of ~150,000-200,000 rows, with ~30 columns. Two accompanying datasets, with details relating to the casualty and the automobile involved, are matched on a unique primary key present in each dataset.
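As a minimal sketch of the matching described above, the following Python attaches casualty (or vehicle) records to accident records through a shared key. The field name `accident_index` and the sample rows are illustrative assumptions, not taken from the dataset itself.

```python
from collections import defaultdict

def join_on_key(accidents, related, key="accident_index"):
    """Attach each related record (e.g. casualty or vehicle) to its accident.

    Returns a dict mapping key value -> (accident row, list of related rows).
    The key name is a hypothetical placeholder for the shared primary key.
    """
    grouped = defaultdict(list)
    for row in related:
        grouped[row[key]].append(row)
    return {a[key]: (a, grouped.get(a[key], [])) for a in accidents}

# Illustrative rows only; not real data.
accidents = [
    {"accident_index": "A1", "day": "Mon"},
    {"accident_index": "A2", "day": "Tue"},
]
casualties = [
    {"accident_index": "A1", "severity": "slight"},
    {"accident_index": "A1", "severity": "serious"},
]
joined = join_on_key(accidents, casualties)
```

An accident with no matching related rows keeps an empty list, so no accident is dropped by the join.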
The poster layout was chosen as a compromise between essential text, visual appeal and clarity. The broad nature of the analysis required a significant proportion of the poster to contain explanatory text orienting the reader to the detail of the analysis.
The title and subtitle, 'Accident Hotspots: Seven Years of Road Traffic Accidents in Swansea', were chosen as they are brief and draw interest without ambiguity. The layout is portrait and split into thirds, with the reader encouraged to proceed left to right in three stages. The visual cues
1. Explain data to solve a specific problem.
Application in poster
---
2. Explore large data sets for better understanding.
 

Preattentive Attributes and Analytical Patterns

In their classic 1984 paper, Cleveland and McGill attempted to move the field of data visualisation onto a scientific foundation. They proposed that the phenomenological approach to the new science of graphical perception should be based on the delineation of the 'elementary perceptual tasks' performed by the consumer of visual data. These tasks should then be studied to establish the optimal means of conveying data. While attempts to formalise a theory of graphical methods had been made before this, it is since this period that 'data vis' has been studied as an academic discipline... \ref{486097}
Such work
Issues of visualising big data - capturing many data points on a single chart.

Radial Bar Plot and Regular Bar Plot

In terms of the accuracy of observer estimates, utilising length, height or position in plotting is generally preferred over area, angle, weight or colour. This is underpinned by evidence from visual science, which has demonstrated that position along a common scale, as seen in bar and scatter plots, is more easily interpreted than slope, direction, angle or area.
The bar plot sits at the top of the hierarchy of elementary perceptual tasks (include image). The Week in Accidents bar plot (fig 2.) was chosen for simplicity and ease of interpretation. This plot facilitates a rapid analysis of differences between days and of the change from 2009 to 2016. It follows logically from the calendar heatmap seen above and is drawn from the same data. Paired bars were used as they are more easily interpreted than stacked columns and emphasise the reduction in the rate of accidents both per day and across the entire week. The minimalist theme avoids distraction without compromising information, and the colour palette aligns with the rest of the poster.
In contrast to the bar plot, the radial column plot that sits in the centre third of the poster was chosen for a number of reasons, despite its perceptual limitations. Plots based on angle and length sit within the bottom half of the hierarchy; in this case the angle of the bar and its location in the circle represent time, and the length represents the variance present within that bin. This may appear a poor choice where a traditional time series graphic may ease interpretation (include image). However, methods of visualisation based on lower-ranked parameters may be appropriate when the aim is not to facilitate precise judgements but to reveal general patterns relating to the dimensions of the data. The radial plot resembles a traditional analogue clock face and is therefore extremely familiar to those viewing it. The decoding required of the observer is simple and corresponds with a skill learned in early childhood (telling the time). The graphic emphasises the peak periods of road traffic accidents around 9 am and 5 pm. The colour scheme also serves to highlight the differences from day to night, with two transition periods in late evening and early morning.
It is worth noting that the data here are binned in 30-minute aggregates. This arose out of necessity, owing to the sampling method in the STATS19 dataset. The original chart was designed as a minute-by-minute visualisation of 24 hours, i.e. 24 * 60 columns. When charted, it revealed a cyclical variation in accident occurrence that was at first difficult to interpret. A large number of accidents fell on whole divisions of the hour, i.e. on half-hour intervals, with smaller numbers occurring between them (see graphic). Without separate evidence to corroborate that accidents are genuinely more common on the hour or half hour, it would appear that those collecting the data were rounding up or down when recording the time of the event. So accidents that occurred at 11:57, for example, were disproportionately recorded as 12:00. The thirty-minute bins aggregated these together and served to limit the column-to-column variability without affecting the overall accuracy of the graph.
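The binning and the rounding check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used for the poster: times are assumed to arrive as 'HH:MM' strings, the function names are hypothetical, and the sample values are invented.

```python
from collections import Counter

def minute_of_day(hhmm):
    """Convert an 'HH:MM' string to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def bin_counts(times, width=30):
    """Count events per bin of `width` minutes across a 24-hour day."""
    counts = Counter(minute_of_day(t) // width for t in times)
    return [counts.get(i, 0) for i in range(24 * 60 // width)]

def share_on_half_hour(times):
    """Fraction of recorded times that fall exactly on :00 or :30."""
    on_mark = sum(1 for t in times if minute_of_day(t) % 30 == 0)
    return on_mark / len(times)

# Invented sample times, for illustration only.
times = ["11:57", "12:00", "12:00", "12:29", "12:30", "23:59"]
counts = bin_counts(times)          # 48 bins of 30 minutes each
share = share_on_half_hour(times)   # compare against 2/60 if minutes were uniform
```

If recorded minutes were spread evenly, only 2 in 60 times would land exactly on :00 or :30, so a much larger share points to rounding at the time of recording. Bins in this sketch start on the half hour; centring each bin on the half-hour mark is an alternative that would keep a time rounded from 11:57 to 12:00 in the same bin as its true value.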
It should also be noted that a log scale was used in the creation of this graph. The wide variation in accident number was difficult to plot on a linear scale.
http://jtleek.com/jhsph753and4/lectures/05_01_exploratoryAnalysis/#14
importance of scale - use of log scale in radial bar chart.
Effect of scale on appearance of correlation of variables.
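To illustrate why a log scale helps when counts vary widely, the sketch below maps counts to bar lengths with log10(1 + count). The base, the +1 offset for empty bins and the sample values are all assumptions made for illustration; how the poster handled these details is not stated.

```python
import math

def log_bar_lengths(counts):
    """Map counts to bar lengths using log10(1 + count).

    Adding 1 keeps empty bins at length 0 instead of hitting log(0).
    """
    return [math.log10(1 + c) for c in counts]

# Invented counts spanning three orders of magnitude.
sample_counts = [0, 9, 99, 999]
lengths = log_bar_lengths(sample_counts)
```

On a linear axis the largest bar here would be about a hundred times longer than the second; on the log axis successive bars differ by one unit each, so the small bins remain visible.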

Calendar Heatmap

Most calendars are laid out in tabular format, with days read from left to right, top to bottom. This traditional structure has no intuitive visual relationship with the units of time that it represents: the month, week and day. However, it is so pervasive that most individuals would have no problem orienting themselves to a calendar without any direction or cues. Much like the clock face in the radial bar example, an intuitive understanding of calendar structure allows its form to be exploited in graphing variations where day, week and month differences are important. Here trends in the data are more important than precision, and permit a rapid orientation to the graph's take-home message. It allows the observer to 'relate' to the message in a manner that