10 Questions to Ask When Creating a Visualization

PLEASE NOTE: This is an in-process DRAFT, not yet a publication.  Thank you for understanding.  Please email the authors if you have questions.


Abstract
Visualization is critical to the work of nearly all scientists, but not all scientists are equally adept at visualization. Software defaults and instinct often guide scientists’ visualization designs. However, not all software is optimized for visualization design, and instinct does not always guide us to good functional design choices. Most software is not developed in collaboration with design or science professionals, and software developers are not always trained in the cognitive and perceptual sciences to assess the implications of their design choices for humans. Furthermore, while instinct in visualization design can produce outcomes that please a personal aesthetic, it may not lead to correctly understood and interpreted visualizations. In other words, many scientists are accidentally producing ineffective visualizations because they are not aware of the body of knowledge emerging from empirical studies on how people experience visualizations. There is a knowledge transfer gap between the visualization research domain and those who ‘practice’ the art. This paper distills and synthesizes the literature on visualization research into (potentially) practicable rules of thumb optimized for scientists and data journalists who are not trained in visualization design. Specifically, we offer the reader a set of questions to ask themselves before making visualization decisions.

1. Who | Who is your audience? How expert will they be about the subject and/or display conventions?


The information and meaning any particular visualization communicates depend on its viewer. A physicist looking quickly at a genomicist's graphic may be mystified, even though both scientists are “experts” in their own fields. A graphic that would satisfy an expert may or may not be appropriate for broader audiences, and a graphic intended to explain can look very different from one intended to allow for deep exploration of data and information (Figure 1). As in other forms of communication, considering the range of expertise and interests in the audience (viewers, in this case) is paramount (e.g., Çöltekin, Fabrikant, & Lacayo, 2010). Typically, scientists err on the side of assuming too much expertise for any particular audience: jargon is used in labels and captions, and details concerning context and metadata are omitted from the figure, and often also from the caption. In general, any audience will prefer a figure that requires the least amount of explanation outside the boundaries of the figure (e.g., Russo et al., 2014), but that does not mean that excessive “keys” or labeling are always a good idea.

[Figure 1 here]

The cognitive science literature consistently points to individual and group differences in spatial abilities (e.g., Hegarty, Montello, Richardson, Ishikawa, & Lovelace, 2006). This means a complex graphic may not be equally complex for everyone (and a simple graphic may not be equally simple for everyone). When personalization is not possible (which is often the case), it is possibly best to adjust the design for people with lower spatial abilities, or, as some in the user experience (UX) design community say, to design as if “the user is drunk” (http://theuserisdrunk.com/). While designs that are optimized for a low-spatial population may at times frustrate high-spatial viewers, or even slow them down a little, they should not necessarily harm communication in the long run. It is, though, important to remember that spatial abilities can be trained, and may also change involuntarily with age. This is also true for visual abilities: approximately 8% of the male population has color deficiencies (Chan, Goh, & Tan, 2014), we lose some of our color receptors as we age (Roy, Podgor, Collier, & Gunkel, 1991), and reportedly up to 20% of the population has some difficulty in seeing stereo (Ware, 2004). If you exclusively target a particular age group (children vs. adults, professionals vs. senior citizens), or are otherwise able to assess the spatial abilities of the potential viewers, personalizing the visual for them is, of course, the recommended course of action. While opportunities for true “personalization” are rare, thinking deeply about the audience when creating (and/or customizing) a visualization is always possible, and desirable.


2. Explore/Explain | Is your goal to explore, document, or explain your data or ideas, or a combination of these?

Visualizations have two distinct uses: exploration and explanation. Exploration allows us to discover what is in our data at all stages of the research process and to construct hypotheses. Explanation is necessary when we have learned something about a phenomenon and we want to ‘map it’ to explain it to others. Typically, exploration is a part of analysis, while explanation is a part of communication (e.g., Keim, Kohlhammer, Ellis, & Mansmann, 2010). The two processes are intertwined in some projects, but often they have different audiences. Thus, tightly coupled with the “who” question in the previous section, they often require different visualization considerations.


For example, exploration is often (albeit not always) an expert task; one can therefore assume that the explorer has a strong interest in the topic, will take the time to digest more complex visualizations, and can possibly master complex interaction designs. On the other hand, if the visualization is meant for public use (e.g., for a newspaper), the opposite may hold for the majority of the viewers: they may not have the patience or motivation to do anything that requires too many steps or too much time to learn. Therefore, at the design stage, one-time users, first-time users and mid- to long-term users should be distinguished, and it should be noted that even most long-term users do not appreciate a steep learning curve.


Left: Ikea, Right: OECD Xplorer

Most modern exploratory visualization environments (see Table xx for examples) are interactive, and, as a result, users can create a wide variety of different (and often temporary) visualizations (for an example, see Figure 2). The on-the-fly nature of exploratory visualization often makes software recommendations (such as color palettes and visualization options) even more important. The non-expert user most likely will use the software defaults, but visualization experts agree that the defaults are hardly ever the best option (Driscoll, 2010; Goodsell, 2006; Healy & Moody, 2014). A default may be appropriate or pleasing in a narrow range of circumstances, but not usually broadly.  


Examples of Interactive Visualizations and Visualization Environments.

Name | URL | Audience
NYT elections | http://www.nytimes.com/interactive/2014/upshot/elections-2014-make-your-own-senate-forecast.html?_r=1&abt=0002&abg=1 | Public
OECD Explorer | http://stats.oecd.org/OECDregionalstatistics/#story=0 | Expert and public
Gapminder | http://www.gapminder.org/ | Expert and public
Glue | http://www.glueviz.org/en/stable/ | Expert (astro++)
QGIS | http://www.qgis.org/en/site/ | Expert (geo)
3D Slicer | http://www.slicer.org/ | Expert (med)


In creating an interactive visualization environment, interaction design should not be left to intuition or software defaults alone either. For example, keeping the number of operations (e.g., clicks) needed to reach the goal to a minimum is desirable, and a system that allows “overview, zoom, details-on-demand” is recommended (Shneiderman, 1996).

Also consider using multiple linked views that can be switched on and off as needed (instead of displaying everything in one large and crowded visualization).

Multiple linked views allow exploring the data from various perspectives and enable the viewer to see patterns that may have been hidden in one view (e.g., 2D-3D side-by-side views can each give useful but very different information, or being able to explore the data using various clustering methods or frequencies could balance overview and detail). Brushing and linking (or otherwise highlighting) is highly recommended in multiple-linked-view systems. However, the number of displays should also be weighed against the viewers’ abilities, expertise and time. It is possible to show too much information at the same time, inadvertently creating information overload (Eppler & Mengis, 2004).

When the task is explanation, the most typical scenario is one where an expert (teacher, scientist, science journalist) tries to make something clear to others (students, an interdisciplinary audience or the public). In explanation tasks, a safe assumption is that the viewer (whether expert or non-expert) is not going to look at the visualization for a long time; imagine, e.g., a conference talk, a lecture or an online newspaper article. Because of this, viewers will possibly find excessive information on display and/or expectations of interactivity frustrating. What should function best is a ‘reduced’ version in which the most important points are emphasized or highlighted with salient and appropriate visual variables (e.g., color hue and value, size, position, orientation (Bertin, 1983)), labels avoid jargon and acronyms, and, where possible, no more than 5-7 ‘bits of information’ are displayed at the same time (see Figure 1) (Cowan, 2001; Miller, 1956).


3. Feature recognition | Is feature and/or pattern recognition a goal?

A common task with visualization is visual pattern seeking and object/feature recognition (Logothetis & Sheinberg, 1996). Does a trend emerge once you plot your data or draw your map? For example: Did a lot of birds migrate in fall? Do the majority of 80-year-olds get sick when it is really hot? Do dense gas clouds occur in certain time periods, or between two large masses in space? In examples such as these, the viewers will try to identify patterns based on some criteria. Conversely, detecting anomalies and outliers is also a pattern detection task (the two are inherently linked, i.e., to be able to tell the anomaly, we have to identify the ‘usual’). Figure 3 shows two multiple-linked view examples in which the viewers are interested in understanding whether a particular set of points in the scatterplot may have a geographic/spatial explanation.


[FIGURE xx]

As the examples in Figure *** also demonstrate, color is a commonly used (and possibly abused) visual variable to express patterns. The use of color will change everything! The number of colors we use, or whether we use sequential versus diverging colors, will change what we see in the visualization, and thus what we believe about our data’s content. Although this seems not to be widely known outside the visualization community, many examples of how color choices impact visual analysis have been demonstrated. The most frequently cited example is possibly Rogowitz & Treinish’s 1996 paper (see an example in Figure 4).


FIGURE 4

There are various other human factors to keep in mind about colors: color deficiencies, our perceptual limitations in comparing shades of color, our tendency to see “patterns that are not there”, and so-called information overload are all real threats to the success of your visualization.


Besides colors, pattern/feature recognition can be affected by the presence of shadows in your visualizations. The presence of shadow can change the way we see colors (..) and thus, in such cases, seriously mislead us. The influence of shadows on pattern recognition is also very pronounced in some forms of 3D and pseudo-3D (2.5D) visualizations. If you are trying to judge 3D spatial relationships and there is (artificial or natural) shadow in the scene, it will potentially (and strongly) affect the perception of concavity and convexity, and thus may lead to misinterpretations. This effect can be observed concretely in what is termed relief inversion, and, in terrain-like visualizations, the terrain reversal effect (e.g., Bernabe & Coltekin 2014, Figure 5). The illusion reverses the convex and concave shapes in the entire scene and is closely linked to cognitive factors such as familiarity and assumptions about where the light source is. It can also be experienced with faces, where it is known as the hollow mask illusion (***, Figure ***).

FIGURE 5

4. Predictions & Uncertainty | Are you making a comparison between data and/or predictions? Is representing uncertainty a concern?

(- yes, and I have quantitative information about it, - yes, but its value is not well-measured, or information is qualitative)


xx see AG comments on what’s below.  Ultimately, we need a short list of standard comparisons, such as:


  • showing a modeled curve (line or complicated function, or even contour) as a prediction on data (e.g. predicted path of falling object measured in high-school physics lab to sea level rise contour compared to actual sea levels re:climate change model)
  • complex high-dimensional model output (e.g. fake galaxies contrasted with real ones) compared with observations
  • simulated cracking behavior compared with actual cracks, etc. xx


For visual comparison of time series data, we have two main options: the so-called small multiples and animations. With small multiples, showing too many of them can be a problem. With an animation (or several animations), all dynamic visual variables should be considered for accurate recall. There are also various illusions with animations, e.g., the brain tends to fill in missing frames (apparent motion).


For comparisons and/or predictions, representing uncertainty may be especially important: the two compared phenomena (visualizations?) may have different sampling rates, different model precisions, or different underlying processes. Try to make sure you are not comparing apples to oranges, and if one is an apple and the other is an orange, make sure that fact is explicitly stated in your visualization.


FIGURE

Visualizing uncertainty has been subject to some research, though there are no straightforward, set-in-stone recommendations as of this writing. Our take on this is to be visually explicit about uncertainty. You may have only qualitative information and perhaps cannot use a visual variable to represent all the uncertainty information, but in this case you can add a visually explicit symbol intrinsically (in the visualization) to guide the viewer to a legend where you provide the necessary information. 
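For the case where quantitative uncertainty information is available, the sketch below shows one possible way to turn repeated measurements into the ends of an error bar that can then be drawn explicitly. It is a minimal illustration, not a recommendation from the literature: the choice of the standard error of the mean as the uncertainty measure, the function names, and the sample values are all assumptions made for this example.

```python
import math
import statistics


def mean_with_standard_error(samples):
    """Return (mean, standard error of the mean) for repeated measurements."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    mean = statistics.mean(samples)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, sem


def error_bar(samples, k=1.0):
    """Lower and upper ends of a bar spanning mean +/- k standard errors."""
    mean, sem = mean_with_standard_error(samples)
    return mean - k * sem, mean + k * sem


# Hypothetical repeated measurements of one quantity (illustrative values only).
measurements = [9.8, 10.1, 10.0, 9.9, 10.2]
low, high = error_bar(measurements)
print(f"draw the point at {statistics.mean(measurements):.2f} "
      f"with a bar from {low:.2f} to {high:.2f}")
```

Whatever measure is used, the legend or caption should say which one it is (standard error, standard deviation, a confidence interval), since the bars look identical but mean different things.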


FIGURES (2)

5. Dimensions | What is the intrinsic number of dimensions (not necessarily spatial) in your data, and how many do you want to show at once?


By “dimensions,” we do not necessarily mean spatial dimensions… xxAG explain, Arzu-you can read Goodman 2012 to know more about what I meant “non-spatially”xx

 

Many empirical studies suggest that 3D visualizations lead to more mistakes and slow people down, even though nearly all phenomena are 3D in nature (Borkin***, Dall’acqua****, unpublished stereo visual search task, others). The desire to stay true to the nature of the phenomenon does not always pay off (this is also true for continuous versus discrete colors, see Dall’acqua***). Tasks that require searching, counting, or comparing (such as detecting anomalies, estimating distances, or reading information from a plot) appear to suffer from information overload in short-term working memory. What is not yet clearly understood is the mid- and long-term effect of 3D and realistic visualizations on global recognition and understanding of the represented phenomena, as well as their impact on mid- and long-term memory (e.g., learning). Some studies suggest they may improve exam scores, e.g., in comparison to text-only studying (Mahrer & Coltekin***).


Borkin FIGURE

All of this means you need to carefully consider whether you want to use an abstract 2D representation, a plot, or a realistic 3D visualization. Furthermore, time (change) is often considered the 4th dimension in the visualization literature, and, similarly, interactivity and animations may not be best suited to your purposes (Smallman/Hegarty*** naive ..)


High-dimensional data can also be depicted in formats such as self-organizing maps (SOM) or tree maps; however, there are not many empirical studies on how successful these depictions are (***any work on Skupin’s or Dykes’ visualizations??)


Hexagons FIGURE

6. Categories & Clustering | Are there natural, or imposed, categories within the data? Are you interested in clustering?

If you are able to control the number of categories (such as in thematic maps), remember the magical number seven, plus or minus two (***), or even three (***). This is possible with demographic data and other measurements for which you define the number of classes and the classification method (e.g., natural breaks, quantiles, equal intervals) yourself. If you are not able to control the number of categories (such as in soil maps, where the number of soil types is “naturally imposed”) and you end up with a great many colors on your map, consider interactive maps with some highlighting.
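To make the effect of the classification method concrete, the following sketch computes class breaks for the same values using equal intervals and quantiles. It is a simplified illustration under our own assumptions: the function names and the sample data are hypothetical, the quantile rule is one of several reasonable conventions, and natural breaks are omitted because they require an optimization step.

```python
def equal_interval_breaks(values, n_classes):
    """Upper bounds of n_classes bins of equal width across the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes + 1)]


def quantile_breaks(values, n_classes):
    """Upper bounds chosen so each class holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for i in range(1, n_classes + 1):
        # Index of the last value that belongs to class i.
        idx = max(0, round(i * n / n_classes) - 1)
        breaks.append(ordered[idx])
    return breaks


def classify(value, breaks):
    """Return the 0-based class index of value, given ascending upper bounds."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks) - 1


def class_counts(values, breaks):
    """Number of values that fall into each class."""
    counts = [0] * len(breaks)
    for v in values:
        counts[classify(v, breaks)] += 1
    return counts


# Hypothetical skewed data (illustrative values only).
data = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]
for name, fn in (("equal interval", equal_interval_breaks),
                 ("quantile", quantile_breaks)):
    breaks = fn(data, 3)
    print(name, "breaks:", breaks, "counts:", class_counts(data, breaks))
```

With skewed data like this, equal intervals put almost every value into the lowest class, while quantiles spread the values across classes, so the same data would produce two very different-looking maps.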


FIGURES, map of World, connections

7. Abstraction & Accuracy | Do you need to show all the data, or is summary or abstraction OK? (- all, - summary (statistical), - abstraction). How literally accurate does your visualization need to be? -to-scale, -schematic, -color-realistic.

These considerations will also help with deciding what to show (and how/when). As we mentioned previously, there is such a thing as too much information on the display. If you can identify what is OK not to show, or if the real complex object can be shown with a symbol (e.g., instead of a photorealistic representation of a cell, can you use a circle that represents the cell?), it may help reduce the load on cognitive processing in working memory. If you find a benefit in showing all the data at once, or in all its realism and detail (and you might), the best course of action may be a multiple-view approach in which one window shows the highly detailed realistic representation for a “global” general feel of the entirety of the phenomenon or objects/relationships, and another shows an abstract/summary view for more precise comparisons, size (distance, area, volume) estimations or measurements. Depending on the accuracy needs (which you must assess), your visualization may have to have relative or absolute accuracy, which would call for design decisions to accommodate this. If the visualization will not be used for precise measurements (absolute or relative), you can take more liberty in emphasizing the message (e.g., “look, area A is considerably bigger/closer/more dense than area B”), though here you must take care not to overdo it and mislead the viewer. Visual communication is similar to verbal communication in that sense: if your “adjectives” are too strong, you might give the wrong impression and even lead people to bad decisions [***if you are an advertiser, we would like you to stop reading here ;) … OR “ask an advertiser if you don’t believe us”].


Figure (head)

8. Context & Scale |  Can you, and do you want to, put the data into a standard frame of reference? (- coordinate system, - registration with other data (images)). Is a single scale OK, or do you need more than one at once?


In geographic information, this question comes with a big fat “wait a minute”. If you are representing a curved or spherical object on a 2D plane, you will need to be aware of distortions (***), and thus need to choose your projection wisely. Spatial referencing will fix issues of location and, for measurements based on underlying data, unify all data regarding that location. But if there is a projection, there will be distortions (and often there is one, as we live in a 3D world and study 3D structures of the macro or micro universe, yet we draw things in 2D; in fact, we are often encouraged to do so (***)). The spatial coordinate system you choose will also matter when, for example, you want to add photo-textures (imagine draping aerial images over the terrain), and you might get visual artifacts from this. While sometimes we might want to prioritize the geometry and accept some trade-off in visualizations, in some cases we might very well choose the opposite. This is linked to the previous item (#7): we need to assess our accuracy needs.
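As a rough illustration of how large projection distortions can become, the sketch below computes the scale factor of the spherical Mercator projection at a few latitudes. Mercator is chosen here only as a familiar example (it is our assumption, not a projection singled out above), and the function names are hypothetical; other projections distort differently, trading off shape, area, distance and direction.

```python
import math


def mercator_scale_factor(latitude_deg):
    """Linear scale factor of the spherical Mercator projection at a latitude.

    Lengths on the map are stretched by 1 / cos(latitude) relative to the equator.
    """
    return 1.0 / math.cos(math.radians(latitude_deg))


def mercator_area_exaggeration(latitude_deg):
    """Areas are inflated by the square of the linear scale factor."""
    return mercator_scale_factor(latitude_deg) ** 2


for lat in (0, 30, 60, 75):
    print(f"latitude {lat:2d}: lengths x{mercator_scale_factor(lat):.2f}, "
          f"areas x{mercator_area_exaggeration(lat):.2f}")
```

A check of this kind can help decide whether area or distance comparisons across latitudes are safe to read off a given map, or whether an equal-area projection would serve the task better.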


It is also interesting to think about scale, and not only geometric scale: scale can be temporal (animations, interactive visualizations, time lapses) or attribute-based (multivariate, high-dimensional data). It is good to consider whether you can communicate what needs to be communicated with a single scale, or whether the visualizations should be prepared for multiple scales. The amount of detail you choose to include will be different at different scales. In cartography, the concept of generalization (which is basically a set of visual operations about what to show, emphasize, group ..) is tightly coupled with scale, because scale determines what kind of “screen real estate” we have (***). If you design for multiple scales, be aware of the transitions between them, and again, consider the users and the tasks to make a conscious decision about whether the design should be interactive or should show various scales side by side.



FIGURE, projections

9. Metadata | Do you need to display or link to non-quantitative metadata? (Consider figure captions, labels and other in-text explanations etc about the data/vis)

text

10. Display modes | What display modes might be used in experiencing your display?

      • many of the below
      • paper, B&W
      • paper, color
      • phone/tablet-size screen, color
      • laptop-size screen, color
      • large screen, color
      • viz wall, color
      • moving (video)
      • moving (changeable, interactive)
      • 2D
      • 3D
      • stereo



If you have ever designed a web page, you know that you are supposed to test it on different browsers, optimize it for mobile displays and, if you can, view it on multiple monitors (to see if your colors work, for example). The concerns are similar for the visualization you are about to create. Many of your design decisions need to take into consideration whether the visualization will be shown on multiple displays and whether viewing will be asynchronous (i.e., will you put it on the Web?), whether you imagine it being seen on a mobile display (phone, tablet), and whether it may be printed (and if so, whether it should be optimized for black and white). Sometimes the visualization is meant for a single person or group, and it will live on a lab’s wall. But even then you may need to think about whether the monitor may be stereo, tiled or multi-touch. Every piece of hardware has limitations (color configuration, screen size ..), and what you see on your screen will not “just work” on another.
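One display-mode question that can be checked before printing is whether the colors in a palette will remain distinguishable in black and white. The sketch below estimates this from the relative luminance of each sRGB color. The luminance formula follows the standard sRGB definition; the function names, the sample palette, and especially the min_gap threshold are assumptions chosen for illustration and are not an empirically validated cutoff.

```python
def relative_luminance(hex_color):
    """Relative luminance (0 = black, 1 = white) of an sRGB color like '#1f77b4'."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma encoding to get linear-light channel values.
    linear = [
        c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def grayscale_collisions(palette, min_gap=0.1):
    """Pairs of colors whose luminance differs by less than min_gap.

    min_gap is a hypothetical threshold chosen for this sketch.
    """
    lums = [(relative_luminance(c), c) for c in palette]
    pairs = []
    for i in range(len(lums)):
        for j in range(i + 1, len(lums)):
            if abs(lums[i][0] - lums[j][0]) < min_gap:
                pairs.append((lums[i][1], lums[j][1]))
    return pairs


# Hypothetical palette (illustrative values only).
palette = ["#ff0000", "#00aa00", "#0000ff", "#ffffff"]
for first, second in grayscale_collisions(palette):
    print(f"{first} and {second} may be hard to tell apart in black and white")
```

A palette that passes such a check on screen should still be test-printed, since printers and paper introduce their own variation.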


Technical Considerations


Sources Are your data all in one file/source, or many?


Formats  Are/is your data files/file in a standard format, and/or can they/it be put into one?


Custom Code Are you interested in a custom solution, which may mean writing new code, or are you seeking a more off-the-shelf and/or GUI-based solution? Is a combination OK?


Lots of additional notes in .MSWD file; not sure whether to move them here, or to some "commentary/discussion" document? Arzu?

Acknowledgements

The authors thank Sarah Block for her assistance in establishing the "Ten Questions" web site that accompanies this paper, and Alberto Pepe and his team for helping with the Authorea online preprint of this work.

The list (for reference)


1. Who | Who is your audience? How expert will they be about the subject and/or display conventions?

2. Explore/Explain | Is your goal to explore, document, or explain your data or ideas, or a combination of these?

3. Feature recognition | Is feature and/or pattern recognition a goal?

4. Predictions & Uncertainty | Are you making a comparison between data and/or predictions? Is representing uncertainty a concern?

• yes, and I have quantitative information about it

• yes, but its value is not well-measured, or information is qualitative

5. Dimensions | What is the intrinsic number of dimensions (not necessarily spatial) in your data, and how many do you want to show at once?

6. Categories & Clustering | Are there natural, or imposed, categories within the data? Are you interested in clustering?

7. Abstraction & Accuracy | Do you need to show all the data, or is summary or abstraction OK?

• all

• summary (statistical)

• abstraction

How literally accurate does your visualization need to be?

• to-scale

• schematic

• color-realistic

8. Context & Scale | Can you, and do you want to, put the data into a standard frame of reference?

• coordinate system

• registration with other data (images)

Is a single scale OK, or do you need more than one at once?

9. Metadata | Do you need to display or link to non-quantitative metadata? (Consider figure captions, labels and other in-text explanations etc about the data/vis)

10. Display modes | What display modes might be used in experiencing your display?

• many of the below

• paper, B&W

• paper, color

• phone/tablet-size screen, color

• laptop-size screen, color

• large screen, color

• viz wall, color

• moving (video)

• moving (changeable, interactive)

• 2D

• 3D

• stereo

 

Technical Considerations

Sources Are your data all in one file/source, or many?

Formats  Are/is your data files/file in a standard format, and/or can they/it be put into one?

Custom Code Are you interested in a custom solution, which may mean writing new code, or are you seeking a more off-the-shelf and/or GUI-based solution? Is a combination OK?

