Guided variable selection
When users select variables, a dialog presents four source types: (1)
study and treatment variables, (2) subject characteristics, (3) time
points, and (4) the four assays. To help users who are unfamiliar with
the sources, the DataSpace defaults to recommended variables within each
one. With the help of investigators, product team experts chose default
variables with the most utility to the most people. Assays are more
complex. The names of the assays are familiar to all investigators we
worked with, but the names hide surprising diversity in how the assays
work. The value needed to plot a data point is identified by a unique
combination of several independent variables, much like a composite key
in a database table. A measure might be made of multiple properties on multiple
cell types stimulated by multiple antigens. Investigators are often
experts in one area but not in all such details from every assay. To
help them get quick value, the variable selector defaults all of these
choices to those of broadest utility and exposes the hierarchy to let
interested or expert investigators explore the relationship of one
setting to another (Figure 3). This schema was custom-made to keep
choices together under familiar assay names.
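
The defaulting behavior described above can be sketched as a lookup that fills every assay dimension with a recommended value and lets an expert override any one of them. This is only an illustration under stated assumptions: the assay name, the dimension names, their values, and the function `resolve_measure_key` are hypothetical placeholders, not the DataSpace schema or its API.

```python
# Hypothetical assay with three dimensions; all names and values are placeholders.
DIMENSIONS = {
    "example_assay": {
        "property": ["property_a", "property_b"],
        "cell_type": ["cell_type_a", "cell_type_b"],
        "antigen": ["antigen_a", "antigen_b", "antigen_c"],
    }
}

# Hypothetical recommended defaults, one per dimension.
DEFAULTS = {
    "example_assay": {
        "property": "property_a",
        "cell_type": "cell_type_a",
        "antigen": "antigen_a",
    }
}


def resolve_measure_key(assay, overrides=None):
    """Return the composite key that identifies one plottable measure.

    Every dimension starts at its default; entries in `overrides`
    replace individual defaults after being validated.
    """
    dimensions = DIMENSIONS[assay]
    selection = dict(DEFAULTS[assay])
    for name, value in (overrides or {}).items():
        if name not in dimensions:
            raise KeyError(f"{assay} has no dimension named {name!r}")
        if value not in dimensions[name]:
            raise ValueError(f"{value!r} is not a valid {name} for {assay}")
        selection[name] = value
    # Keep the dimensions in their declared order so the key is stable.
    return (assay,) + tuple(selection[name] for name in dimensions)
```

A user who accepts the defaults gets a complete key without making any choices, while `resolve_measure_key("example_assay", {"antigen": "antigen_b"})` changes one setting and leaves the rest at their defaults.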
Custom log scales
Axes default to log or linear scales according to a hidden property
attached to each variable. Many measurements are very sensitive over
multiple orders of magnitude and default to log scales. A problem arises
from the standard data processing that includes subtraction of
“background” or unstimulated levels of the measure, creating a few
values of zero or less that have no valid log transform. If these data
were left out, users would not know why they have fewer subjects and
data than expected either in the plot or in the filter. Instead, we created a
discontinuity in the axis where the affected data points lie. They are still
placed according to their value on the opposing axis or otherwise
jittered to help perceive density, and they are still useful for
selection and filtering (Figure 4a).
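
One way to picture the discontinuity is as a separate band below the lowest log decade, where values of zero or less are placed instead of being dropped. The sketch below is a rough illustration, not the DataSpace implementation: the function name `place_on_log_axis`, the band position, and the `band_width` parameter are assumptions, and the jitter here stands in for the case where no opposing-axis value is available.

```python
import math
import random


def place_on_log_axis(values, band_width=0.5, seed=0):
    """Map values to positions on a log10 axis with a band for non-positive data.

    Returns a list of (position, in_band) pairs in input order. Positive
    values are placed at log10(value). Values of zero or less have no log
    transform, so they are jittered inside a band centered one decade
    below the lowest decade shown (an assumed layout).
    """
    positives = [v for v in values if v > 0]
    if not positives:
        raise ValueError("at least one positive value is needed to set the axis")

    rng = random.Random(seed)  # fixed seed keeps the jitter reproducible
    lowest_decade = math.floor(math.log10(min(positives)))
    band_center = lowest_decade - 1.0

    placed = []
    for v in values:
        if v > 0:
            placed.append((math.log10(v), False))
        else:
            jitter = rng.uniform(-band_width / 2, band_width / 2)
            placed.append((band_center + jitter, True))
    return placed
```

Because every input value yields a position, the affected points stay visible and selectable, and the counts in the plot match the counts in the filter.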
Gutter plots
When correlating results from two assays run on the same subjects, the
data points are matched on x and y by the time that samples were
collected. Often a variable will have multiple results at every time
point for every subject, for example against different antigens,
creating a many-to-many relationship between axes. In this case, the
Plot automatically shows median values for a match. However, assays can
also be run on samples collected at times that only partially overlap.
In this case, the Plot presents matched data in the normal correlation
area and creates one or two “gutter plots” in the