Guided variable selection

When a user selects variables, a dialog presents four source types: (1) study and treatment variables, (2) subject characteristics, (3) time points, and (4) the four assays. To help users who are unfamiliar with a source, the DataSpace defaults to recommended variables within each one. With the help of investigators, product team experts chose as defaults the variables with the most utility to the most people.

Assays are more complex. Their names are familiar to all of the investigators we worked with, but the names hide surprising diversity in how the assays work. The value needed to plot a data point is identified by a unique combination of multiple independent variables, similar to a composite key in a database table. For example, a measure might comprise multiple properties measured on multiple cell types stimulated by multiple antigens. Investigators are often experts in one area but not in every such detail of every assay. To help them get value quickly, the variable selector defaults all of these choices to those of broadest utility and exposes the hierarchy so that interested or expert investigators can explore how one setting relates to another (Figure 3). This schema was custom-made to keep choices together under familiar assay names.
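
The key-like structure described above can be illustrated with a minimal sketch in Python. It is not the DataSpace implementation: the field names, the assay name `assay_a`, the default values, and the `resolve_key` helper are all hypothetical placeholders, chosen only to show how defaults can fill every part of a composite key while still allowing an expert to override one setting.

```python
from typing import NamedTuple


class MeasureKey(NamedTuple):
    """Hypothetical composite key that identifies one plottable value."""
    assay: str
    cell_type: str
    antigen: str
    measured_property: str


# Hypothetical "broadest utility" defaults, grouped under the assay name.
DEFAULTS = {
    "assay_a": {
        "cell_type": "cell_type_1",
        "antigen": "antigen_1",
        "measured_property": "property_1",
    },
}


def resolve_key(assay: str, **overrides: str) -> MeasureKey:
    """Fill every key field from the assay's defaults, then apply overrides."""
    unknown = set(overrides) - set(MeasureKey._fields)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    settings = {**DEFAULTS[assay], **overrides}
    return MeasureKey(assay=assay, **settings)


# A newcomer accepts all defaults; an expert changes only the antigen.
default_key = resolve_key("assay_a")
expert_key = resolve_key("assay_a", antigen="antigen_2")
```

Grouping the defaults under the assay name mirrors the stated design goal of keeping related choices together under a name investigators already know.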

Custom log scales

Axes default to log or linear scales according to a hidden property attached to each variable. Many measurements are sensitive across multiple orders of magnitude and therefore default to log scales. A problem arises because standard data processing subtracts “background” or unstimulated levels of the measure, producing a few values of zero or less that have no valid log transform. If these data were left out, users would not know why they see fewer subjects and data points than expected, either in the plot or in the filter. Instead, we created a discontinuity in the axis where the affected data points lie. These points are still placed according to their value on the opposing axis, or otherwise jittered to help users perceive density, and they remain useful for selection and filtering (Figure 4a).
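
The partitioning step behind this axis treatment can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the DataSpace code: the function name `split_for_log_axis` and the returned structure are hypothetical. It shows only that every value is kept, with non-positive values routed to a separate group that a renderer could draw in the axis discontinuity.

```python
import math


def split_for_log_axis(values):
    """Partition values into log-plottable ones and those with no valid log.

    Returns (plottable, in_discontinuity):
      plottable        -- list of (value, log10(value)) for values > 0
      in_discontinuity -- values <= 0, kept so they can still be drawn,
                          selected, and filtered
    """
    plottable = []
    in_discontinuity = []
    for v in values:
        if v > 0:
            plottable.append((v, math.log10(v)))
        else:
            in_discontinuity.append(v)
    return plottable, in_discontinuity


# Background-subtracted results can include zero or negative values.
plottable, in_discontinuity = split_for_log_axis([100.0, 0.0, -3.5, 1.0])
```

No value is discarded, so subject and data counts in the plot stay consistent with those in the filter.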

Gutter plots

When correlating results from two assays run on the same subjects, the data points are matched on x and y by the time that samples were collected. Often a variable will have multiple results at every time point for every subject, for example against different antigens, creating a many-to-many relationship between axes. In this case, the Plot automatically shows median values for a match. However, assays can also be run on samples collected at times that only partially overlap. In this case, the Plot presents matched data in the normal correlation area and creates one or two “gutter plots” in the