Serendipity in computational systems

\label{sec:computational-serendipity}

The 13 facets of serendipity from Section \ref{sec:by-example} specify the conditions and preconditions that are conducive to serendipitous discovery. Section \ref{sec:our-model} distilled these elements into a computational model. In addition to applying clear criteria, it is important to delineate the scope of the system being evaluated, and the position of the evaluator (recalling that “embedded evaluation” is a requisite part of a serendipitous system). For example, a standard spell-checking program might suggest a substitution that the user deems especially fortuitous. We might agree that serendipity has occurred, but we would locate the potential for serendipity not in the spell-checker itself, but rather in the “cyborg” system comprising the user plus the machine and its software.

used a simpler variant of the SPECS criteria to analyse three examples of potentially serendipitous behaviour: dynamic investigation problems, model generation, and poetry flowcharts. Using our updated criteria, we discuss two new examples below, and revisit poetry flowcharts, reporting on recent work and outlining the next steps. The three case studies respectively apply the criteria to evaluate an existing system, design a new experiment, and frame a “grand challenge.” In the first case study, the system we evaluate turns out not to be particularly serendipitous according to our criteria. This helps to show that our definition is not overly inclusive. The second example combines retrospective and prospective positions, as it integrates design and prototyping. As Campbell writes, “serendipity presupposes a smart mind,” and each of these examples suggests potential directions for further work in computational intelligence.

Case Study: Evaluation of an existing evolutionary computing system

\label{sec:evomusic}

System description

reported a computational jazz improvisation system (later given the name GAmprovising \cite{jordanous:12}) that uses genetic algorithms. Reevaluating GAmprovising can shed light on the degree to which evolutionary computing can encourage computational serendipity.

GAmprovising uses genetic algorithms to evolve a population of Improvisors. Each Improvisor is able to randomly generate music based on various parameters, such as the range of notes to be used, preferred notes, rhythmic implications around note lengths, and other musical parameters (see \cite{jordanous10}). These parameters are what define the Improvisor at any point in the system’s evolution. After a cycle of evolution, each Improvisor is evaluated using a fitness function based on Ritchie’s formal criteria for creativity. This model relies on user-supplied ratings of the novelty and appropriateness of the music produced by the Improvisor to calculate 18 metrics that collectively indicate how creative the system is. The fittest Improvisors are used to seed a new generation of Improvisors, through crossover and mutation operations.
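The evolutionary cycle described above can be sketched in a few lines of Python. This is a minimal illustration and not the GAmprovising implementation: the three parameters of the hypothetical `Improvisor` class, the `evolve` function, the decision to carry the fittest parents over unchanged, and the stand-in `fitness` callable (which replaces the user-rated metrics based on Ritchie’s criteria) are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Improvisor:
    # Hypothetical parameters; the real system defines its own set.
    lowest_note: int      # MIDI pitch of the lowest usable note
    note_range: int       # number of semitones above lowest_note
    max_note_length: int  # longest note, in arbitrary rhythmic units


def random_improvisor(rng):
    return Improvisor(rng.randint(36, 72), rng.randint(1, 24), rng.randint(1, 8))


def crossover(a, b, rng):
    # Uniform crossover: each parameter is inherited from one parent at random.
    return Improvisor(
        rng.choice([a.lowest_note, b.lowest_note]),
        rng.choice([a.note_range, b.note_range]),
        rng.choice([a.max_note_length, b.max_note_length]),
    )


def mutate(imp, rng, rate=0.2):
    # With probability `rate`, nudge a parameter by one step, staying in bounds.
    def jitter(value, low, high):
        if rng.random() < rate:
            value += rng.choice([-1, 1])
        return max(low, min(high, value))

    return Improvisor(
        jitter(imp.lowest_note, 36, 72),
        jitter(imp.note_range, 1, 24),
        jitter(imp.max_note_length, 1, 8),
    )


def evolve(population, fitness, rng, n_parents=2):
    # Rank by fitness, keep the fittest as parents (an assumption of this
    # sketch), and fill the rest of the generation with mutated offspring.
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:n_parents]
    children = [
        mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
        for _ in range(len(population) - n_parents)
    ]
    return parents + children
```

In the system as described, `fitness` would be derived from a human evaluator’s ratings, which is where the “fitness bottleneck” discussed below arises; in the sketch any function from an `Improvisor` to a number can be substituted.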

Application of criteria

The GAmprovising system can be said to have a prepared mind through its background knowledge of what musical concepts to embed in the Improvisors and the evolutionary abilities to evolve Improvisors. At any given step, the system’s trigger comprises the combination of previous mutation and crossover operations, and current user input. To be clear, in the current version of the system a human evaluator is largely responsible for the system’s focus shift, since the user tells the system which improvisations are most valuable, metaphorically drawing a circle around some of the generated examples and saying “more like this, please.” notes that this “introduces a fitness bottleneck.” In future versions of the system, autonomous evaluation could potentially take over for the human evaluator. Once the interesting samples have been collected (from whatever source), a bridge is then built to new results through the creation of new Improvisors. The results are the various musical improvisations produced by the fittest Improvisors (as well as, perhaps, the parameters that have been considered fittest).

The probability of encountering any particular pair of Improvisor and user evaluation is vanishingly low, given the massive dimensions of this search space. However, there will always be some highest-scoring Improvisor, whose parameters will be used to seed the next round. Accordingly, the chance of a trigger appearing to the system is “high.” The uniqueness of the trigger is not particularly important. The evolution of Improvisors captures a sense of the system’s curiosity about how to satisfy the musical tastes of the human user. The sagacity of the system corresponds to its methods for enhancing the likelihood that the user will appreciate a given Improvisor’s music (or similar music) over time. With little basis for comparison, we can only say that these two dimensions are “typical.” The aim of the system is to maximise the value of the generated results by employing a fitness function. Indeed, the system:

“[W]as able to produce jazz improvisations which slowly evolved from what was essentially random noise, to become more pleasing and sound more like jazz to the human evaluator’s ears” \cite{jordanous10}.

Ruling

The very reliability of the system ultimately bears against its overall potential for serendipity. Following Step 2, Part B of the SPECS procedure, we find a likelihood measure of \(\mathit{high}\times\mathit{moderate}\times\mathit{moderate}\), with outcomes of moderate value, so that the system as a whole is “not very serendipitous.” Note that evaluating individual threads as members of the population of all threads would yield more varied results. However, in the version of the system under discussion, individual threads are effectively equivalent regarding the features of chance, curiosity, and sagacity. The only thing to distinguish them from one another is their value. Referring to the moderate-at-best likelihood measure, even those threads which maximise value cannot be regarded as particularly serendipitous.

Qualitative assessment

The GAmprovising system does operate in a dynamic world, assuming that the user’s tastes may change. A more elaborate version of the system that could cater to multiple users is not yet implemented, but would be occupied with a considerably more complex problem, spanning and integrating multiple contexts. Even the current version of the system performs multiple tasks, but it uses one global fitness function; it would be more convincing if the fitness function evolved to match the user’s taste. Multiple influences are present but currently only at compile time, in the design of the fitness function, and at run time with settings for musical parameters. Greater dynamism in future versions of the system would be likely to increase its potential for serendipity.

Case Study: Iterative design in automated programming

\label{sec:flowchartassembly}

System description

Here we consider the design of a contemporary experiment with the FloWr flowcharting framework \cite{colton-flowcharting}. FloWr is a tool for creating and running computational flowcharts, built of small modules called ProcessNodes. For the day-to-day user, FloWr functions as a visual programming environment. However, it can also be invoked programmatically, on the Java Virtual Machine, or with any language using a new web API. The goals of FloWr are both to be a user-friendly tool for co-creativity, and to be an autonomous Flowchart Writer. Our experiment targets the latter scenario, assembling available ProcessNodes into flowcharts automatically. This can be viewed as a simple example of automated programming.

In the backend, FloWr’s flowcharts are stored as scripts. These detail the names of the involved nodes, together with their (input) parameters and (output) variable settings. Connections between nodes are established when one node’s input parameter references the output variable of another node. Inputs and outputs have constraints. For instance, the WordSenseCategoriser node has a stringsToCategorise parameter, which needs to be seeded with an ArrayList of strings. The node produces useful output only when these strings can be parsed as a space-separated list of words. Similarly, the node’s requiredSense parameter needs to be seeded with a string that represents one of the 57 British National Corpus Part of Speech tags. Given constraints of this nature, the first challenge in automated flowchart assembly is to match inputs to outputs correctly, and to make sure that all required inputs are satisfied.
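The matching problem described above can be illustrated with a small Python sketch. It does not use FloWr’s actual script format or API: the `NodeSpec` class, the `unsatisfied_inputs` function, the `TextSource` node, the output variable names, and the type labels are hypothetical. Only the `WordSenseCategoriser` node and its two parameter names are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:
    name: str
    inputs: dict   # parameter name -> required type label
    outputs: dict  # output variable name -> produced type label


def unsatisfied_inputs(nodes, wiring, literals):
    """Report required inputs that are neither seeded nor correctly wired.

    wiring maps (node, parameter) -> (source node, output variable);
    literals is the set of (node, parameter) pairs seeded with fixed values.
    """
    by_name = {node.name: node for node in nodes}
    problems = []
    for node in nodes:
        for param, wanted in node.inputs.items():
            key = (node.name, param)
            if key in literals:
                continue  # seeded directly, nothing to match
            if key not in wiring:
                problems.append((node.name, param, "missing"))
                continue
            source, variable = wiring[key]
            produced = by_name[source].outputs.get(variable) if source in by_name else None
            if produced != wanted:
                problems.append((node.name, param, "type mismatch"))
    return problems


# Hypothetical two-node flowchart.
source = NodeSpec("TextSource", {}, {"lines": "ArrayList<String>"})
categoriser = NodeSpec(
    "WordSenseCategoriser",
    {"stringsToCategorise": "ArrayList<String>", "requiredSense": "String"},
    {"categorised": "ArrayList<String>"},  # output name assumed for illustration
)
wiring = {("WordSenseCategoriser", "stringsToCategorise"): ("TextSource", "lines")}
```

A type-level check of this kind only covers part of the constraint set. Conditions such as “parsable as a space-separated list of words” or “one of the 57 part-of-speech tags” concern values rather than types, and would need additional validation.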

Application of criteria

In the initial experimental design, following , the system’s potential triggers result from random, but constrained, trial and error with flowchart assembly. Some valid combinations of nodes will produce results, and some will not. Due to the dynamically changing environment (e.g., updates to data sources like Twitter) some flowcharts that did not produce results earlier may unexpectedly begin to produce results. The system’s prepared mind lies in a distributed knowledge base provided by the ProcessNodes, which provide metadata that describe constraints on their inputs and outputs – and also in the global history of successful and unsuccessful combinations. The system will not try combinations that it knows cannot produce results, but it will try novel combinations and may retry earlier flowchart specimens that have the chance to become viable. Turning a collection of nodes for which no known working combination existed into a working flowchart is an occasion for a focus shift. What made this particular combination work? Is there a pattern that could be exploited in the future? It may be that no broader pattern can be found, and the system will simply record the bare fact that the combination works (and this is the simple starting point from which we begin). Successful combinations and any further inferences are stored, and referred to in future runs. The bridge to a new result is accordingly found by informed trial and error, building on previous outcomes. The basic result the system is aiming to achieve is simply to generate a new combination of nodes that can fit together and that generates non-empty output. Reviewing this design, we observed that subsequent versions of the system may have more detailed evaluation functions, setting a higher bar for what counts as success. For example, a future version of the system could be tuned to search for flowcharts that generate poetry \cite{corneli2015computational}.
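The informed trial-and-error loop can be sketched as follows. This is an illustration under stated assumptions rather than the experiment’s code: the `explore` function, the three outcome labels, and the `run` callable (standing in for executing an assembled flowchart) are hypothetical. The sketch distinguishes combinations that are known to be invalid, which are never retried, from combinations that merely produced empty output, which may be retried because data sources can change.

```python
import random

WORKS, EMPTY, INVALID = "works", "empty", "invalid"


def explore(node_names, run, history, rng, size=2, attempts=20):
    """Sample node combinations, consulting and updating a shared history.

    Returns the combinations that changed status to WORKS during this call;
    each such change is a candidate occasion for a focus shift.
    """
    newly_working = []
    for _ in range(attempts):
        combo = tuple(sorted(rng.sample(node_names, size)))
        previous = history.get(combo)
        if previous == INVALID:
            continue  # known not to work: do not try again
        outcome = run(combo)  # EMPTY results may turn into WORKS later
        history[combo] = outcome
        if outcome == WORKS and previous != WORKS:
            newly_working.append(combo)
    return newly_working
```

Because `history` is passed in and mutated, it can be persisted between runs, which corresponds to the stored record of successful and unsuccessful combinations described above. Any pattern induction over the newly working combinations would be layered on top of this loop.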

The chance of finding a novel successful flowchart in any given sample of nodes is fairly low. Compared to human users of FloWr, the search process is exceptionally curious, since it tries many combinations programmatically. However, remembering viable combinations and avoiding combinations that are known not to work does not require exceptional sagacity. At least, this will be so until the system learns more heuristics for flowchart construction, which would require not only pattern matching but pattern induction. At the moment, the system’s criterion for attributing value is simply that the combination of nodes generates non-empty output; a third party is not likely to judge such combinations as useful.

Ruling

The associated likelihood score is \(\mathit{low}\times\mathit{high}\times\mathit{low}\) (low chance, high curiosity, low sagacity), which is relatively favourable. However, until there is a more discriminating way to judge value, the attribution of serendipity to any particular run seems premature. This motivates a new set of experiments that seeks to meaningfully judge the value of explanatory heuristics, generated flowcharts, and texts. This will both result in and require increased sagacity on the part of the system.

Qualitative assessment

The system operates in a world that is dynamic in two ways: first, in the straightforward sense that some of the input sources, like Twitter, are changing; additionally, in the sense that the system’s knowledge of successful and unsuccessful node combinations changes over time as well. The current version of the system does not seem to deal with multiple contexts. In a future version of the system, interaction between different heuristically-driven search processes would be possible, and could lead to more unexpected results. Along these lines, as more goals are added, the system could more readily be seen to have multiple tasks. For instance, one search process could look for narrative outlines, and another process could look for lines or stanzas to fill out that outline. As for multiple influences, the population of ProcessNodes will constrain (and, as more nodes are added, extend) the possible strategies for assembling flowcharts. In addition to this localised knowledge, a pool of heuristics for matching and authoring patterns would add to the system’s sagacity. Heuristics for evaluating output are another place where domain-specific knowledge can be brought to bear.

Case Study: Envisioning artificially intelligent recommender systems

\label{sec:nextgenrec}

System description

Recommender systems are one of the primary contexts in computing where serendipity is currently discussed. In the context of the current recommender system literature, ‘serendipity’ means suggesting items to a user that are likely to introduce new ideas that are unexpected, but that are close to what the user is already interested in. These systems mostly focus on supporting discovery for the user – but some architectures also seem to take account of invention of new methods for making recommendations, e.g. by using Bayesian methods, as surveyed in \citeNP{shengbo-guo-thesis}. Current recommendation techniques that aim to stimulate serendipitous discovery associate less popular items with high unexpectedness \cite{Herlocker2004,Lu2012}, and use clustering to discover latent structures in the search space, e.g., partitioning users into clusters of common interests, or clustering users and domain objects \cite{Kamahara2005,Onuma2009,Zhang2011}. But even in the Bayesian case, the system has limited autonomy. A case for giving more autonomy to recommender systems can be made, especially in complex and rapidly evolving domains where hand-tuning is cost-intensive or infeasible. This suggests the need to distinguish serendipity that the recommender induces for the user from serendipity that user behaviour induces in the system.

Application of criteria

With this challenge in mind, we ask how serendipity could be achieved within a next-generation recommender system. In terms of our model, current systems have at least the makings of a prepared mind, comprising both a user- and a domain model, both of which can be updated dynamically. User behaviour (e.g. following certain recommendations) or changes to the domain (e.g. adding a new product) may serve as a potential trigger that could ultimately cause the system to discover a new way to make recommendations in the future. In the current generation of systems that seek to induce serendipity for the user, the system aims to induce a focus shift by presenting recommendations that are neither too close to, nor too far away from, what the user already knows. Here the flow of information is the other way around. Note, however, that it is an unexpected pattern of behaviour in aggregate, rather than a one-off event, that is likely to provide grounds for the system’s focus shift. A bridge to a new kind of recommendation could be created by looking at exceptional patterns as they appear over time. For instance, new elements may have been introduced into the domain that do not cluster well, or a user may suddenly indicate a strong preference towards an item that does not fit their preference history. Clusters may appear in the user model that do not have obvious connections between them. A new recommendation strategy that addresses the organisation’s goals would be a valuable result.
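The distinction between a one-off event and an unexpected pattern in aggregate can be made concrete with a short Python sketch. It is purely illustrative: the `aggregate_surprises` function, the representation of a preference history as a set of categories, and the `min_share` threshold are assumptions introduced here and do not describe any existing recommender system.

```python
from collections import Counter


def aggregate_surprises(histories, new_likes, min_share=0.3):
    """Return categories that many users newly prefer outside their history.

    histories: user -> set of categories the user has liked before
    new_likes: user -> category of the item the user has just liked
    min_share: fraction of observed users needed before a category counts
               as an aggregate pattern rather than a one-off event
    """
    out_of_profile = Counter()
    for user, category in new_likes.items():
        if category not in histories.get(user, set()):
            out_of_profile[category] += 1
    total = len(new_likes)
    return {
        category
        for category, count in out_of_profile.items()
        if total and count / total >= min_share
    }


histories = {"u1": {"rock"}, "u2": {"pop"}, "u3": {"rock"}, "u4": {"jazz"}}
new_likes = {"u1": "jazz", "u2": "jazz", "u3": "opera", "u4": "jazz"}
flagged = aggregate_surprises(histories, new_likes)
```

In this toy data, two of the four users show an out-of-profile preference for the same category while a third shows an isolated one, so only the shared category passes the threshold. A flagged category would be a candidate trigger for the system’s focus shift; deciding what to do with it is the part that would require sagacity.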

The system has only imperfect knowledge of user preferences and interests. At least relative to current recommender systems, the chance of noticing some particular pattern in user behaviour seems quite low. The urge to make recommendations specifically for the purposes of finding out more about users could be described as curiosity. Such recommendations may work to the detriment of user satisfaction – and business metrics – over the short term. In principle, the system’s curiosity could be set as a parameter, depending on how much coherence is permitted to suffer for the sake of gaining new knowledge. Measures of sagacity would relate to the system’s ability to develop useful experiments and draw sensible inferences from user behaviour. For example, the system would have to select the best time to initiate an A/B test. A significant amount of programming would have to be invested in order to make this sort of judgement autonomously, and currently such systems are beyond rare. The value of recommendation strategies can be measured in terms of traditional business metrics or other organisational objectives.
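One simple way to treat curiosity as a tunable parameter is an epsilon-greedy policy, a standard exploration technique that is substituted here for illustration and is not proposed in the sources cited above. In the sketch below, the `recommend` function and the `scores` mapping from items to estimated relevance are hypothetical.

```python
import random


def recommend(scores, curiosity, rng):
    """Pick an item, exploring with probability `curiosity`.

    Returns (item, exploratory), where `exploratory` is True when the item
    was chosen to learn about the user rather than to maximise relevance.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) > 1 and rng.random() < curiosity:
        # Exploratory recommendation: deliberately not the top-ranked item.
        return rng.choice(ranked[1:]), True
    return ranked[0], False
```

Setting `curiosity` to 0 recovers a purely exploitative recommender, while higher values trade short-term user satisfaction for information about the user. Choosing when and how to explore, rather than exploring uniformly at random, is where the sagacity discussed above would enter.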

Ruling

In this case, we compute a likelihood measure of \(\mathit{low}\times\mathit{variable}\times\mathit{low}\), with outcomes of potentially high value, so that such a system is “potentially highly serendipitous.” Realising such a system should be understood as a computational grand challenge. If such a system were ever realised, continued adaptations would be required to maintain high value. If there were a population of super-intelligent systems along the lines envisioned here, the likelihood measures would have to be rescaled accordingly.

Qualitative assessment

Recommender systems have to cope with a dynamic world of changing user preferences and a changing collection of items to recommend. A dynamic environment which exhibits some degree of regularity represents a precondition for useful A/B testing. The system’s multiple contexts include the user model, the domain model, as well as an evolving model of its own organisation. A system matching the description here would have multiple tasks: making useful recommendations, generating new experiments to learn about users, and improving its models. In order to make effective decisions, a system would have to avail itself of multiple influences related to experimental design, psychology, and domain understanding. Pathways for user feedback that go beyond answers to the question “Was this recommendation helpful?” could be one way to make the relevant expertise available.

\begin{table}
\centering
\begin{tabular}{l p{1.4in} p{1.4in} p{1.4in}}
\hline
 & Evolutionary music & Flowchart assembly & Next-generation recommender systems \\
\hline
\multicolumn{4}{l}{\emph{Condition}} \\
Focus shift & Driven by (currently, human) evaluation of samples & Find a pattern to explain a successful combination of nodes & Unexpected behaviour in the aggregate \\
\hline
\multicolumn{4}{l}{\emph{Components}} \\
Trigger & Previous evolutionary steps, in combination with user input & Trial and error in combinatorial search & Input from user behaviour \\
Prepared mind & Musical knowledge, evolution mechanisms & Constraints on node inputs and outputs; history of successes and failures & Through user/domain model \\
Bridge & Newly-evolved Improvisors & Try novel combinations & Elements identified outside clusters \\
Result & Music generated by the fittest Improvisors & Non-empty or more highly qualified output & Dependent on organisation goals \\
\hline
\multicolumn{4}{l}{\emph{Dimensions}} \\
Chance & Looking for rare gems in a huge search space & Changing state of the outside world; random selection of nodes to try & Imperfect knowledge of user preferences and behaviour \\
Curiosity & Aiming to have a particular user take note of an Improvisor & Search for novel combinations & Making unusual recommendations \\
Sagacity & Enhance user appreciation of Improvisor over time, using a fitness function & Don’t try things known not to work; consider variations on successful patterns & Update recommendation model after user behaviour \\
Value & Via fitness function (as a proxy measure of creativity) & Currently “non-empty results”; more interesting evaluation functions possible & Per business metrics/objectives \\
\hline
\multicolumn{4}{l}{\emph{Factors}} \\
Dynamic world & Changes in the user tastes & Changing data sources and growing domain knowledge & As precondition for testing system’s influences on user behaviour \\
Multiple contexts & Multiple users’ opinions would change what the system is curious about and require greater sagacity & Interaction between different heuristic search processes would increase unexpectedness & User model, domain model, model of its own behaviour \\
Multiple tasks & Evolve Improvisors, generate music, collect user input, carry out fitness calculations & Generate new heuristics and new domain artefacts & Make recommendations, learn from users, update models \\
Multiple influences & Through programming of fitness function and musical parameter combinations & Learning to combine new kinds of ProcessNodes & Experimental design, psychology, domain understanding \\
\hline
\end{tabular}
\caption{Summary of how the condition, components, dimensions and factors of the model appear in the three case studies.}
\label{caseStudies}
\end{table}

Summary

Table \ref{caseStudies} summarises how the condition, components, dimensions and factors in our model of serendipity appear in an evolutionary music system, in our current work on a flowchart-assembly system, and in hypothetical “next-generation” recommender systems. Each of the case studies shows clear potential for serendipity. There are also clear ways in which the measure of serendipity could be enhanced.

  1. A future version of the evolutionary music system would be more convincingly sagacious if it could evaluate works without user intervention. It might also be able to tailor its fitness function to the individual user. More broadly, interaction between the system’s tasks and more dynamism in its influences would help differentiate individual threads or system runs. Some elements of this population might be deemed more serendipitous than others.

  2. The flowchart assembly process would need more stringent, and more meaningful, criteria for value before third-party observers would be likely to attribute serendipity to the system. In addition to raising challenges for autonomous evaluation (as in the evolutionary music system case), this requirement would impose more sophisticated constraints on processing in earlier steps, which would require the system to be more sagacious.

  3. The next-generation recommender systems we have envisioned need to be able to make inferences from aggregate user behaviour. This points to long-term considerations that go beyond the unique serendipitous event. How “curious” should these systems be? One obvious criterion is that short-term value should be allowed to suffer as long as expected value is still higher. The symmetry between serendipity on the user side and serendipity on the system side might be exploited. Current systems seek to induce serendipity by making use of implicit connections between clusters, resulting in an update to the user’s conception of the item space. In current recommender systems, the user is given the responsibility to form the bridge, even when triggered by the system. As a preliminary step towards building an artificially-intelligent recommender system, users might be explicitly given tasks that are designed to trigger serendipity on the system side.