WORKING DRAFT authorea.com/92791
  • LAMM Manuscript

    Hayim Dar, Paul Miller

    Abstract

    Introduction

    The existence, and nature, of a distinction between short and long term memory has been debated for over a century (James 1890, Jonides 2008). Associative buffer models, such as Raaijmakers and Shiffrin’s Search of Associative Memory (SAM, Raaijmakers 1981), present computational implementations of the highly influential theory of separate but interacting short and long-term memory processes (Atkinson 1968, Baddeley 1974).

    The features of SAM illustrate the canonical dual-storage algorithm: Items to be recalled are entered into a temporary storage buffer, replacing prior items if the buffer is full, and therein form episodic associations with other buffer items, and perhaps also a context marker. Those associations then serve as cues for consecutive recalls, producing many of the familiar dynamics of list recall performance (Davelaar 2005).
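
    The buffer procedure just described can be sketched in a few lines. This is a minimal illustration, not the SAM implementation: the function name `present_list`, the buffer capacity, the random displacement rule, and the one-unit-per-step association increment are all assumptions made for the example, and the context marker is omitted.

```python
import random
from collections import defaultdict


def present_list(items, capacity=4, rng=None):
    """Present items one at a time to a fixed-capacity buffer.

    Returns a dict mapping each unordered item pair to the number of
    presentation steps on which both items occupied the buffer together.
    """
    rng = rng or random.Random(0)
    buffer = []
    strength = defaultdict(int)
    for item in items:
        if len(buffer) >= capacity:
            # Buffer full: displace a randomly chosen occupant (assumed rule).
            buffer.pop(rng.randrange(len(buffer)))
        buffer.append(item)
        # Every pair of co-resident items gains one unit of association.
        for i, a in enumerate(buffer):
            for b in buffer[i + 1:]:
                strength[frozenset((a, b))] += 1
    return dict(strength)


associations = present_list(["cat", "dog", "hat"], capacity=4)
```

    Items that share the buffer for longer accumulate stronger pairwise associations, which is the property that lets those associations act as recall cues.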

    The Linking via Active Maintenance Model (LAMM) aims to explore the biological feasibility of associative buffer memory theories by implementing them within simulated neural circuitry. Additionally, it advances the debate between dual- and single-memory models (Brown 2002, Howard 2002, Cowan 2008) by arguing that, when implemented neuronally, associative buffer networks incorporate temporal context dynamics **A KAHANA CITATION FOR TEMPORAL CONTEXT? (Usher 2008). Finally, LAMM’s architecture is based upon prior work showing that buffer-mediated associations are necessary to explain the effects of degraded stimuli on recall (Piquado 2010, Miller 2010, Cousins 2014).

    Word list recall

    • Is an experimental paradigm with a long history as a model system for short term memory, and with rich behavioural phenomena.

    Recall for sequentially presented lists of items has long been a key experimental paradigm for revealing hidden structure in memory dynamics. Dependencies on list-position, presentation timing, temporal contiguity, effects of rehearsal, intra- and inter-list interference, categorical and acoustic relatedness, and scaling laws have all been observed (Murdock 1962, Rundus 1971, Kahana 1996, Golomb 2008, Grenfell-Essam 2012, Cowan 2008a, Farrell 2011) (more!). Disruptions and distractions have been used to probe the stages and modalities of memory encoding (Carroll 2010, Elliott 1998, Spataro 2013, Cowan 2008) including the finding of a retroactive effect indicating that encoding into short term memory continues beyond the removal of the stimulus (Rabbitt 1968, Cousins 2014).

    The canonical result in free-recall experiments is the 'U'-shaped performance curve for recall of items as a function of presentation order: the so-called serial position effects. Recall is stronger for both early items (denoted 'primacy') and late items ('recency') than for middle list items. Further, shorter lists seem to favour primacy over recency, and vice versa. Distractor tasks just before recall reduce recency but not primacy, whereas distraction throughout both presentation and recall somewhat restores the concavity of the serial-position curve (Howard 1999). On the other hand, subjects tend to overtly rehearse early list items much more than late ones; and whilst recall increases with rehearsal, late items are recalled much more easily than early ones for a given proportion of allocated rehearsal time (Rundus 1971). It thus appears that different mechanisms may support the recall of early vs late items (but see discussions in Sederberg 2008, Usher 2008).

    • Several models proposed; notably, differential memory for late list items is key support for the idea of the short term memory buffer.

    A key point of disagreement within the literature -- and entwined in the early- vs late-items dissociation mentioned above -- is whether memory for recently perceived list items is computed by the same processes that encode and retrieve general long term memory (LTM) (the 'unitary view'), or by dedicated, temporally limited 'short-term' processes (the 'dual-store' or 'dual-process' view) (Raaijmakers 1981a, Davelaar 2005, Sederberg 2008). Generally, unitary models have relied on representations of distinctiveness between items to be remembered, which contain some invariance to timescales (Brown 2002, Howard 2002), whilst dual-process models have posited a separate, short-term, actively maintained or reverberatory storage, or 'buffer' (Baddeley 1974, Raaijmakers 1981, Davelaar 2005) (more!). More recently, structural similarities have been pointed out between actual computational implementations of the different theories (Usher 2008), and modeling has suggested that both distinctiveness and buffer mechanisms may be necessary (Piquado 2010, Miller 2010, Cousins 2014). In most cases such short-term processes are assumed to contribute most strongly to memory for only very recently perceived stimuli, such as the ends of a word list, so that theoretical disagreement focuses on how these 'recent' items are remembered. On the other hand, few models provide an integrated account of the primacy effect, and fewer still a biological one (Lansner 2013, Farrell 2012); our model shares this shortcoming. We argue that primacy should be a focus of future work in this area.

    Memory theories, summary of trends

    • Buffers (active maintenance) vs similarity (interference)

    The debate regarding the mechanisms operating in tasks of list recall is one front in the larger debate about short-term memory in general: a debate that contrasts buffer maintenance with attention-driven theories. Buffer theories posit that the most recently perceived stimuli are maintained in short term memory buffers via active mechanisms (Baddeley 1974, Baddeley 2010, Davelaar 2005, Grossberg 1978, Bradski 1994). This view has won empirical support, especially from electrophysiological recordings from behaving primates (Funahashi 1989, Fuster 1971, Fuster 1973, Miyashita 1988, Miyashita 1988a), coalescing into a 'standard model' of short term memory (Goldman-Rakic 1987, Goldman-Rakic 1990, Courtney 2004, Postle 2006).

    Attention-driven theories propose that short-term memory is simply the process of temporarily re-activating, or 'attending' to, memory representations stored in LTM (Cowan 1993, Cowan 2008) (more?). Such theories reject Baddeley's (Baddeley 1974) physical separation of short- from long-term processing, and instead propose a functional separation only, consistent with Atkinson & Shiffrin's (Atkinson 1968) original proposal, and, in fact, the Hebbian idea of 'transient memory' (Hebb 1949). This approach avoids the undesirable requirement of duplicating representations of all stored knowledge in dedicated buffer networks organised by information type and stimulus modality, as implied in Baddeley's compartmental memory model (Baddeley 2010). Multiple LTM memories can instead be reactivated at the same time within their existing LTM media, requiring no duplicate networks. Only a subset of such reactivated memories is considered to be in the 'focus' of attention as generally understood, allowing for gradations in the level of focus and active maintenance (Cowan 2008): in the case of word list recall, the representation of the current word would be reactivated most strongly and be 'in focus', with less recent words fading to baseline.

    In this way Cowan (2008) leaves open the possibility of both decay of active maintenance and interference between stored representations as mechanisms of forgetting, which is a further point of conflict between buffer and attentional theories (Jonides 2008). In buffer theories recall fails when buffer activity decays. Attentional theories, by contrast, traditionally operate on memory encoding schemes in which the memories that differ most, whether in content, context, or time of perception, suffer the least interference from other encoded memories, and hence are recalled better (Hopfield 1982, Nairne 1997, Brown 2002, Brown 2007, Howard 2002). The Temporal Context Model (TCM) of (Howard 2002, Sederberg 2008) binds memories to an evolving 'current context' representation, which, by its nature, also includes the content of the memory itself. TCM is one of the most successful short term memory models, but because its context representation is constructed from superpositions of stimuli it becomes, in implementation, approximately an activation-based buffer mechanism with decay (Usher 2008).
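
    The resemblance between a superposition-based context and a decaying buffer can be seen in a simplified sketch of a TCM-style context update. The assumptions here are ours: each item contributes an orthogonal input vector, an extra leading dimension stands in for the pre-list context, the drift rate `beta` is an arbitrary illustrative value, and the retrieved-context input of the full model is omitted.

```python
import math


def context_trace(n_items, beta=0.5):
    """Context vector after presenting n_items mutually orthogonal items.

    Index 0 holds the pre-list context; index k holds item k's contribution.
    """
    # With orthogonal inputs this rho keeps the context at unit length.
    rho = math.sqrt(1.0 - beta ** 2)
    context = [1.0] + [0.0] * n_items
    for t in range(1, n_items + 1):
        # Old context is scaled down, then the new item is superimposed.
        context = [rho * c for c in context]
        context[t] += beta
    return context


trace = context_trace(4)
```

    Under these assumptions each item's contribution to the context shrinks geometrically with every later presentation, so the context vector behaves much like a graded activation buffer in which the most recent item is strongest.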

    Given, firstly, the natural representational overlap between context and its constituent parts, as evident in TCM; secondly, evidence suggesting that both buffer- and context-like mechanisms may be needed to explain observed recall phenomena (Miller 2010, Cousins 2014); and thirdly, the efficiency advantage of constructing short-term memory as a process operating on LTM representations (Raaijmakers 1981a, Davelaar 2013, Postle 2006) rather than requiring a potentially combinatoric duplication of LTM, we propose a novel short-term memory network with the following properties: the LAMM memory network is an activation-based buffer that represents stimuli agnostically to stimulus type or modality, and in which items being remembered are also associated with one another, encoding contextual information about such multi-item episodes.

    Linking Via Active Maintenance

    • outline LAMM here

    The Linking via Active Maintenance model is premised upon the two known mechanisms of neuronal memory: active maintenance and synaptic plasticity, both of which were also present in SAM (Raaijmakers 1981). In this way it aims to combine aspects of both the active buffer and contextual association models that have proven successful. Moreover, these processes are not independent, but interact according to the standard Hebbian paradigm (Hebb 1949, Amit 1995) wherein the repeated firing of maintained activity leads to strengthening of the connections which support it, thus 'wiring in' this activity pattern for later reactivation.

    LAMM is structured into two network layers: a long term memory (LTM) store in which all the presented items are already encoded, and the short-term network through which episodic information about the lists presented is maintained and encoded. The item layer, as we shall call the LTM network, forms both the input and output of the model. Its structure is a winner-take-all attractor network, consisting of rate-model units approximating recurrently connected neuronal subpopulations, only one of which may be active at a time. Each such state represents the perception of a single word item, implying the definite end state of linguistic discrimination processes which recognise spoken language (CITE?). We assume that this end state activity constitutes conscious perception of the word, and thus the reactivation of such states in the absence of auditory stimuli constitutes recollection. In this way the LTM item layer is both input and output for the short-term memory layer, which we will refer to simply as the memory network.
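
    As a rough illustration of the winner-take-all behaviour described for the item layer, the sketch below uses saturating rate units with self-excitation and mutual inhibition. It is not the LAMM item-layer equations: the function name `run_item_layer` and every parameter value (`w_self`, `w_inh`, `stim`, `dt_over_tau`) are hypothetical choices made so that exactly one unit stays active after its stimulus is removed.

```python
def run_item_layer(rates, stimulated=None, n_steps=200,
                   w_self=1.2, w_inh=1.0, stim=1.5, dt_over_tau=0.1):
    """Euler-step a small winner-take-all rate network; returns final rates.

    `stimulated` is the index of the unit receiving external input, or None.
    """
    rates = list(rates)
    for _ in range(n_steps):
        total = sum(rates)
        updated = []
        for i, r in enumerate(rates):
            external = stim if i == stimulated else 0.0
            # Self-excitation minus inhibition from all other units.
            drive = w_self * r - w_inh * (total - r) + external
            # Threshold-linear response saturating at 1 (assumed form).
            target = min(1.0, max(0.0, drive))
            updated.append(r + dt_over_tau * (target - r))
        rates = updated
    return rates


rates = run_item_layer([0.0, 0.0, 0.0], stimulated=0)  # present item 0
rates = run_item_layer(rates)                          # stimulus removed
rates = run_item_layer(rates, stimulated=1)            # present item 1
```

    With these values the first unit remains active after its stimulus ends, and presenting a second item switches the network to that item's state while suppressing the first.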

    The memory network also contains self-recurrent units, but its dynamics admit several such units being active simultaneously, allowing multiple 'winners'. Memory units share significant lateral excitation, and can be bound together into larger subgroups via associative plasticity. Activity in the memory network is excited by feedforward connections from the item layer, and the memory network in turn provides feedback to the items (see Table ?). These item-memory, memory-item and memory-memory connections undergo bi-directional, short-term potentiation, and largely encode the episodic associations of the presented lists. Their plasticity dynamics are based upon the recently demonstrated associative short-term plasticity (ASTP, Erickson 2010), which quickly potentiates synapses between co-active neurons and then decays over several minutes.
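
    The qualitative time course of such plasticity, fast potentiation during co-activity followed by decay over minutes, can be sketched for a single synapse as follows. The rule and all of its parameters (`rate_gain`, `tau_decay_s`, `w_max`, the five-second co-activity window) are illustrative assumptions and are not taken from Erickson 2010 or from the LAMM equations.

```python
def simulate_astp(coactive_s=5.0, total_s=600.0, dt=0.1,
                  tau_decay_s=180.0, rate_gain=1.0, w_max=1.0):
    """Euler-integrate one synaptic weight; returns (times, weights).

    The two connected units are co-active for the first `coactive_s` seconds.
    """
    n_steps = int(round(total_s / dt))
    w = 0.0
    times, weights = [], []
    for step in range(n_steps):
        t = step * dt
        coactive = 1.0 if t < coactive_s else 0.0
        # Potentiation toward w_max while co-active, passive decay always.
        dw = rate_gain * coactive * (w_max - w) - w / tau_decay_s
        w += dt * dw
        times.append(t + dt)
        weights.append(w)
    return times, weights


times, weights = simulate_astp()
```

    Here the weight approaches its ceiling within a few seconds of co-activity and then relaxes back toward zero with a time constant of a few minutes, which is the separation of timescales the text relies on.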

    LAMM's memory layer also aims to meet three criteria: Firstly, it should be limited in capacity, and hence must be modality- and type-agnostic, unlike the potentially combinatoric components of a fully compartmental Baddeley-style memory (Courtney 2004, Postle 2006); secondly, it should have an intermediate dimensionality, small enough that memory activity is discrete between representations of different list items, but large enough that these overlap and may also represent the current context via such superposition of recent items (Usher 2008); finally, the activity evoked in the memory layer should be variable such that it depends on recent history, and hence a repeated stimulus may evoke a different pattern of activity each time it is presented to the network. The second property is a statement of distributed coding, that memory activity representing different items may overlap; the third property requires that these patterns of activity not be determined solely by the input connectivity. We shall see that the parameters that were found to generate recall most reliably took the memory dynamics away from the third property, with the result that memory activity is largely determined by item-memory projections, and thus that memory representations are largely fixed (Fig. \ref{memcorrs}). **I WOULD SHIFT THIS TO RESULTS NOT INTRO

    Comparison to other models - Most similar models

    LAMM's mechanisms are most similar to Raaijmakers and Shiffrin's Search of Associative Memory (SAM) (Raaijmakers 1981), and in some ways can be considered a biological implementation of those ideas. LAMM shares much with (Davelaar 2005) in general principles, but their memory network and ours are designed differently: their memory activity is localist and evolves stochastically via a random walk, independently of items presented; in LAMM memory activity is distributed amongst memory units, and is driven by feedforward inputs from the item layer. LAMM's memory network could instead be seen as a crude neuronal implementation of TCM's context signal, in that memory activity both reflects and is driven by presented items, yet we do not have the perfect fidelity of representation that TCM's infinite-dimensional context vector allows. In these ways LAMM's contextual representation is more discrete and variable than TCM's but less so than Davelaar et al.'s.

    • The LTM network compris