Keywords: self-generation, multisensory associations, active
learning, gaze-controlled interface
Introduction
We learn faster when we are actively engaged with the material – this
is not just folk wisdom, but a finding that has been reproduced in a
plethora of experimental settings (Markant et al., 2016). The memory
benefits of active learning are, however, a very diverse phenomenon, and
in order to study their cognitive and neural underpinnings, we have to
choose a focus. In this study, we asked whether the memory gain
through active learning could be linked to the previously established
differences in the neural and perceptual processing of self- versus
externally generated stimuli (Baess et al., 2011; Blakemore et al.,
1998, 2000; Schäfer & Marcus, 1973). Specifically, we decided to study
the effects of being in control over auditory stimuli on the learning
progress of visuo-auditory contingencies, in an experimental setup that
controls for the confounding factors of predictability and movement. We
designed a variation of the classic self-generation paradigm (Schäfer &
Marcus, 1973) using eye movement sonification. In a memory task,
participants learned associations between movement-sound pairs. Their
learning progress was tracked across several stages of learning on a
behavioural and neural level.
Our first aim was to test whether actively controlling stimuli, beyond
effects of movement and predictability, would lead to associative
learning benefits on a behavioural level. Previous studies have shown
that active control during complex tasks such as spatial navigation, as
well as simpler experimental setups such as recognition memory tasks,
can facilitate learning (Harman et al., 1999; James et al., 2002;
Plancher et al., 2013). A related, somewhat more clearly defined
phenomenon is the “production effect”: Stimuli produced by oneself are
remembered better than externally produced stimuli (Brown & Palmer,
2012; MacLeod et al., 2010). Even minimal amounts of control, such as
controlling the pacing of information, have been found to improve memory
(Markant et al., 2014). The effects of control are easily conflated with
the effects of movement during learning, since in most studies on this
question, participants use hand movements in order to control stimuli in
the active condition, while not moving at all in the passive condition
(Craddock et al., 2011; Harman et al., 1999; Liu et al., 2007; Luursema
& Verwey, 2011; Meijer & Van der Lubbe, 2011). Nevertheless, some
studies have found memory benefits for active learning even when
controlling for the factor of movement (Plancher et al., 2013; Trewartha
et al., 2015). Theoretical and experimental accounts of the role of
choice during learning suggest that controlling the flow – the pacing,
the order – of information is crucial for the memory gain, since the
learner is able to develop hypotheses and test them, or revisit items
that they feel unsure about (Gureckis & Markant, 2012; Kruschke, 2008;
Markant et al., 2016; Markant & Gureckis, 2010; Schulze et al., 2012).
This is corroborated by the finding that motor activity unrelated to
strategic control over learning does not improve memory
performance (Voss et al., 2011). In order to get a better view of the
role of control in the learning process of arbitrary motor-auditory
contingencies, we developed a learning paradigm in which participants
had to return several times to the same set of stimuli and were tested
on their memory performance in between rounds of learning. We
hypothesised that we would encounter a memory advantage for stimuli
learned under active exploration. We expected this memory advantage to
manifest as faster learning of the associations in the active condition.
A second aim of this study was to investigate the neural mechanisms
underlying the putative memory benefits of active learning. To this
end, we isolated neurophysiological effects of control over acoustic
stimuli from unspecific neuromodulatory effects caused by movements and
effects of stimulus predictability. Control during stimulus generation
could modulate brain responses at different levels of learning, and we
probed a series of possible mechanisms.
Control over stimuli, accompanied by a Sense of Agency (SoA), is known
to impact stimulus processing. Electrophysiological responses to
self-generated stimuli tend to be attenuated relative to externally
generated stimuli, even when the stimuli evoking the response are
physically identical (Blakemore et al., 2000; Gentsch & Schütz-Bosbach,
2011; Hughes et al., 2013b; Hughes & Waszak, 2011; Kilteni et al.,
2020; Mifsud et al., 2018; SanMiguel et al., 2013). Although they can be
observed in all sensory modalities (auditory: Baess et al., 2011;
visual: Hughes & Waszak, 2011; tactile: Kilteni et al., 2020, for some
examples), attenuation effects on sensory processing have been
extensively studied in the auditory domain, often comparing evoked
electrophysiological responses to self-generated and
externally-generated acoustic stimuli (Horváth, 2015; Schäfer & Marcus,
1973). Using electroencephalography (EEG), a number of neuro-electrical
markers of self-generation have been established: an attenuation of
certain event-related potentials (ERPs), i.e., a diminished amplitude of
the peaks that characterise the early cortical processing of self- as
opposed to externally generated sounds.
Attenuation for self-generated sounds has been observed in the N1
component (Bäß et al., 2008; Elijah et al., 2018; Mifsud et al., 2016;
Neszmélyi & Horváth, 2017; Oestreich et al., 2016; Pinheiro et al.,
2019; van Elk et al., 2014), the P2 component (Horváth & Burgyán, 2013;
Knolle et al., 2012), and the Tb component (Paraskevoudi & SanMiguel,
2022; SanMiguel et al., 2013; Saupe et al., 2013). The nature of these
effects is often assumed to be predictive, since efference copies of
motor commands are thought to serve as a basis for precise anticipation
of sensory stimulation (Miall & Wolpert, 1996). Correctly predicted
sensory stimulation is thought to elicit smaller neural responses than
wrongly predicted or surprising input, in line with the predictive
coding theory of neural processing (Blakemore et al., 1998; Kilner et
al., 2007). However, previous research has shown that motor activity
during sensory processing also has unspecific modulatory effects that
are not related to predictability – just being in motion affects the
way we perceive stimuli (Horváth et al., 2012), and movement effects can
be a confounding factor when trying to study the effects of
predictability and control (Hazemann et al., 1975; Horváth, 2013;
Paraskevoudi & SanMiguel, 2021; Press & Cook, 2015). Recent studies
that specifically investigated effects of agency, controlling for
predictability and movement, have found both attenuation and enhancement
effects on the P2 component (Bolt & Loehr, 2021; Han et al., 2021), and
modulations of the P3 component have also been observed (Burnside et
al., 2019; Kühn et al., 2011). We hypothesised that if the effects
typically observed on the N1, P2 and P3 components in self-generation
paradigms are indeed related to agency and control, we should be able to
reproduce them with our design, even though we used an unconventional
experimental paradigm: Instead of hand or finger movements, participants
used their eye movements to generate sounds. By using a gaze-controlled
interface, we were able to compare an experimental condition in which
participants controlled a cursor using their eye movements
(“agent condition”) with a condition in which participants followed a
cursor with their gaze (“observer condition”),
minimizing the motor differences between conditions. Eye movements are
mostly automatic and usually directed towards visual goals, and we
normally do not expect our eye movements to have auditory consequences (Mifsud &
Whitford, 2017; Slobodenyuk, 2016). Importantly, two studies using
self-generation paradigms have used saccades to generate sounds and
found either no attenuation for eye-movement initiated sounds (Mifsud &
Whitford, 2017) or weakened attenuation of the N1, but not the P2
component (Mifsud et al., 2016). Electrophysiological responses to gaze
fixations have been measured in the context of brain-computer interfaces
and gaze-controlled games (Ihme & Zander, 2011; Protzak et al., 2013),
and certain markers of voluntary gaze control have been established:
Voluntary gaze fixations that were made consciously in order to control
an interface were characterised by a slow negative parieto-occipital
wave evoked by the fixation which was absent or much decreased in
fixations that did not control the interface (Protzak et al., 2013;
Shishkin et al., 2016).
Beyond control affecting stimulus processing at a basic level, we
also considered the possibility that control specifically modulates
learning processes. Repeated presentation of a given movement-sound
pair, as was the case in our paradigm, leads to neural changes over time
related to the learning progress – we develop internal models of the
associations that we have learned (Kilner et al., 2007), and the sound’s
predictability based on the preceding movement increases gradually.
Effects of predictability on ERP components strongly resemble those of
self-generation: predictability often leads to sensory attenuation
(Alink et al., 2010; Grotheer & Kovács, 2016; Kaiser & Schütz-Bosbach,
2018; Summerfield et al., 2008), and in fact sensory attenuation for
self-generated stimuli is more pronounced when the outcome of the
self-generated action matches the agent’s expectation (Hughes et al.,
2013a; Stenner et al., 2014). Controlling for temporal predictability
can help us to understand the functional separation of modulations of
established ERP components by self-generation (Klaffehn et al., 2019).
By studying the evolution of ERP components in relation to learning we
can shed light on the effects of increased predictability beyond the
self-generation effects, which should be observable from the start of
the learning process. In line with previous studies, we expected to find
an increased attenuation of the N1 (Kaiser & Schütz-Bosbach, 2018) and
P2 component during late stages of learning. Furthermore, modulations of
the P3 component – with less clear directionality – have been observed
as a function of learning (Polich, 2007; Turk et al., 2018). If control
were to facilitate learning progress, we would expect stronger or earlier
effects of learning when participants had control over the stimuli.
Further insight into the neural mechanisms behind the active learning
memory advantage can be gained by studying evoked responses to
incongruent sounds. In our paradigm, participants are regularly tested
on their memory of movement-sound pairs; in those test trials, they are
required to passively observe a cursor movement and listen to a sound,
and judge whether the two are a matching pair or not, based on their
previously learned associations. We hypothesised that control during
acquisition strengthens the internal representation of the
movement-sound association, so violations of the latter should elicit
larger prediction error signals (Knolle et al., 2013; Mathias et al.,
2015). Based on the previous literature, we expected incongruent stimuli
to elicit mismatch responses like the N200 or an orienting response like
the P3a (Knolle et al., 2013; Winkler et al., 2009). Alternatively,
sounds congruent with learned associations can elicit “matching”
responses: The P3b component in particular is thought to reflect the
matching of a stimulus with a predicted item, and has been found to be
larger with increased predictability (Molinaro & Carreiras, 2010; Roehm
et al., 2007; Vespignani et al., 2010). This component is also referred
to as the “late positive component” (LPC), which is believed to reflect an
explicit recollective process (Friedman & Johnson Jr., 2000), typically
elicited by designs in which participants have to make a response
related to the stimulus (Yang et al., 2019). It is considered part of
the classical “old/new” effect; stimuli presented in a test phase
which appear familiar to the participant elicit a stronger LPC (Woodruff
et al., 2006). The LPC has been found to be a predictor of learning
outcomes (Turk et al., 2018). We expected the strength of either the
matching-responses to correctly predicted or the mismatch responses to
incongruent sounds to be modulated by the factor of control during the
learning phase.
Once a motor-auditory association is established, the increased
predictability of sounds that comes with learning should affect sound
processing similarly regardless of whether sounds are presented during
learning or during a test trial: we expected that sensory responses –
specifically the N1 and P2 component – would be attenuated during late
stages of learning, and we hypothesised that this effect could be
modulated by the mode of acquisition of the motor-auditory associations.
Previous studies have shown that during memory tests, stimuli that were
previously self-generated can cause motor reactivation even in the
absence of movement (Butler et al., 2011). The distinctiveness account
of the production effect (Hommel, 2005) suggests that motor activation
during learning builds stronger, more distinctive memory traces, which
is thought to be reflected in more efficient learning; how we learned
something affects how we will process it in the future. Movement during
sound processing affects our memory of the sound, but is it necessary
for the movement to be causally linked to the sound in order for this
effect to come into play? If the latter were indeed necessary, we would
expect to see an effect of agency – rather than movement – on the
neural processing of the stimulus or the strength of the memory trace.
Alternatively, if we do not find modulations by agency, that would give
support to the idea that movement does not need to be causal to the
stimulus in order to affect its processing or memory encoding (Horváth
et al., 2012).
In the present study, our goal was to improve our understanding of how
active control over sound stimuli affects their immediate sensory
processing and encoding in memory. Towards this aim, we studied whether
and how control during learning improves memory, and how it modulates
neural responses during sound processing and memory encoding. Finally,
we aimed to establish whether self-generation effects during sound
processing are linked to the memory benefits of active control.
Methods
Participants
Twenty-five healthy undergraduate university students from the
University of Barcelona volunteered for the study. Two participants were
excluded from the analysis due to low behavioural performance, based on
a cut-off determined by simulating the responses of 25 randomly
responding individuals and taking the accuracy of the best-performing
one as the threshold (56% correct in the behavioural task). The final
sample included twenty-three participants (14 women, M = 21 years old,
range: 18–31). No participant self-reported any hearing impairment,
psychiatric disorder, or use of substances affecting the nervous system
in the 48 hours prior to the experiment. All participants gave written
informed consent for their participation after the nature of the study
was explained to them, and they were monetarily compensated (10 €). The
study conformed to the guidelines of the World Medical Association
(Declaration of Helsinki), with the exception of pre-registration, and
was approved by the Bioethics Committee of the University of Barcelona.
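As a rough illustration of the exclusion criterion described above, the following Python sketch (not the authors' code) shows one way such a chance-level cut-off could be derived. The total number of test trials per simulated participant and the random seed are assumptions, not values reported in the study.

import numpy as np

# Simulate 25 individuals who answer every congruent/incongruent test
# question at chance (p = 0.5) and take the best accuracy as the cut-off.
rng = np.random.default_rng(seed=0)       # seed chosen for reproducibility (assumption)
n_simulated = 25                          # randomly responding individuals
n_trials = 500                            # hypothetical number of test trials per participant

accuracies = rng.binomial(n_trials, 0.5, size=n_simulated) / n_trials
cutoff = accuracies.max()                 # best-performing random responder
print(f"Exclusion threshold: {cutoff:.2%}")   # the study reports 56% correct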
Experiment design
Experiment structure
The experiment consisted of two types of trials: acquisition trials and
test trials. During acquisition trials, participants had 20 s to learn
associations between movement directions of a white cursor over a grid
of 9 red squares (Fig. 1), and 8 different sounds that were played
depending on the cursor movements (see section on sound generation
below). During test trials, participants were tested on their memory for
the movement-sound associations.
The movement-sound associations were learned either as agents or as
observers. Agent and observer experimental conditions differed only
during acquisition trials. During acquisition trials in the agent
condition, the cursor was controlled by the participant’s gaze, while in
the observer condition the cursor was animated by the computer. Thus, in
the agent condition, the acquisition process required active
exploration. Participants were instructed to perform saccades over the
squares and generate as many different sounds as possible. In the
observer condition, the cursor was animated using previously recorded
eye movements from the same participant, and participants were asked to
follow the cursor’s movements and memorise the relationships between
movements and sounds.
Following each acquisition trial, participants were tested on their
memory of the movement-sound associations in a series of 6 test trials.
During test trials, participants were presented with a short animation
of the cursor moving from one square to another in a straight line
(executing one of the 8 possible movements). After a delay of 750 ms
(matching the pattern of acquisition trials, see section “Visual
stimulation and gaze-controlled sound generation”), one of the 8 sounds
familiar from acquisition, either congruent or incongruent with the
previously learned associations, was presented. Half of the test trials
presented congruent movement-sound pairs. The order of the animations
and sounds was based on a computer-generated, randomised list. At the
end of each test trial, participants responded whether the movement and
sound were a congruent pair by pressing one out of two buttons on a midi
keyboard placed in front of them.
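To make the construction of test trials concrete, the sketch below illustrates one plausible way a randomised test list with 50% congruent movement-sound pairs could be generated. The index-based representation of the 8 movements and sounds, and the one-to-one mapping, are simplifications for illustration and are not taken from the study.

import random

movements = list(range(8))
learned_sound = {m: m for m in movements}   # hypothetical movement-to-sound mapping

def make_test_trials(n_trials=6):
    trials = []
    # Half of the trials are congruent, half incongruent, in random order.
    congruency = [True] * (n_trials // 2) + [False] * (n_trials - n_trials // 2)
    random.shuffle(congruency)
    for congruent in congruency:
        movement = random.choice(movements)
        if congruent:
            sound = learned_sound[movement]
        else:
            # Pick any sound other than the one associated with this movement.
            sound = random.choice([s for s in learned_sound.values()
                                   if s != learned_sound[movement]])
        trials.append({"movement": movement, "sound": sound,
                       "congruent": congruent})
    return trials

print(make_test_trials())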
One acquisition trial and 6 test trials were considered a “learning
block”. During 7 consecutive learning blocks, participants were
presented with the same movement-sound associations. Groups of 7
learning blocks sharing the same movement-sound associations are
referred to here as “contingency blocks”. After the 7th learning block,
the contingency block ended, new sounds were loaded, and participants
had to start their learning process anew.
Contingency blocks alternated between the agent and
observer conditions. The order of the conditions was counterbalanced
across participants.
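Schematically, the nesting of trials, learning blocks and contingency blocks described above can be summarised as in the sketch below. The number of contingency blocks per session is not specified in this section and appears here only as a placeholder.

def learning_block(condition):
    # One learning block: a 20 s acquisition trial followed by 6 test trials.
    yield {"phase": "acquisition", "condition": condition, "duration_s": 20}
    for _ in range(6):
        yield {"phase": "test", "condition": condition}

def contingency_block(condition):
    # Seven learning blocks sharing the same movement-sound associations.
    for _ in range(7):
        yield from learning_block(condition)

def experiment(first_condition="agent", n_contingency_blocks=4):
    # Contingency blocks alternate between conditions; the starting condition
    # is counterbalanced across participants. n_contingency_blocks is a
    # placeholder, not a value reported in this section.
    order = (["agent", "observer"] if first_condition == "agent"
             else ["observer", "agent"])
    for i in range(n_contingency_blocks):
        yield from contingency_block(order[i % 2])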