Introduction
Motion perception is usually studied within a single modality, although
the brain must be able to process motion signals from multiple
modalities because real objects translating across space often produce
correlated motion in more than one sense. A common motion-processing
network for visual, auditory and tactile motion is a compelling notion
because it would be efficient in evolutionary terms, avoiding the
duplication of motion systems in each modality, which would in any case
still need to be integrated to yield the benefits of multisensory
perception1. Some authors have argued that the human middle temporal
complex (hMT+) may serve this function by responding to motion
regardless of the sensory modality of the input1,2. hMT+ is a
motion-specialised area located early in the visual cortical pathway,
one synapse from primary visual cortex (V1), and is not generally
considered multisensory. Activity in MT correlates well with perceived
visual motion3-5 and with the movement perceived during the visual
motion aftereffect6-9.
Human neuroimaging investigations of supramodal motion processing in MT
have not produced clear-cut results. The typical approach has been to
test for supramodal processing by looking for overlapping responses in
hMT+ to auditory and visual motion, or to auditory and tactile motion.
Results from these studies are mixed: fMRI studies testing for auditory
motion responses in hMT+ have generally produced negative results in
sighted subjects10-12, and the two that did find auditory
responses13,14 have been queried on methodological grounds15,16. Other
studies have looked for tactile motion responses in hMT+ and found
evidence of direction selectivity17-23, although, again, several of
these have been queried methodologically15. More recently, Dormal et
al.24 used a decoding approach and found that auditory motion direction
could be reliably decoded from hMT+ activity. Rezq et al.25 confirmed
this decoding result in hMT+ and, in an interesting symmetry, showed
that response patterns to auditory motion could predict visual motion
direction, and that visual response patterns could predict auditory
motion direction.
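To make the cross-decoding logic concrete, here is a minimal sketch with
simulated data (our illustration only; the variable names, pattern model
and classifier choice are assumptions, not the pipelines used in refs
24,25). A classifier trained on one modality's direction-labelled
response patterns is tested on the other modality's patterns:

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n_trials, n_voxels = 80, 200
# Hypothetical direction code shared across modalities (the supramodal claim).
shared_axis = rng.normal(size=n_voxels)

def simulate_patterns(labels, noise_sd=3.0):
    # Voxel patterns = common direction signal + independent per-trial noise.
    signs = labels * 2 - 1  # 0/1 labels -> -1/+1
    return np.outer(signs, shared_axis) + rng.normal(0, noise_sd, (len(labels), n_voxels))

y_aud = rng.integers(0, 2, size=n_trials)  # auditory trials: 0 = leftward, 1 = rightward
y_vis = rng.integers(0, 2, size=n_trials)  # visual trials
X_aud = simulate_patterns(y_aud)
X_vis = simulate_patterns(y_vis)

# Cross-modal transfer: train on auditory patterns, test on visual patterns.
clf = LinearSVC().fit(X_aud, y_aud)
print("visual direction decoded from auditory training:",
      accuracy_score(y_vis, clf.predict(X_vis)))

Above-chance transfer in such an analysis is the signature of a shared,
modality-independent direction code.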
Many behavioural studies have examined audiovisual motion perception,
but clear evidence of supramodal processing is lacking, and there are no
reports of optimal integration to parallel those observed for spatial
tasks26,27. Studies testing for summation of motion signals, by
comparing bimodal detection thresholds against unimodal thresholds, have
generally found weak interactions consistent with probability summation
rather than additive or superadditive summation28,29.
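For concreteness, the probability-summation benchmark (given here in a
generic form for illustration; not necessarily the exact formulation
fitted in those studies) predicts the bimodal detection probability from
independent unimodal detections as

P_AV = 1 - (1 - P_A)(1 - P_V),

where P_A and P_V are the probabilities of detecting the auditory and
visual components alone; truly additive combination of the underlying
signals would predict bimodal sensitivity exceeding this independence
benchmark.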
There are many reports that sound can modulate visual motion
perception30-32, but these effects may be attributable to changes in
response criterion rather than sensitivity, and one study28 found small
sensitivity benefits regardless of whether the auditory and visual
directions corresponded (i.e., an absence of direction selectivity). The
study that came closest to finding a linear combination of auditory and
visual motion signals33 used auditory and visual stimuli that translated
horizontally around the observer (we use a similar format in this
study), rather than visual motion drifting within a fixed on-screen
aperture flanked by a pair of speakers, and it found strong summation
that occurred only with congruent motion components. A common supramodal
process should be bidirectional between the senses, and to date the only
study to report this involved tactile and visual stimuli34, finding that
adaptation to tactile motion produced aftereffects in vision, and vice
versa.
In the current experiments, we randomly presented various speeds in a
trial sequence that was either entirely visual or entirely auditory
(Experiment 1), or that randomly interleaved visual and auditory motions
from trial to trial (Experiment 2). To preview the results, Experiment 1
showed that speed perception was more veridical in audition than in
vision, and that there was no difference between the modalities in the
precision of speed discrimination. We also established that both
modalities showed motion priming, an attractive effect whereby the
current trial's perceived speed is biased towards the previous trial's
speed (i.e., a positive serial dependence).
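As a minimal sketch of how such a bias can be quantified (an
illustration under assumed stimulus values and noise levels, not the
study's actual analysis), positive serial dependence shows up as a
positive slope when the current trial's perceptual error is regressed on
the previous trial's speed relative to the current one:

import numpy as np

rng = np.random.default_rng(0)
n_trials = 500
speeds = rng.choice([8.0, 10.0, 12.0], size=n_trials)  # example speeds (deg/s), assumed values

# Simulate reports with a small attractive pull toward the previous speed.
true_bias = 0.15
reports = speeds.copy()
reports[1:] += true_bias * (speeds[:-1] - speeds[1:])
reports += rng.normal(0.0, 1.0, size=n_trials)  # response noise

# Positive serial dependence: error on trial t grows with how much faster
# the previous stimulus was than the current one.
error = reports[1:] - speeds[1:]
predictor = speeds[:-1] - speeds[1:]
slope = np.polyfit(predictor, error, 1)[0]
print(f"estimated serial-dependence slope: {slope:.3f} (simulated bias: {true_bias})")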
Experiment 2 randomly interleaved auditory and visual motions moving
leftward or rightward. When consecutive trials were congruent in
direction (both leftward or both rightward), we found symmetrical
cross-modal priming between trials: the current trial's perceived speed
was primed equivalently by the preceding trial's speed, regardless of
the modality combination. Incongruent pairs of trials showed no priming.
Overall, these results suggest a common and directionally selective
mechanism for auditory and visual motion.