Discussion
In these experiments we measured speed discrimination for a range of
horizontally translating auditory and visual motions. Using the same set
of speeds in each modality (20, 40, 50, 60, 70, 80, 100°/s) and
employing the method of single stimuli in which the speed on a given
trial is compared against the mean of the set of speeds
(see [38,39]), we first compared vision and audition in
unimodal experiments. The unimodal data revealed that the slopes of the
psychometric functions for auditory and visual motion were the same,
although their means differed significantly. The mean of the auditory
motion psychometric function matched the mean of the stimulus set, while
the mean of the visual function was significantly lower than the mean
stimulus speed. We also examined motion priming, that is, the tendency
for the perceived speed of the current motion to be biased towards the
speed of the preceding one, in both unimodal and crossmodal experiments.
These analyses showed that a reliable unimodal motion priming effect
occurred and that an equivalent effect occurred crossmodally when the
prime stimulus
was in the other modality. Taken together, the matching psychometric
slopes for auditory and visual motion, and the presence of motion
priming, regardless of whether the prime stimulus was in the same or
different modality, suggest a common process underlies auditory and
visual speed discrimination.
A key finding supporting common processing of auditory and visual speed
is the near-identical bandwidths of the cumulative Gaussian psychometric
functions for auditory and visual motion shown in Figure 2b. The width
of a psychometric function is indicative of the noise associated with
the underlying perceptual process and thus with perceptual precision. A
narrower bandwidth (i.e., steeper slope) indicates greater precision.
The finding of equal precision for auditory and visual speed
discrimination is somewhat surprising because motion perception in
audition is typically found to be much poorer than in
vision [40,51,52]. Weber fractions for visual speed discrimination, for
speeds similar to the lower part of the range used here, are on the
order of 0.05 [53,54], whereas Weber fractions for equivalent auditory
speeds (30°/s and 60°/s) are many times higher at 0.24 and 0.30,
respectively [55]. The
reason for this large difference is not clear but one key point
differentiating the stimuli used in these studies may be relevant. In
the auditory studies, the stimulus was a sound source that translated
across space from one point to another, like an auditory object. There
was no such ‘object displacement’ in the visual studies, as the motion
stimuli were a continuous drift within an aperture. Such motion is
locally contained and does not actually go anywhere. For this reason,
comparing these auditory and visual motion discrimination studies is not
a like-for-like comparison.
Our study matched vision and audition by using translation across space
and the same paths in both modalities (see Fig. 1). Our conditions,
therefore, are directly comparable and clearly reveal a strikingly
similar psychometric slope for motion discrimination in the two modalities.
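To make the distinction between the slope and the mean of a psychometric
function concrete, the following minimal sketch evaluates a cumulative
Gaussian at the seven stimulus speeds. The parameter values `SIGMA = 20.0`
and `MU_SHIFTED = 50.0` are hypothetical, chosen only for illustration, and
are not the fitted values from these experiments. The function name
`p_faster` is likewise an assumption.

```python
import math

# Stimulus speeds in deg/s, as listed in the text.
SPEEDS = [20, 40, 50, 60, 70, 80, 100]


def p_faster(speed, mu, sigma):
    """Cumulative Gaussian: probability of a 'faster than the set mean' response."""
    z = (speed - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


# Mean of the stimulus set, the implicit standard in the method of single stimuli.
set_mean = sum(SPEEDS) / len(SPEEDS)

# Hypothetical parameters for illustration only (not fitted values).
SIGMA = 20.0       # same bandwidth for both curves, i.e. equal precision
MU_SHIFTED = 50.0  # a curve whose mean lies below the stimulus mean

centred = [p_faster(s, set_mean, SIGMA) for s in SPEEDS]
shifted = [p_faster(s, MU_SHIFTED, SIGMA) for s in SPEEDS]

for s, a, b in zip(SPEEDS, centred, shifted):
    print(f"{s:>3} deg/s  centred={a:.3f}  shifted={b:.3f}")
```

In this sketch, changing `sigma` alters the steepness of the curve
(precision), whereas changing `mu` only slides the curve along the speed
axis. The two parameters are independent, which is why equal slopes can
coexist with different means.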
The difference in the psychometric means between audition and vision was
another interesting result. As noted in the previous paragraph, auditory
motion is often regarded as inferior to visual motion based on data
showing poorer speed discrimination and on the grounds that audition –
unlike vision – is fundamentally not a spatial sense. And yet, our data
show that it was auditory motion that was veridically perceived. The
auditory mean in Figure 2b (blue column) was not significantly different
from mean stimulus speed, while the visual mean by contrast was
significantly below the stimulus mean, showing an overall
underestimation of perceived speed. Again, the reason for this is not
clear, as visual speed experiments are not usually done with translating
objects such as those used here, and we have not been able to find any
other studies using a ‘spatial displacement’ approach comparable to
ours. One possible reason for the result, however, might be the nature
of the speed cues in
each modality. In vision, motion processing begins in primary visual
cortex where direction-selective cells with small receptive fields are
found. These are pooled into MT cells with much larger receptive fields
and a higher range of speed tunings [4,56,57]. It is possible that, with
a relatively fast set of speeds (up to 100°/s), there was a degree of
smearing at the local V1 level such that MT inherited impaired motion
signals. By contrast, the auditory system detects horizontal motion
using different cues, primarily interaural time and level differences,
supplemented with spectral and Doppler cues [40,58]. These cues are not
tied to small spatial receptive fields in the way that visual motion
signals are and may remain veridical at higher speeds.
The final point of interest comes from the motion priming results. As is
well established in the visual domain [46-48], a brief
preceding motion causes the current motion stimulus to be biased towards
the preceding one in speed or direction (i.e., motion priming) while a
long preceding motion causes a bias away from the preceding one (i.e.,
motion aftereffect). As our stimuli were brief, we expected and obtained
significant motion priming effects for visual stimuli (Fig. 3a). We could
find no studies of motion priming with auditory stimuli on which to base
a prediction, but in the event we also obtained a very similar motion
priming effect for auditory motion (Fig. 3b,c). This outcome was far
from certain, as auditory motion involves a completely different set of
cues from visual motion, but it suggests that there may be common motion
processing for visual and auditory stimuli. The critical condition was
the crossmodal case, where auditory and visual stimuli were interleaved.
As shown in Figure 4a,b, we obtained clear crossmodal motion priming.
That is, consistent with a common motion process, current auditory
motion speed was primed by the preceding visual speed, and vice
versa, and both effects showed very similar functions when plotted for
each level of preceding speed (Fig. 4c).
A further interesting feature is shown in Figure 4d, which shows the
means of the crossmodal psychometric functions together with the
unimodal means from Figure 2. Recall that the mean speeds for unimodal
vision and audition were significantly different, with visual stimuli
being perceived as slower than auditory (Fig. 2b). On priming trials
where vision was preceded by audition, the psychometric mean speed for
vision increased relative to the unimodal mean. Conversely, on auditory
trials preceded by vision, the psychometric mean speed for audition
decreased relative to unimodal audition. In both cases, there was a kind
of averaging across the modalities of the current and previous speeds.
If each modality were processed independently, the AV trials should
have the same mean speed as the unimodal V trials, as AV trials are
actually vision trials (preceded by audition). Similarly, independence
would predict VA
trials to have the same mean speed as unimodal A trials. Instead, in
both crossmodal conditions, mean speed reflected a mix of both component
speeds, consistent with common processing.
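The contrast between the independence prediction and the observed mixing
can be sketched as a simple weighted average. This is only an illustrative
assumption: the linear form, the function name `crossmodal_mean`, the weight
`W`, and the unimodal means below are all hypothetical and are not taken
from the data.

```python
def crossmodal_mean(current_mean, prime_mean, w):
    """Predicted psychometric mean on trials primed by the other modality.

    w = 0 is the independence prediction (no influence of the prime);
    0 < w <= 1 pulls the mean toward the prime modality's unimodal mean.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie between 0 and 1")
    return (1.0 - w) * current_mean + w * prime_mean


# Hypothetical unimodal means in deg/s (illustrative, not measured values).
MEAN_V = 50.0
MEAN_A = 60.0
W = 0.25  # hypothetical mixing weight

# AV: vision trials preceded by audition.
av_independent = crossmodal_mean(MEAN_V, MEAN_A, 0.0)
av_mixed = crossmodal_mean(MEAN_V, MEAN_A, W)
# VA: audition trials preceded by vision.
va_mixed = crossmodal_mean(MEAN_A, MEAN_V, W)

print(av_independent, av_mixed, va_mixed)
```

Under independence (`w = 0`) the AV mean equals the unimodal visual mean.
Any nonzero weight moves the AV mean up and the VA mean down, which is the
qualitative pattern described above.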
Visual motion processing is very well understood, with motion
selectivity first appearing in primary visual cortex (V1) and very
strongly present in subsequent areas MT and MST (hMT+). Robust evidence
that area MT/hMT+ is critical to speed perception comes from several
neurophysiological studies in nonhuman primates. In macaque area MT,
neurons are selectively tuned to speed [56,57] and lesions to MT impair
speed discrimination [4,5,59]. In addition, when speed is misperceived,
the misperception can still be accounted for by MT responses [3].
Complementing this, work in human neuroimaging shows that speed
discrimination preferentially activates hMT+ [60]. By contrast, evidence
for motion selectivity in primary auditory cortex is scant [11,61],
although
several studies have found evidence for auditory motion selectivity in
the planum temporale (PT), an area downstream from primary auditory
cortex (A1) and located posterior to it. PT is primarily involved with
language and music, both of which involve motion over the frequency
dimension, but several studies have shown it also exhibits robust
responses to auditory movement over space [25,62-66].
The nature of auditory motion processing is still debated. Selectivity
for movement over space is not a key feature of the auditory system as
the primary auditory representation is tonotopic rather than spatial.
For a long time, the standard model has been the snapshot model, with
auditory motion inferred from a series of two or more static samples or
‘snapshots’ [40,52,67]. Consistent with this position, psychophysical
data show that distance and duration are the strongest cues determining
auditory motion perception [68]. A recent model has added further nuance
to the snapshot theory by incorporating a simple temporal integration
period, which was found to add further explanatory power [58]. The
continued viability of the
snapshot model raises the possibility that evidence appearing to
indicate motion selectivity in neurophysiological and neuroimaging
studies [64,66,69] could instead arise from samples of positional
information along a motion trajectory. The notion of computing motion
from a series of well-spaced static positions is very familiar in
vision, where it is known as long-range apparent motion [70], and such
motion stimuli are very effective at activating the visual system’s
specialised motion area, hMT+ [71-73].
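The core idea of the snapshot account, speed inferred from static
positions rather than sensed directly, can be sketched as displacement
divided by duration. This is a deliberately minimal illustration and not an
implementation of any published model. The function name `snapshot_speed`
and the sample trajectory are hypothetical.

```python
def snapshot_speed(samples):
    """Estimate speed (deg/s) from static (time_s, azimuth_deg) samples.

    Uses only the first and last snapshots: displacement divided by duration.
    """
    if len(samples) < 2:
        raise ValueError("at least two snapshots are needed")
    (t0, x0), (t1, x1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("snapshots must be ordered in time")
    return (x1 - x0) / (t1 - t0)


# Hypothetical source sweeping from -15 deg to +15 deg azimuth in 0.5 s.
snapshots = [(0.0, -15.0), (0.25, 0.0), (0.5, 15.0)]
print(snapshot_speed(snapshots))
```

Because the estimate depends only on distance and duration, an observer
following this rule needs no dedicated motion detector, which is why
positional sampling remains a viable alternative explanation.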
A number of studies have tested for a shared cortical representation of
auditory and visual motion, in both PT and hMT+. Alink et al. [62] found
reliable PT activation for auditory motion and also found an occipital
area from which auditory direction could be decoded, although it was
located ventrolaterally from the typical hMT+ location. More recently,
Rezk et al. [25] found that auditory and visual motion were both
represented in right hMT+ and further showed that, in right hMT+,
responses to motion in one modality could be used to successfully decode
motion in the other. This study establishes a direction-selective
representation for auditory and visual motion in the same area. Previous
studies have also reported that tactile motion activates hMT+ [19,23],
suggesting that it might effectively be a supramodal motion area. In the
case of auditory motion, there is evidence from functional and diffusion
MRI in human subjects indicating white matter connections between PT and
hMT+ [74]. This connectivity is a plausible pathway allowing auditory
motion signals to activate hMT+.