Discussion
In these experiments we measured speed discrimination for a range of horizontally translating auditory and visual motions. Using the same set of speeds in each modality (20, 40, 50, 60, 70, 80, 100°/s) and the method of single stimuli, in which the speed on a given trial is compared against the mean of the set of speeds (see refs. 38,39), we first compared vision and audition in unimodal experiments. The unimodal data revealed that the slopes of the psychometric functions for auditory and visual motion were the same, although their means differed significantly. The mean of the auditory motion psychometric function matched the mean of the stimulus set, while the mean of the visual function was significantly lower than the mean stimulus speed. We also examined motion priming, that is, the tendency for the perceived speed of the current motion to be biased towards the speed of the preceding one, in both unimodal and crossmodal experiments. These analyses showed that a reliable unimodal motion priming effect occurred and that an equivalent effect occurred crossmodally when the prime stimulus was in the other modality. Taken together, the matching psychometric slopes for auditory and visual motion, and the presence of motion priming regardless of whether the prime stimulus was in the same or a different modality, suggest that a common process underlies auditory and visual speed discrimination.
A key finding supporting common processing of auditory and visual speed is the near-identical bandwidths of the cumulative Gaussian psychometric functions for auditory and visual motion shown in Figure 2b. The width of a psychometric function is indicative of the noise associated with the underlying perceptual process and thus of perceptual precision. A narrower bandwidth (i.e., steeper slope) indicates greater precision. The finding of equal precision for auditory and visual speed discrimination is somewhat surprising because motion perception in audition is typically found to be much poorer than in vision40,51,52. Weber fractions for visual speed discrimination, for speeds similar to the lower part of the range used here, are on the order of 0.05 (refs. 53,54), whereas Weber fractions for equivalent auditory speeds (30°/s and 60°/s) are many times higher at 0.24 and 0.30, respectively (ref. 55). The reason for this large difference is not clear, but one key point differentiating the stimuli used in these studies may be relevant. In audition, the stimulus was a sound source that translated across space from one point to another, like an auditory object. There was no such ‘object displacement’ in vision, as the motion stimuli were a continuous drift within an aperture. Such motion is locally contained and does not actually go anywhere. For this reason, comparing these auditory and visual motion discrimination studies is not a like-for-like comparison. Our study matched vision and audition by using translation across space and the same paths in both modalities (see Fig. 1). Our conditions, therefore, are directly comparable and clearly reveal a strikingly similar psychometric slope for motion discrimination in each modality.
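The link between psychometric width and precision can be illustrated with a minimal sketch of a cumulative Gaussian fit. It is not the fitting procedure used in this study: the response counts are synthetic, the grid-search maximum-likelihood fit is a stand-in chosen for simplicity, and all function and variable names are hypothetical. Only the stimulus speeds (whose mean is 60°/s) are taken from the text.

```python
import math

# Stimulus set from the text (deg/s); the response counts below are synthetic.
SPEEDS = [20, 40, 50, 60, 70, 80, 100]

def cum_gauss(x, mu, sigma):
    """Cumulative Gaussian: modelled probability of a 'faster' response."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def neg_log_likelihood(mu, sigma, speeds, n_faster, n_trials):
    """Binomial negative log-likelihood of the response counts."""
    nll = 0.0
    for x, k, n in zip(speeds, n_faster, n_trials):
        p = min(max(cum_gauss(x, mu, sigma), 1e-9), 1.0 - 1e-9)  # avoid log(0)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll

def fit_psychometric(speeds, n_faster, n_trials):
    """Grid-search maximum-likelihood fit; returns (mu, sigma) in deg/s."""
    best = None
    for mu_step in range(200, 1001, 5):        # mu: 20.0 to 100.0 in 0.5 steps
        for sigma_step in range(20, 601, 5):   # sigma: 2.0 to 60.0 in 0.5 steps
            mu, sigma = mu_step / 10.0, sigma_step / 10.0
            nll = neg_log_likelihood(mu, sigma, speeds, n_faster, n_trials)
            if best is None or nll < best[0]:
                best = (nll, mu, sigma)
    return best[1], best[2]

# Synthetic observer (an assumption for illustration): 100 trials per speed,
# true mean 60 deg/s, true sigma 15 deg/s.
N_TRIALS = [100] * len(SPEEDS)
N_FASTER = [round(100 * cum_gauss(s, 60.0, 15.0)) for s in SPEEDS]

mu_hat, sigma_hat = fit_psychometric(SPEEDS, N_FASTER, N_TRIALS)
print(f"mean = {mu_hat:.1f} deg/s, sigma = {sigma_hat:.1f} deg/s")
```

In this sketch the fitted mean plays the role of the psychometric mean discussed above, and the fitted sigma is the bandwidth: a smaller sigma gives a steeper function and corresponds to greater precision.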
The difference in the psychometric means between audition and vision was another interesting result. As noted in the previous paragraph, auditory motion is often regarded as inferior to visual motion based on data showing poorer speed discrimination and on the grounds that audition, unlike vision, is fundamentally not a spatial sense. And yet, our data show that it was auditory motion that was veridically perceived. The auditory mean in Figure 2b (blue column) was not significantly different from the mean stimulus speed, while the visual mean by contrast was significantly below the stimulus mean, showing an overall underestimation of perceived speed. Again, the reason for this is not clear, because visual speed experiments are not done with translating objects of the kind used here, and we have not been able to find any other studies using a ‘spatial displacement’ approach comparable to ours. One possible reason for the result, however, might be the nature of the speed cues in each modality. In vision, motion processing begins in primary visual cortex, where direction-selective cells with small receptive fields are found. These are pooled into MT cells with much larger receptive fields and a higher range of speed tunings4,56,57. It is possible that, with a relatively fast set of speeds (up to 100°/s), there was a degree of smearing at the local V1 level such that MT inherited impaired motion signals. By contrast, the auditory system uses different cues to detect horizontal motion, based primarily on interaural time and level differences and supplemented with spectral cues and Doppler cues40,58. These cues are not tied to small spatial receptive fields in the way that visual motion signals are, and may remain veridical at higher speeds.
The final point of interest comes from the motion priming results. As is well established in the visual domain46-48, a brief preceding motion causes the current motion stimulus to be biased towards the preceding one in speed or direction (i.e., motion priming), while a long preceding motion causes a bias away from the preceding one (i.e., motion aftereffect). As our stimuli were brief, we expected and obtained significant motion priming effects for visual stimuli (Fig. 3a). We could find no studies of motion priming with auditory stimuli on which to base a prediction, but in the event we also obtained a very similar motion priming effect for auditory motion (Fig. 3b,c). This outcome was far from certain, as auditory motion involves a completely different set of cues from visual motion, and it indicates that there may be common motion processing for visual and auditory stimuli. The critical condition was the crossmodal case, where auditory and visual stimuli were interleaved. As shown in Figure 4a,b, we obtained clear crossmodal motion priming. That is, consistent with a common motion process, current auditory motion speed was primed by the preceding visual speed, and vice versa, and both effects showed very similar functions when plotted for each level of preceding speed (Fig. 4c).
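The logic of a priming analysis can be sketched by conditioning each response on the speed of the preceding trial. The example below is a simplified illustration, not the analysis reported here: it tabulates the proportion of ‘faster’ responses for each preceding speed rather than fitting a psychometric function per preceding speed, and the function name and the short trial sequence are hypothetical.

```python
from collections import defaultdict

def faster_rate_by_previous(speeds, responses):
    """Proportion of 'faster' responses on the current trial,
    grouped by the speed presented on the preceding trial."""
    counts = defaultdict(lambda: [0, 0])  # prev speed -> [n 'faster', n trials]
    # Pair each trial's response with the speed of the trial before it.
    for prev_speed, response in zip(speeds[:-1], responses[1:]):
        counts[prev_speed][0] += int(response)
        counts[prev_speed][1] += 1
    return {prev: k / n for prev, (k, n) in sorted(counts.items())}

# Hypothetical trial sequence: speed in deg/s and whether the observer
# judged that trial 'faster' than the mean of the set.
speeds = [20, 100, 60, 20, 60, 100, 60]
responses = [False, True, True, False, False, True, True]

rates = faster_rate_by_previous(speeds, responses)
print(rates)
```

Under priming, ‘faster’ responses should become more frequent as the preceding speed increases. In a crossmodal version of this sketch, the preceding trial would simply come from the other modality.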
A further interesting feature is shown in Figure 4d, which shows the means of the crossmodal psychometric functions together with the unimodal means from Figure 2. Recall that the mean speeds for unimodal vision and audition were significantly different, with visual stimuli being perceived as slower than auditory (Fig. 2b). On priming trials where vision was preceded by audition, the psychometric mean speed for vision increased relative to the unimodal mean. Conversely, on auditory trials preceded by vision, the psychometric mean speed for audition decreased relative to unimodal audition. In both cases, there was a kind of averaging across the modalities of the current and previous speeds. If each modality were processed independently, the AV trials should have the same mean speed as the unimodal V trials, as AV trials are actually vision trials (preceded by audition). Similarly, independence would predict VA trials to have the same mean speed as unimodal A trials. Instead, in both crossmodal conditions, mean speed reflected a mix of both component speeds, consistent with common processing.
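The contrast between the two predictions can be written compactly. As an illustrative assumption (the linear form, the single shared weight and the symbol w are not part of the analysis reported here), suppose each crossmodal mean is a weighted mix of the two unimodal means:

```latex
\mu_{AV} = (1 - w)\,\mu_{V} + w\,\mu_{A}, \qquad
\mu_{VA} = (1 - w)\,\mu_{A} + w\,\mu_{V}, \qquad 0 \le w \le 1
```

Independent processing corresponds to w = 0, giving \mu_{AV} = \mu_{V} and \mu_{VA} = \mu_{A}. Any w > 0 pulls each crossmodal mean towards the unimodal mean of the other modality; because \mu_{V} < \mu_{A} in the unimodal data, this raises \mu_{AV} above \mu_{V} and lowers \mu_{VA} below \mu_{A}, the direction of the shifts described above.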
Visual motion processing is very well understood, with motion selectivity first appearing in primary visual cortex (V1) and very strongly present in subsequent areas MT and MST (hMT+). Robust evidence that area MT/hMT+ is critical to speed perception comes from several neurophysiological studies in nonhuman primates. In macaque area MT, neurons are selectively tuned to speed56,57 and lesions to MT impair speed discrimination4,5,59. In addition, when speed is misperceived, it can still be accounted for by MT responses3. Complementing this, work in human neuroimaging shows that speed discrimination preferentially activates hMT+ (ref. 60). By contrast, evidence for motion selectivity in primary auditory cortex (A1) is scant11,61, although several studies have found evidence for auditory motion selectivity in the planum temporale (PT), an area downstream from A1 and located posterior to it. PT is primarily involved with language and music, both of which involve motion over the frequency dimension, but several studies have shown it also exhibits robust responses to auditory movement over space25,62-66.
The nature of auditory motion processing is still debated. Selectivity for movement over space is not a key feature of the auditory system, as the primary auditory representation is tonotopic rather than spatial. For a long time, the standard model has been the snapshot model, with auditory motion inferred from a series of two or more static samples or ‘snapshots’40,52,67. Consistent with this position, psychophysical data show that distance and duration are the strongest cues determining auditory motion perception68. A recent model has added nuance to the snapshot theory by incorporating a simple temporal integration period, which was found to add further explanatory power58. The continued viability of the snapshot model raises the possibility that evidence appearing to indicate motion selectivity in neurophysiological and neuroimaging studies64,66,69 could instead arise from samples of positional information along a motion trajectory. The notion of computing motion from a series of well-spaced static positions is very familiar in vision, where it is known as long-range apparent motion70, and such motion stimuli are very effective at activating the visual system’s specialised motion area, hMT+ (refs. 71-73).
A number of studies have tested for a shared cortical representation of auditory and visual motion, in both PT and hMT+. Alink et al.62 found reliable PT activation for auditory motion and also found an occipital area from which auditory direction could be decoded, although it was located ventrolaterally from the typical hMT+ location. More recently, Rezk et al.25 found that auditory and visual motion were both represented in right hMT+ and further showed that, in right hMT+, responses to motion in one modality could successfully decode motion in the other. This study establishes a direction-selective representation for auditory and visual motion in the same area. Previous studies have also reported that tactile motion activates hMT+ (refs. 19,23), suggesting that it might effectively be a supramodal motion area. In the case of auditory motion, there is evidence from functional and diffusion MRI in human subjects indicating white matter connections between PT and hMT+ (ref. 74). This connectivity is a plausible pathway allowing auditory motion signals to activate hMT+.