Introduction
Motion perception is usually studied within a single modality, yet the brain must be able to process motion signals from multiple modalities, because real objects translating across space often produce correlated motion in more than one sense. A common motion-processing network for visual, auditory and tactile motion is a compelling notion because it would be efficient from an evolutionary perspective, avoiding the duplication of motion systems in each modality, which would in any case still need to be integrated to deliver the benefits of multisensory perception1. Some authors have argued that the human middle temporal complex (hMT+) may serve this function by responding to motion regardless of the sensory modality of the input1,2. hMT+ is a motion-specialised area located early in the visual cortical pathway, one synapse from primary visual cortex (V1), and is not generally considered multisensory. Activity in MT correlates well with perceived visual motion3-5 and with the movement perceived in the visual motion aftereffect6-9.
Human neuroimaging investigations of supramodal motion processing in MT have not produced clear-cut results. The usual approach has been to test for supramodal motion processing by looking for overlapping responses in hMT+ to auditory and visual motion, or to auditory and tactile motion. Results from these studies are mixed: fMRI studies testing for auditory motion responses in hMT+ have generally produced negative results in sighted subjects10-12, and the two that did find auditory responses13,14 have been queried on methodological grounds15,16. Other studies have looked for tactile motion responses in hMT+ and found evidence of direction selectivity17-23, although again, several of these have been queried methodologically15. More recently, Dormal et al.24 used a decoding approach and found that auditory motion direction could be reliably decoded from hMT+ activity. Rezk et al.25 confirmed this decoding result in hMT+ and, in an interesting symmetry, also showed that response patterns evoked by auditory motion could predict visual motion direction, and that visual response patterns could predict auditory motion direction.
Many behavioural studies have examined audiovisual motion perception, but clear evidence of supramodal processing is lacking, and there are no reports of optimal integration to parallel those observed for spatial tasks26,27. Studies testing for summation of motion signals by comparing bimodal detection thresholds against unimodal thresholds have generally found weak interactions consistent with probability summation rather than additive or superadditive summation28,29 (the distinction is sketched below). There are many reports showing that sound can modulate visual motion perception30-32, but these may be attributable to changes in response criterion rather than sensitivity, and one study28 showed small sensitivity benefits regardless of whether the auditory and visual directions corresponded (i.e., an absence of direction selectivity). The study that came closest to finding a linear combination of auditory and visual motion signals33 used auditory and visual signals that translated horizontally around the observer (we use a similar format in this study), rather than screen-based visual motion drifting within a fixed aperture flanked by a pair of speakers, and it found strong summation that occurred only with congruent motion components. A common supramodal process should show bidirectionality between the senses, and to date the only study to report this involved tactile and visual stimuli34, finding that adaptation to tactile motion produced aftereffects in vision, and vice versa.
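To make the distinction concrete: if the auditory and visual components are detected independently, probability summation predicts a bimodal detection rate of only

$$P_{AV} = 1 - (1 - P_A)(1 - P_V),$$

whereas additive combination of the signals before the decision stage predicts a larger sensitivity gain of the form $d'_{AV} = d'_A + d'_V$ (or $d'_{AV} = \sqrt{d'^2_A + d'^2_V}$ when the two channels carry independent noise). These expressions are standard psychophysical benchmarks rather than formulas drawn from the studies cited above.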
In the current experiments, we randomly presented various speeds in a trial sequence that was either entirely visual or entirely auditory (Experiment 1) or that randomly interleaved visual and auditory motions from trial to trial (Experiment 2). To preview the results, Experiment 1 showed that speed perception was more veridical in audition than in vision, and that there was no difference in the precision of speed discrimination between vision and audition. We also established that both modalities showed motion priming, an attractive effect whereby current speed perception is biased towards the previous trial's speed (i.e., a positive serial dependence; sketched formally below). Experiment 2 randomly interleaved auditory and visual motions moving in leftward or rightward directions. When consecutive trials were congruent in direction (both leftward or both rightward), we found symmetrical cross-modal priming effects between trials. That is, current perceived speed was equivalently primed by the preceding speed, regardless of the modality combination. Pairs of incongruent trials showed no priming. Overall, these results suggest a common and directionally selective mechanism for auditory and visual motion.
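Schematically, the positive serial dependence we report can be thought of as an attraction of the current percept towards the previous stimulus. As an illustrative sketch only (with $\alpha$ a hypothetical weighting parameter, not one estimated here), perceived speed on trial $t$ could be written as

$$\hat{s}_t = s_t + \alpha\,(s_{t-1} - s_t), \qquad \alpha > 0,$$

where $s_t$ and $s_{t-1}$ are the current and preceding stimulus speeds and $\hat{s}_t$ is the perceived speed; positive $\alpha$ pulls the percept towards the previous trial's speed, whereas negative $\alpha$ would produce a repulsive, adaptation-like bias.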