Auditory, Visual and Audiovisual Speech Processing Streams in Superior Temporal Sulcus


ABSTRACT

The human superior temporal sulcus (STS) is responsive to visual and auditory information, including sounds and facial cues during speech recognition. We investigated the functional organization of STS with respect to modality-specific and multimodal speech representations. Twenty younger adult participants were instructed to perform an oddball detection task and were presented with auditory, visual, and audiovisual speech stimuli, as well as auditory and visual nonspeech control stimuli in a block fMRI design. Consistent with a hypothesized anterior-posterior processing gradient in STS, auditory, visual and audiovisual stimuli produced the largest BOLD effects in anterior, posterior and middle STS (mSTS), respectively, based on whole-brain, linear mixed effects and principal component analyses. Notably, the mSTS exhibited preferential responses to multisensory stimulation, as well as speech compared to nonspeech. Within the mid-posterior and mSTS regions, response preferences changed gradually from visual, to multisensory, to auditory moving posterior to anterior. Post hoc analysis of visual regions in the posterior STS revealed that a single subregion bordering the mSTS was insensitive to differences in low-level motion kinematics yet distinguished between visual speech and nonspeech based on multi-voxel activation patterns. These results suggest that auditory and visual speech representations are elaborated gradually within anterior and posterior processing streams, respectively, and may be integrated within the mSTS, which is sensitive to more abstract speech information within and across presentation modalities. The spatial organization of STS is consistent with processing streams that are hypothesized to synthesize perceptual speech representations from sensory signals that provide convergent information from visual and auditory modalities.
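The abstract refers to ROI-level linear mixed effects modeling of the BOLD responses. As a rough illustration only, and not the authors' actual pipeline, a condition-by-region mixed model on percent signal change with participants as random effects might be set up as sketched below; the column names (subject, roi, condition, psc) and the input file are hypothetical.

```python
# Hedged sketch: a linear mixed-effects model of percent signal change (PSC)
# by condition and STS subregion, with subject as a random effect.
# This is NOT the authors' code; column names and data layout are assumed.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format table: one row per subject x ROI x condition,
# holding the mean PSC extracted from that ROI.
df = pd.read_csv("sts_psc_long.csv")  # assumed columns: subject, roi, condition, psc

# Fixed effects: condition (A, V, AV, R, G) crossed with ROI (e.g., aSTS, mSTS, pSTS);
# random intercept per subject.
model = smf.mixedlm("psc ~ C(condition) * C(roi)", df, groups=df["subject"])
result = model.fit()
print(result.summary())
```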


Figure 3: BOLD effects during each experimental condition. Results are shown on an inflated surface rendering of the study-specific template in MNI space. Top: speech conditions (A, V, AV). Bottom: nonspeech conditions (R, G). All maps thresholded at an uncorrected voxel-wise p < 0.005 with a cluster threshold of 185 voxels (family-wise error rate (FWER) corrected p < 0.05). PSC, percent signal change.
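For readers who want to apply this style of thresholding (an uncorrected voxel-wise p < 0.005 combined with a cluster-extent cutoff), a minimal sketch using nilearn follows. This is not the authors' pipeline: the 185-voxel extent that yields FWER-corrected p < 0.05 is a study-specific value that would normally come from a simulation or permutation step, and the input file name is a placeholder.

```python
# Hedged sketch: cluster-extent thresholding of a group z-map, roughly analogous
# to the voxel-wise p < 0.005 + 185-voxel cluster criterion described in the caption.
# Not the authors' code; 'group_zmap.nii.gz' is a hypothetical file name.
from nilearn.glm import threshold_stats_img
from nilearn import plotting

thresholded_map, threshold = threshold_stats_img(
    "group_zmap.nii.gz",
    alpha=0.005,            # uncorrected voxel-wise p threshold
    height_control="fpr",   # treat alpha as an uncorrected false-positive rate
    cluster_threshold=185,  # minimum cluster extent in voxels (study-specific value)
    two_sided=False,
)
plotting.plot_stat_map(thresholded_map, threshold=threshold,
                       title="Condition vs. rest (cluster-extent thresholded)")
plotting.show()
```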

Mentions: Activation maps for each of the five experimental conditions relative to rest are shown in Figure 3 (FWER < 0.05). Visual facial gestures (V, AV, G) activated bilateral primary and secondary visual cortices, lateral occipital-temporal visual regions, inferior and middle temporal gyri, and posterior STS. Conditions containing auditory information (A, AV, R) activated supratemporal auditory regions, the lateral superior temporal gyrus, and portions of the STS bilaterally. All conditions except for R activated bilateral inferior frontal regions. We tested directly for voxels showing an enhanced response to intelligible speech by computing the contrasts A > R and V > G. The A > R contrast (not displayed) did not yield any significant differences at the group level. Although this is not consistent with previous imaging work (Scott et al., 2000; Narain et al., 2003; Liebenthal et al., 2005; Okada et al., 2010), we believe that our use of sublexical stimuli may have contributed to this result. The V > G contrast yielded a visual speech network consistent with previous work (Campbell et al., 2001; Callan et al., 2004; Okada and Hickok, 2009; Bernstein et al., 2011; Hertrich et al., 2011), including bilateral STS, left inferior frontal gyrus, and a host of inferior parietal and frontal sensory-motor brain regions (Figure 4B).
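As a hedged illustration of how a group-level contrast such as V > G (or A > R) can be computed from per-subject contrast images, a one-sample second-level model in nilearn might look like the sketch below. This is not the authors' analysis; the contrast-image paths, smoothing kernel, and output file name are assumptions for illustration.

```python
# Hedged sketch: group-level one-sample test on per-subject V > G contrast images,
# analogous in spirit to the V > G contrast reported above. Not the authors' code;
# the contrast image paths are placeholders.
import pandas as pd
from nilearn.glm.second_level import SecondLevelModel

# One first-level V > G contrast image per participant (hypothetical paths).
contrast_imgs = [f"sub-{i:02d}_V_gt_G_con.nii.gz" for i in range(1, 21)]

design_matrix = pd.DataFrame({"intercept": [1] * len(contrast_imgs)})
second_level = SecondLevelModel(smoothing_fwhm=6.0).fit(
    contrast_imgs, design_matrix=design_matrix
)

# Group z-map for V > G; thresholding (e.g., as in Figure 3) would follow.
z_map = second_level.compute_contrast("intercept", output_type="z_score")
z_map.to_filename("group_V_gt_G_zmap.nii.gz")
```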

