Temporal predictive codes for spoken words in auditory cortex.

Gagnepain P, Henson RN, Davis MH - Curr. Biol. (2012)

Bottom Line: Computational simulations show that knowing "formubo" increases lexical competition when hearing "formu…", but reduces segment prediction error. The time course of magnetoencephalographic brain responses in the superior temporal gyrus (STG) is uniquely consistent with a segment prediction account. This prediction error signal explains the efficiency of human word recognition and simulates neural responses in auditory regions.

Affiliation: MRC Cognition and Brain Sciences Unit, Cambridge, UK.

Figure 4. Temporal Predictive Coding Model.
(A) Speech-responsive cortex in the STG has been divided into multiple local patches that code different segments (phonemes here for convenience), illustrated by small Gaussian kernels. Prediction error units in the segment layer encode the difference between prediction units activated by top-down lexical input and state units modulated by bottom-up activity from sensory acoustic analyses.
(B) Illustration of the pattern of activity in the segment layer according to the three types of units (P, prediction units; S, state units; and PE, prediction error units) during recognition of the source word “formula,” novel word “formubo,” and baseline “formuty” after “formubo” has been added to the lexicon. The likelihood density function (bottom row) represents the level of activity in each state unit coming from the acoustic analysis in lower levels. The prior density (top row) corresponds to the level of activity in each prediction unit from a lexical system representing the likely identity of the next speech segment, predicted from the current speech input using the CELEX database. The difference between the predicted pattern of activity and the pattern arising from sensory evidence determines the level of activity in each prediction error unit (middle row). Simulation results averaged over all items can be found in Figure S3.
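The relation between prediction (P), state (S), and prediction error (PE) units described in the caption can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions, not the authors' simulation: the lexicon holds two equally frequent entries instead of CELEX counts, letters stand in for speech segments, the state units are a one-hot vector rather than a Gaussian likelihood density, and the error is taken as an absolute difference. All function names are hypothetical.

```python
def next_segment_prior(lexicon, prefix):
    """P units: probability of each possible next segment given the input so far.

    `lexicon` maps word -> frequency (toy values, an assumption; not CELEX).
    """
    counts = {}
    for word, freq in lexicon.items():
        if word.startswith(prefix) and len(word) > len(prefix):
            seg = word[len(prefix)]
            counts[seg] = counts.get(seg, 0.0) + freq
    total = sum(counts.values())
    return {seg: c / total for seg, c in counts.items()} if total else {}


def prediction_error(prior, heard):
    """PE units: |state - prediction| per segment.

    The state (S) units are assumed one-hot on the segment actually heard.
    """
    segments = sorted(set(prior) | {heard})
    return {s: abs((1.0 if s == heard else 0.0) - prior.get(s, 0.0))
            for s in segments}


def total_error(lexicon, prefix, heard):
    """Summed PE activity across all segment units."""
    prior = next_segment_prior(lexicon, prefix)
    return sum(prediction_error(prior, heard).values())


# Lexicon before and after "formubo" has been learned (equal toy frequencies).
before = {"formula": 1.0}
after = {"formula": 1.0, "formubo": 1.0}

# Hearing "b" after "formu": the error is smaller once "formubo" is known.
print(total_error(before, "formu", "b"))
print(total_error(after, "formu", "b"))
```

In this toy setting, adding "formubo" to the lexicon splits the prediction between "l" and "b", so the summed error for the novel word falls while the error for the source word "formula" rises.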

Mentions: Results of the sensor analyses clearly suggest that changes to the neural response to spoken words and pseudowords reflect computations of segment prediction error rather than lexical entropy. Prediction error is assumed to encode the difference between activity in segment prediction units (derived from a distributed lexical-semantic system) and activity in state units (i.e., sensory evidence) derived from acoustic analysis in lower levels (e.g., primary auditory cortex; see Figure 4). Neural responses linked to this prediction error signal should therefore be localized to neural populations in the STG that have previously been argued to represent the segmental content of speech [18–20]. We therefore estimated the cortical sources of the MEG data during the 100–500 ms post-DP period, and searched for regions that matched the response profile across the six trained (day 1 and day 2) conditions that was predicted by our computational simulation (see Figure 1F; Figure S3). We found two clusters of 1,075 and 717 voxels whose spatial extent survived correction for multiple comparisons. These were spread across the left and right STG, supramarginal gyri, and rolandic operculum (Figure 3C). The largest differences in the response profile for prediction error in Figure 1F arise from lexicalized versus nonlexicalized items. We therefore defined a restricted search volume based on an orthogonal contrast of novel and baseline nonwords versus source words in the untrained condition (this lexicality effect showed good spatial correspondence to prior findings in a meta-analysis of relevant PET and functional magnetic resonance imaging studies [24]; see Figure S2).
The peak statistic in both the left (x = −54, y = −12, z = +10, T(160) = 4.7) and right (x = 60, y = −20, z = +12, T(160) = 4.3) STG survived correction for multiple comparisons within this restricted volume (the source energies in the left STG peak for each condition, pre- and post-DP, are shown for illustrative purposes in Figure 3D). Thus, source reconstruction further supports the view that MEG signals reflect prediction error at the level of segments, rather than competition at a higher lexical level.
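The competing account contrasted here, lexical entropy, can be illustrated with a minimal Python sketch. It assumes that entropy is the Shannon entropy over the relative frequencies of all lexical candidates still consistent with the input; the two-word lexicon, the equal frequencies, and the function name are hypothetical and are not taken from the authors' simulation.

```python
import math


def lexical_entropy(lexicon, prefix):
    """Shannon entropy (in bits) over words that still match the prefix.

    `lexicon` maps word -> frequency; toy values, an assumption.
    """
    freqs = [f for w, f in lexicon.items() if w.startswith(prefix)]
    total = sum(freqs)
    if not total:
        return 0.0
    # p * log2(1/p) avoids a negative zero when only one candidate remains.
    return sum((f / total) * math.log2(total / f) for f in freqs)


before = {"formula": 1.0}
after = {"formula": 1.0, "formubo": 1.0}

# Lexical uncertainty at "formu" before and after learning "formubo".
print(lexical_entropy(before, "formu"))
print(lexical_entropy(after, "formu"))
```

Under these assumptions, learning "formubo" raises the entropy at "formu" because two candidates now compete. An entropy account therefore predicts a change in the opposite direction from the reduced segment prediction error that the source reports for the learned item, and this difference is what allows the MEG response profile to distinguish the two accounts.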

