Speech Coding in the Brain: Representation of Vowel Formants by Midbrain Neurons Tuned to Sound Fluctuations.

Carney LH, Li T, McDonough JM - eNeuro (2015)

Bottom Line: Additionally, a successful neural code must function for speech in background noise at levels that are tolerated by listeners. The model presented here resolves these problems, and incorporates several key response properties of the nonlinear auditory periphery, including saturation, synchrony capture, and phase locking to both fine structure and envelope temporal features. The hypothesized code is supported by electrophysiological recordings from the inferior colliculus of awake rabbits.


Affiliation: Departments of Biomedical Engineering, and Neurobiology & Anatomy, University of Rochester, Rochester, New York 14642.

ABSTRACT
Current models for neural coding of vowels are typically based on linear descriptions of the auditory periphery, and fail at high sound levels and in background noise. These models rely on either auditory nerve discharge rates or phase locking to temporal fine structure. However, both discharge rates and phase locking saturate at moderate to high sound levels, and phase locking is degraded in the CNS at middle to high frequencies. The fact that speech intelligibility is robust over a wide range of sound levels is problematic for codes that deteriorate as the sound level increases. Additionally, a successful neural code must function for speech in background noise at levels that are tolerated by listeners. The model presented here resolves these problems, and incorporates several key response properties of the nonlinear auditory periphery, including saturation, synchrony capture, and phase locking to both fine structure and envelope temporal features. The model also includes the properties of the auditory midbrain, where discharge rates are tuned to amplitude fluctuation rates. The nonlinear peripheral response features create contrasts in the amplitudes of low-frequency neural rate fluctuations across the population. These patterns of fluctuations result in a response profile in the midbrain that encodes vowel formants over a wide range of levels and in background noise. The hypothesized code is supported by electrophysiological recordings from the inferior colliculus of awake rabbits. This model provides information for understanding the structure of cross-linguistic vowel spaces, and suggests strategies for automatic formant detection and speech enhancement for listeners with hearing loss.
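For intuition, the fluctuation-contrast idea in the abstract can be caricatured in a few lines of Python. This is a minimal, hypothetical sketch, not the published model: the sample rate, envelope shapes, fluctuation depths, and the linear BP rate mapping are illustrative assumptions. The point is that an AN channel "captured" by a strong harmonic near a formant fluctuates weakly at F0, so a band-pass (BP) modulation-tuned midbrain cell driven by that channel fires less, producing a rate dip at the formant.

```python
# Hypothetical sketch of the fluctuation-contrast code (not the authors' model).
# Channels near a formant are dominated by one harmonic (synchrony capture),
# so their rate fluctuations at F0 are shallow; channels between formants beat
# strongly at F0. A BP cell with an MTF peak near F0 then shows a rate dip at
# the formant. All parameter values below are invented for illustration.

import numpy as np

fs = 10000.0            # sample rate (Hz), arbitrary for this toy example
f0 = 100.0              # vowel fundamental frequency (Hz)
t = np.arange(0, 0.5, 1.0 / fs)

def toy_an_envelope(near_formant: bool) -> np.ndarray:
    """Toy AN instantaneous-rate envelope for one channel (spikes/s)."""
    base = 150.0
    depth = 5.0 if near_formant else 80.0   # assumed F0 fluctuation depth
    return base + depth * np.cos(2 * np.pi * f0 * t)

def f0_fluctuation_amplitude(env: np.ndarray) -> float:
    """Amplitude of the envelope component at F0 (Fourier projection)."""
    return 2 * np.abs(np.mean(env * np.exp(-2j * np.pi * f0 * t)))

def bp_model_rate(fluct_amp: float, gain: float = 1.0) -> float:
    """Toy BP midbrain cell: rate grows with F0 fluctuation amplitude."""
    return gain * fluct_amp

for label, near in [("near formant", True), ("between formants", False)]:
    amp = f0_fluctuation_amplitude(toy_an_envelope(near))
    print(f"{label:18s} F0 fluctuation = {amp:6.1f}  BP rate ~ {bp_model_rate(amp):6.1f}")
# Expected: the near-formant channel yields a much lower BP rate, i.e. a dip
# in the BP population rate profile at the formant frequency.
```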



Figure 4: Model IC responses to vowel tokens. A, B, Spectra of the actual vowels /æ/ (A) and /i/ (B). C, D, The rate profile of model cells with BP MTFs tuned to F0 (blue) has dips at the formants (dots); the rate profile of LPBR model cells with MTF minima near F0 (red) has peaks near the formants (dots). AN (magenta) and CN/brainstem (cyan) rate profiles are shown for comparison. D, The rate profile for BP cells has an ambiguous dip (arrow) for vowels with a broad spectral valley; LPBR cells (red) have relatively low rates where spectral energy is low, owing to reduced excitatory input. Vowel levels were 65 dB SPL. Model parameters are the same as in Fig. 3B.

Mentions: Responses of model IC cells with bandpass and band-reject MTFs to two vowel tokens (Fig. 4) illustrate the representation of formants in the average rate profiles of model IC population responses. As schematized in Fig. 1, the profile of average discharge rates for a population of model IC neurons with BP MTFs (Fig. 4C,D, blue) has minima at BFs near the vowel formants. In contrast, the rates of band-reject neurons (Fig. 4C,D, red) have peaks at the formants. The importance of the LPBR model for a robust neural code of vowel formants is illustrated in Figure 4D for the vowel /i/, which, like many vowels, has widely spaced formants. This response shows that reductions in the discharge rate of BP responses (Fig. 4D, blue) are ambiguous, as they may be due either to reduced fluctuations of AN responses tuned near formants (Fig. 1B) or to reduced spectral energy (Fig. 4D, arrow, 1500 Hz). This ambiguity is resolved by the LPBR model (Fig. 4D, red), which only responds when both sufficient energy and reduced fluctuations are present on the inputs to the model midbrain cell. The reduced fluctuations result in the disinhibition of the LPBR model by reducing the inhibitory input from the BP neuron. Note that the model LPBR population rate profile is qualitatively similar to the AN (Fig. 4C,D, magenta) and CN/brainstem (Fig. 4C,D, cyan) profiles, except that the LPBR population responses (Fig. 4C,D, red curves) have greater contrast and steeper slopes in the discharge rates across the population in comparison with the peripheral responses. The LPBR model average rate profiles differ from peripheral rate profiles in being more robust for vowels in background noise and across a wide range of sound levels (see below).
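The excitation/inhibition logic described in this paragraph can be sketched schematically. The following Python snippet is only a caricature under assumed values (the energy drive, inhibition weight, and spontaneous rate are invented, and the real LPBR model is a dynamic same-frequency inhibition circuit, not a static formula), but it shows why a band-reject cell peaks only where channel energy is high and input fluctuations are low.

```python
# Hypothetical sketch of the LPBR logic: excitation driven by channel energy,
# inhibition supplied by a BP (fluctuation-driven) cell. The cell fires
# strongly only when there is sufficient energy AND the BP inhibition is weak
# (disinhibition). Parameter values are illustrative assumptions.

def lpbr_rate(energy_drive: float, bp_rate: float,
              inhibition_weight: float = 0.8, spont: float = 10.0) -> float:
    """Rate of a toy LPBR cell: energy-driven excitation minus BP inhibition."""
    return max(0.0, spont + energy_drive - inhibition_weight * bp_rate)

cases = {
    "near formant (energy high, fluctuations low)":  (100.0, 20.0),
    "between formants (energy high, fluct. high)":   (100.0, 90.0),
    "spectral valley (energy low, fluctuations low)": (15.0, 10.0),
}
for name, (energy, bp) in cases.items():
    print(f"{name:48s} LPBR rate ~ {lpbr_rate(energy, bp):6.1f}")
# Only the near-formant case combines strong excitation with weak inhibition,
# producing a rate peak; the spectral-valley case stays low despite weak
# inhibition. This resolves the ambiguity that a BP rate dip alone leaves open.
```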
