Speech Coding in the Brain: Representation of Vowel Formants by Midbrain Neurons Tuned to Sound Fluctuations(1,2,3).

Carney LH, Li T, McDonough JM - eNeuro (2015)

Bottom Line: Additionally, a successful neural code must function for speech in background noise at levels that are tolerated by listeners. The model presented here resolves these problems, and incorporates several key response properties of the nonlinear auditory periphery, including saturation, synchrony capture, and phase locking to both fine structure and envelope temporal features. The hypothesized code is supported by electrophysiological recordings from the inferior colliculus of awake rabbits.


Affiliation: Departments of Biomedical Engineering and Neurobiology & Anatomy, University of Rochester, Rochester, New York 14642.

ABSTRACT
Current models for neural coding of vowels are typically based on linear descriptions of the auditory periphery, and fail at high sound levels and in background noise. These models rely on either auditory nerve discharge rates or phase locking to temporal fine structure. However, both discharge rates and phase locking saturate at moderate to high sound levels, and phase locking is degraded in the CNS at middle to high frequencies. The fact that speech intelligibility is robust over a wide range of sound levels is problematic for codes that deteriorate as the sound level increases. Additionally, a successful neural code must function for speech in background noise at levels that are tolerated by listeners. The model presented here resolves these problems, and incorporates several key response properties of the nonlinear auditory periphery, including saturation, synchrony capture, and phase locking to both fine structure and envelope temporal features. The model also includes the properties of the auditory midbrain, where discharge rates are tuned to amplitude fluctuation rates. The nonlinear peripheral response features create contrasts in the amplitudes of low-frequency neural rate fluctuations across the population. These patterns of fluctuations result in a response profile in the midbrain that encodes vowel formants over a wide range of levels and in background noise. The hypothesized code is supported by electrophysiological recordings from the inferior colliculus of awake rabbits. This model provides information for understanding the structure of cross-linguistic vowel spaces, and suggests strategies for automatic formant detection and speech enhancement for listeners with hearing loss.
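The abstract's central mechanism is that midbrain (IC) neurons respond not to a channel's average level but to the amplitude of its low-frequency rate fluctuations, so channels near a formant (where synchrony capture flattens the envelope) stand out from channels between formants. The following is a toy sketch of that idea only, not the authors' model, which uses a detailed nonlinear auditory-nerve front end; the function name, the half-wave-rectification stand-in for a peripheral channel, and all parameters here are illustrative assumptions.

```python
import numpy as np

def envelope_fluctuation_power(x, fs, mod_freq, bandwidth=20.0):
    """Crude stand-in for a midbrain cell tuned to one fluctuation rate.

    Half-wave rectifies a (simulated) narrowband channel response to get
    an envelope, then measures the envelope's spectral power in a band
    around mod_freq (e.g., the voice pitch F0).
    """
    env = np.maximum(x, 0.0)                  # half-wave rectification
    env = env - env.mean()                    # drop DC before the FFT
    spec = np.abs(np.fft.rfft(env)) ** 2
    freqs = np.fft.rfftfreq(len(env), 1.0 / fs)
    band = (freqs >= mod_freq - bandwidth / 2) & (freqs <= mod_freq + bandwidth / 2)
    return spec[band].sum()

# A 1 kHz carrier modulated at 100 Hz (vowel-like F0) versus an
# unmodulated tone: only the modulated channel drives the "cell".
fs = 16000
t = np.arange(0, 0.5, 1.0 / fs)
am_tone = (1 + 0.8 * np.sin(2 * np.pi * 100 * t)) * np.sin(2 * np.pi * 1000 * t)
pure_tone = np.sin(2 * np.pi * 1000 * t)

assert envelope_fluctuation_power(am_tone, fs, 100) > envelope_fluctuation_power(pure_tone, fs, 100)
```

Across a population of channels, this contrast in fluctuation power (strong between formants, weak at formants where one harmonic captures the channel) is what produces the dips and peaks in the BP and LPBR rate profiles of Figure 6.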


Figure 6: A–C, Population rate profiles for model AN (A), BP (B), and LPBR (C) cells in response to the vowel /æ/ (had) for a range of SNRs. Vowel levels were fixed at 65 dB SPL; the noise level increases toward the bottom of plots. A, Saturation of AN rates by the added noise obscures representations of formant frequencies, especially in the F2 region. B, Dips in the average discharge rate profile that indicate the first two formants in the BP population response deteriorate gradually as SNR decreases (toward the bottom of the plot). C, Peaks in the rate profile versus SNR for model LPBR cells also deteriorate as SNR decreases. Arrow and horizontal dashed lines indicate the approximate SRT for listeners with normal hearing (Festen and Plomp, 1990). Model parameters are the same as in Fig. 3B.

Mentions: The representation of formants in the model midbrain average discharge rate profiles is also robust in the presence of additive speech-shaped Gaussian noise across a range of signal-to-noise ratios (SNRs; Fig. 6). Figure 6A shows model AN fibers in response to the vowel /æ/ (had); as SNR decreases, the representation of the formants in the AN discharge rates deteriorates, especially in the F2 frequency region. Formant representation is much more robust in the response profiles of midbrain neurons (Fig. 6B,C). The dips in the response profiles of the model BP cells (Fig. 6B) and in the peaks in the LPBR profile (Fig. 6C) deteriorate at approximately the speech reception threshold (SRT), where human listeners have difficulty understanding speech in noise (approximately −5 dB SNR; Festen and Plomp, 1990).
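In the conditions above, the vowel level was held at 65 dB SPL while only the noise level varied to set the SNR. A minimal sketch of that mixing step, assuming the standard power-ratio definition of SNR (the function name and the white-noise stand-in for speech-shaped noise are illustrative, not from the paper):

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so the signal-to-noise ratio of the mix is snr_db.

    The signal level stays fixed (as the vowel did at 65 dB SPL);
    only the noise gain changes across conditions.
    """
    p_sig = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    target_p_noise = p_sig / (10.0 ** (snr_db / 10.0))
    gain = np.sqrt(target_p_noise / p_noise)
    return signal + gain * noise

rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
vowel = np.sin(2 * np.pi * 100 * t)        # stand-in for a vowel token
noise = rng.standard_normal(fs)            # white, not speech-shaped

mixed = mix_at_snr(vowel, noise, -5.0)     # roughly the SRT cited from Festen and Plomp (1990)
```

Sweeping `snr_db` from favorable values down past −5 dB reproduces the layout of Figure 6, where each row of the plots is one such mixture.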

