Speech Coding in the Brain: Representation of Vowel Formants by Midbrain Neurons Tuned to Sound Fluctuations.

Carney LH, Li T, McDonough JM - eNeuro (2015)

Bottom Line: Additionally, a successful neural code must function for speech in background noise at levels that are tolerated by listeners. The model presented here resolves these problems, and incorporates several key response properties of the nonlinear auditory periphery, including saturation, synchrony capture, and phase locking to both fine structure and envelope temporal features. The hypothesized code is supported by electrophysiological recordings from the inferior colliculus of awake rabbits.


Affiliation: Departments of Biomedical Engineering, and Neurobiology & Anatomy, University of Rochester, Rochester, New York 14642.

ABSTRACT
Current models for neural coding of vowels are typically based on linear descriptions of the auditory periphery, and fail at high sound levels and in background noise. These models rely on either auditory nerve discharge rates or phase locking to temporal fine structure. However, both discharge rates and phase locking saturate at moderate to high sound levels, and phase locking is degraded in the CNS at middle to high frequencies. The fact that speech intelligibility is robust over a wide range of sound levels is problematic for codes that deteriorate as the sound level increases. Additionally, a successful neural code must function for speech in background noise at levels that are tolerated by listeners. The model presented here resolves these problems, and incorporates several key response properties of the nonlinear auditory periphery, including saturation, synchrony capture, and phase locking to both fine structure and envelope temporal features. The model also includes the properties of the auditory midbrain, where discharge rates are tuned to amplitude fluctuation rates. The nonlinear peripheral response features create contrasts in the amplitudes of low-frequency neural rate fluctuations across the population. These patterns of fluctuations result in a response profile in the midbrain that encodes vowel formants over a wide range of levels and in background noise. The hypothesized code is supported by electrophysiological recordings from the inferior colliculus of awake rabbits. This model provides information for understanding the structure of cross-linguistic vowel spaces, and suggests strategies for automatic formant detection and speech enhancement for listeners with hearing loss.
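
The hypothesized code lends itself to a compact numerical illustration. The sketch below is not the authors' model: the channel spacing, formant frequencies, and the Gaussian shape of the fluctuation-depth profile are assumptions chosen only to show how a bandpass midbrain population develops rate dips at the formants while a band-reject population develops rate peaks at the same places.

```python
import numpy as np

cf = np.linspace(200.0, 3000.0, 50)     # characteristic frequencies of the channels (Hz)
formants = [500.0, 1500.0]              # assumed F1 and F2 (Hz)

# Peripheral stage: channels tuned near a formant are dominated by one
# harmonic (synchrony capture) and fluctuate weakly at F0; channels
# between formants carry strong F0-rate fluctuations.
depth = np.ones_like(cf)
for f in formants:
    depth *= 1.0 - 0.9 * np.exp(-0.5 * ((cf - f) / 150.0) ** 2)

# Midbrain stage: bandpass cells (BMF near F0) follow the fluctuation
# depth; band-reject cells respond inversely to it.
rate_bandpass = depth
rate_bandreject = 1.0 - depth

# Local dips in the bandpass profile (and, equivalently, peaks in the
# band-reject profile) mark the formants.
dips = (rate_bandpass[1:-1] < rate_bandpass[:-2]) & (rate_bandpass[1:-1] < rate_bandpass[2:])
print("bandpass dips near:", cf[1:-1][dips])   # ~500 and ~1500 Hz
```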

Figure 1: Schematic illustration of the vowel-coding hypothesis. The left-hand column labels the key stages in the coding scheme. A, Vowel spectrum consisting of harmonics of F0, shaped by the spectral envelope. B, Responses of auditory nerve (AN) fibers tuned near formants have relatively small pitch-related rate fluctuations; these responses are dominated by a single harmonic in the stimulus, referred to as synchrony capture. C, Fibers tuned between formants have strong rate fluctuations at F0 (Delgutte and Kiang, 1984). D, Example of a bandpass modulation transfer function (MTF) from rabbit IC with a best modulation frequency (BMF) near the F0 of a typical male human speaker. E, Example of a band-reject MTF with a notch near a typical F0. F, Bandpass midbrain neurons have reduced rates in frequency channels with weak fluctuations (green arrow) and increased rates in channels with strong fluctuations (see C, orange arrow); thus dips in the rate profile of bandpass neurons encode F1 and F2. G, The profile of rates across a population of band-reject neurons has peaks at F1 and F2, because band-reject neurons respond more strongly to stimuli that produce reduced neural fluctuations in their inputs (see B, green arrow).
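
Panel A can be rendered in miniature as a harmonic series at multiples of F0 with amplitudes shaped by a formant envelope. In the sketch below, the F0, formant frequencies, bandwidths, and resonance shape are illustrative assumptions, not the stimuli used in the study.

```python
import numpy as np

f0 = 100.0                                   # fundamental (voice pitch), Hz; assumed
harmonics = f0 * np.arange(1, 31)            # first 30 harmonics of F0
formants = [(500.0, 80.0), (1500.0, 120.0)]  # assumed (formant frequency, bandwidth) pairs

def envelope(f):
    """Spectral envelope built from simple resonance-shaped peaks."""
    return sum(1.0 / (1.0 + ((f - fc) / bw) ** 2) for fc, bw in formants)

amps = envelope(harmonics)

# Harmonics nearest a formant stand well above their neighbors, so the
# cochlear channels tuned there are captured by a single component
# (panel B); between formants, adjacent harmonics are comparable in
# level, and those channels beat at F0 (panel C).
for h, a in zip(harmonics, amps):
    print(f"{h:6.0f} Hz  {'#' * int(40 * a / amps.max())}")
```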

Mentions: Many inferior colliculus (IC) neurons display both spectral tuning, described by a best frequency (BF) at which the neuron is most sensitive, and tuning to the frequency of sinusoidal fluctuations in amplitude, described by a best modulation frequency (BMF; Krishna and Semple, 2000; Joris et al., 2004; Nelson and Carney, 2007). Most IC neurons tuned for amplitude fluctuations have BMFs in the range of voice pitch (Langner, 1992) and are thus well suited to represent the critical acoustic features of vowels (Delgutte et al., 1998). The vowel-coding hypothesis presented here takes advantage of nonlinear properties of AN responses, including rate saturation (Sachs and Abbas, 1974; Yates, 1990; Yates et al., 1990) and synchrony capture, the dominance of the response by a single stimulus frequency component (Fig. 1; Young and Sachs, 1979; Deng and Geisler, 1987; Miller et al., 1997). These nonlinearities have strong effects on the rate fluctuations of AN fibers in response to vowels and provide a robust framework for encoding vowel features.
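
A toy pair of MTFs makes the bandpass/band-reject distinction concrete. The Gaussian-on-log-frequency tuning, the 100 Hz BMF, and the depth of the notch below are assumptions for illustration; they are not fits to the rabbit IC recordings.

```python
import numpy as np

def bandpass_mtf(fm, bmf=100.0, width=0.5):
    """Rate vs. modulation frequency: Gaussian tuning on a log2 axis."""
    return np.exp(-0.5 * (np.log2(fm / bmf) / width) ** 2)

def bandreject_mtf(fm, bmf=100.0, width=0.5):
    """Complementary notch at the same best modulation frequency."""
    return 1.0 - 0.8 * bandpass_mtf(fm, bmf, width)

f0 = 100.0  # modulation rate of the F0-related fluctuations; assumed male voice pitch
print("bandpass rate at F0:   ", round(float(bandpass_mtf(f0)), 2))    # 1.0 -> strongest response
print("band-reject rate at F0:", round(float(bandreject_mtf(f0)), 2))  # 0.2 -> suppressed response
```

Because the two MTF types invert each other near F0, the same peripheral fluctuation contrast produces formant dips in the bandpass population profile and formant peaks in the band-reject profile (Fig. 1F,G).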

