Modeling auditory coding: from sound to spikes.

Rudnicki M, Schoppe O, Isik M, Völk F, Hemmert W - Cell Tissue Res. (2015)

Bottom Line: On the other hand, discrepancies between model results and measurements reveal gaps in our current knowledge, which can in turn be targeted by matched experiments. Models of the auditory periphery have improved greatly during the last decades and account for many phenomena observed in experiments. It also provides uniform evaluation and visualization scripts, which allow for direct comparisons between models.


Affiliation: Department of Electrical and Computer Engineering, Technische Universität München, München, Germany.

ABSTRACT
Models are valuable tools to assess how deeply we understand complex systems: only if we are able to replicate the output of a system based on the function of its subcomponents can we assume that we have probably grasped its principles of operation. On the other hand, discrepancies between model results and measurements reveal gaps in our current knowledge, which can in turn be targeted by matched experiments. Models of the auditory periphery have improved greatly during the last decades and account for many phenomena observed in experiments. While the cochlea is only partly accessible in experiments, models can extrapolate its behavior without gaps from base to apex and with arbitrary input signals. With models we can, for example, evaluate speech coding with large speech databases, which is not possible experimentally, and models have been tuned to replicate features of the human hearing organ, for which practically no invasive electrophysiological measurements are available. Auditory models have become instrumental in evaluating models of neuronal sound processing in the auditory brainstem and even at higher levels, where they are used to provide realistic input. Finally, models can be used to illustrate how a system as complicated as the inner ear works by visualizing its responses. The big advantage here is that intermediate steps in various domains (mechanical, electrical, and chemical) are available, so that a consistent picture of how its output evolves can be drawn. However, it must be kept in mind that no model is able to replicate all physiological characteristics (yet), and therefore it is critical to choose the most appropriate model, or models, for every research question. To facilitate this task, this paper not only reviews three recent auditory models but also introduces a framework that allows researchers to easily switch between models.
It also provides uniform evaluation and visualization scripts, which allow for direct comparisons between models.

No MeSH data available.


Fig. 17: Comparison of ANF activity for an artificial vowel “ø” at 60 dB SPL (fundamental frequency: 200 Hz; speech formants F1: 450 Hz, F2: 1450 Hz, F3: 2450 Hz). Spike rates were averaged over the vowel duration (400 ms).

Mentions: One of the largest benefits of models is the analysis of auditory-nerve responses to complex sounds, which is very hard to achieve in physiological recordings because it requires sampling nerve fibers along the whole CF range of the cochlea. Figure 17 shows averaged firing rates for an artificial vowel “ø”. Voiced speech sounds are generated by glottal vibrations, which produce a fundamental frequency (in our case 200 Hz) and its higher harmonics (400 Hz, 600 Hz, 800 Hz, ...). This line spectrum is filtered by the vocal tract, which superimposes the characteristic formant structure. The sound was generated with a vocoder with a constant fundamental frequency, which makes it easy to assess the frequency resolution of the models directly from averaged spike counts. The fundamental frequency of the vowel and its harmonics were well resolved, at least up to 1 kHz, in the two models tuned to human performance. As the traveling-wave model used in the Holmberg et al. (2007) model was restricted to 100 locations to limit the computational burden, its resolution appears coarse compared to the other models, for which responses at 200 CFs were plotted. The MAP model, due to its broader filters, resolved only the fundamental frequency; the second and third harmonics at 400 Hz and 600 Hz were scarcely separated. The coarse shape of all response functions was dominated in all cases by the speech formants: F1 at 450 Hz, F2 at 1450 Hz, and F3 at 2450 Hz. In the low-frequency range (below 300 Hz), the filters of the Holmberg et al. (2007) model are still very narrow; this model would require structural changes to replicate the low-frequency region of the inner ear more accurately. The Zilany et al. (2014) model does not provide responses for CFs below 125 Hz due to the way it is implemented, which is why responses could not be calculated down to 100 Hz in Fig. 17. For the Holmberg et al. (2007) model, MSR and HSR fibers show very similar response curves, while for the Zilany et al. (2014) model, the different fiber types seem to code different dynamic ranges nicely. For the MAP model, the HSR fibers seem to saturate early, despite their smaller overall sensitivity (compare Fig. 11).
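The vocoder-style stimulus described above (a 200 Hz line spectrum shaped by formants at 450, 1450, and 2450 Hz) can be sketched in a few lines. This is a minimal illustration, not the authors' actual vocoder: the sampling rate, the formant bandwidths, and the use of simple two-pole resonators are assumptions made here for the example.

```python
import numpy as np

# A glottal-pulse train at F0 yields a line spectrum at exact multiples of
# F0 (200, 400, 600 Hz, ...); a cascade of two-pole resonators stands in
# for the vocal tract and superimposes the formant envelope.
FS = 16_000                         # sampling rate in Hz (assumed, not from the paper)
F0 = 200                            # fundamental frequency (from the figure caption)
DUR = 0.4                           # vowel duration, 400 ms (from the figure caption)
FORMANTS = [(450, 80), (1450, 110), (2450, 140)]  # (center Hz, bandwidth Hz); bandwidths assumed

def resonator(x, fc, bw, fs=FS):
    """Second-order IIR resonator centered at fc with roughly bw half-power bandwidth."""
    r = np.exp(-np.pi * bw / fs)                  # pole radius set by the bandwidth
    a1 = -2.0 * r * np.cos(2.0 * np.pi * fc / fs) # pole angle set by the center frequency
    a2 = r * r
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n] - a1 * (y[n - 1] if n >= 1 else 0.0) - a2 * (y[n - 2] if n >= 2 else 0.0)
    return y

# Glottal source: one impulse per period -> harmonics at exact multiples of F0.
num = int(DUR * FS)
src = np.zeros(num)
src[:: FS // F0] = 1.0

# Vocal tract: cascade the three formant resonators over the line spectrum.
vowel = src
for fc, bw in FORMANTS:
    vowel = resonator(vowel, fc, bw)
vowel /= np.max(np.abs(vowel))      # normalize; scaling to 60 dB SPL is a separate calibration step
```

Because the fundamental is constant and the duration covers an integer number of pitch periods, the spectrum of `vowel` consists of clean lines at multiples of 200 Hz, which is what makes the models' frequency resolution directly readable from averaged spike counts.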

