A bio-inspired feature extraction for robust speech recognition.

Zouhir Y, Ouni K - Springerplus (2014)

Bottom Line: The proposed method is motivated by a biologically inspired auditory model which simulates the outer/middle ear filtering by a low-pass filter and the spectral behaviour of the cochlea by the Gammachirp auditory filterbank (GcFB). The evaluation results show that the proposed method gives better recognition rates than classic techniques such as Perceptual Linear Prediction (PLP), Linear Predictive Coding (LPC), Linear Prediction Cepstral Coefficients (LPCC) and Mel Frequency Cepstral Coefficients (MFCC). The recognition system used is based on Hidden Markov Models with continuous Gaussian mixture densities (HMM-GM).

View Article: PubMed Central - PubMed

Affiliation: Research Unit: Signals and Mechatronic Systems, SMS, Higher School of Technology and Computer Science (ESTI), University of Carthage, Carthage, Tunisia.

ABSTRACT
In this paper, a feature extraction method for robust speech recognition in noisy environments is proposed. The proposed method is motivated by a biologically inspired auditory model which simulates the outer/middle ear filtering by a low-pass filter and the spectral behaviour of the cochlea by the Gammachirp auditory filterbank (GcFB). The speech recognition performance of our method is tested on speech signals corrupted by real-world noises. The evaluation results show that the proposed method gives better recognition rates than classic techniques such as Perceptual Linear Prediction (PLP), Linear Predictive Coding (LPC), Linear Prediction Cepstral Coefficients (LPCC) and Mel Frequency Cepstral Coefficients (MFCC). The recognition system used is based on Hidden Markov Models with continuous Gaussian mixture densities (HMM-GM).
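The gammachirp filter at the heart of the GcFB has a published analytic form, g(t) = t^(n-1) exp(-2*pi*b*ERB(fc)*t) cos(2*pi*fc*t + c*ln t). The sketch below samples that impulse response; the parameter values n = 4, b = 1.019 and c = -2 are common defaults from the gammachirp literature, not necessarily the exact settings of the authors' filterbank.

```python
import math

def erb_bandwidth(fc_hz):
    """Equivalent rectangular bandwidth of the auditory filter at fc,
    in Hz (Glasberg & Moore formula)."""
    return 24.7 * (4.37e-3 * fc_hz + 1.0)

def gammachirp_ir(fc_hz, fs=16000, n=4, b=1.019, c=-2.0, duration=0.025):
    """Sampled gammachirp impulse response:
    g(t) = t^(n-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t + c*ln(t)).
    Sampling starts at k = 1 so that ln(t) is defined."""
    out = []
    for k in range(1, int(duration * fs) + 1):
        t = k / fs
        # Gamma-function envelope with ERB-scaled decay
        env = (t ** (n - 1)) * math.exp(-2.0 * math.pi * b * erb_bandwidth(fc_hz) * t)
        # Chirping carrier: instantaneous frequency glides via the c*ln(t) term
        out.append(env * math.cos(2.0 * math.pi * fc_hz * t + c * math.log(t)))
    return out

ir = gammachirp_ir(1000.0)  # 25 ms response of a 1 kHz channel at 16 kHz
```

With c = 0 the chirp term vanishes and the expression reduces to the ordinary gammatone filter, which is why the gammachirp is usually presented as its frequency-glide generalisation.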

No MeSH data available.


Fig4: The top panel represents the 25 ms waveform segment of the word “Water” (sampling frequency =16 kHz). The bottom panel illustrates the simulation of BMM for the waveform segment.

Mentions: The basilar membrane motion produced by a 34-channel Gammachirp auditory filterbank in response to a speech waveform segment is presented in Figure 4 (Bleeck et al. 2004). The waveform is a 25 ms segment of the word “Water” extracted from the TIMIT database (Garofolo et al. 1990). The centre frequencies of the Gammachirp filters are equally spaced between 50 Hz and 8 kHz on the ERB-rate scale. Each individual line shows the output of one channel of the auditory filterbank. The surface defined by the lines represents the simulation of basilar membrane motion (BMM). As illustrated in Figure 4, the concentrations of activity in channels above 191 Hz correspond to the resonance frequencies in the human vocal tract (Bleeck et al. 2004).
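The centre-frequency layout described above can be reproduced with the standard ERB-rate scale mapping. A minimal sketch, assuming the usual Glasberg and Moore constants; the channel count (34) and the 50 Hz to 8 kHz span are taken from the text:

```python
import math

def erb_rate(f_hz):
    """Map frequency in Hz to ERB-rate units (Glasberg & Moore scale)."""
    return 21.4 * math.log10(4.37e-3 * f_hz + 1.0)

def erb_rate_inverse(e):
    """Map ERB-rate units back to frequency in Hz."""
    return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3

def erb_spaced_centre_frequencies(f_lo=50.0, f_hi=8000.0, n_channels=34):
    """Centre frequencies equally spaced on the ERB-rate scale,
    as described for the 34-channel Gammachirp filterbank."""
    e_lo, e_hi = erb_rate(f_lo), erb_rate(f_hi)
    step = (e_hi - e_lo) / (n_channels - 1)
    return [erb_rate_inverse(e_lo + k * step) for k in range(n_channels)]

fcs = erb_spaced_centre_frequencies()  # 34 values from 50 Hz up to 8 kHz
```

Equal spacing on the ERB-rate scale places channels densely at low frequencies and sparsely at high ones, mirroring the frequency resolution of the cochlea rather than a linear Hz axis.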

