A bio-inspired feature extraction for robust speech recognition.

Zouhir Y, Ouni K - Springerplus (2014)

Bottom Line: The proposed method is motivated by a biologically inspired auditory model that simulates outer/middle ear filtering with a low-pass filter and the spectral behaviour of the cochlea with the Gammachirp auditory filterbank (GcFB). The evaluation results show that the proposed method gives better recognition rates than classic techniques such as Perceptual Linear Prediction (PLP), Linear Predictive Coding (LPC), Linear Prediction Cepstral Coefficients (LPCC) and Mel Frequency Cepstral Coefficients (MFCC). The recognition system used is based on Hidden Markov Models with continuous Gaussian Mixture densities (HMM-GM).


Affiliation: Research Unit: Signals and Mechatronic Systems, SMS, Higher School of Technology and Computer Science (ESTI), University of Carthage, Carthage, Tunisia.

ABSTRACT
In this paper, a feature extraction method for robust speech recognition in noisy environments is proposed. The method is motivated by a biologically inspired auditory model that simulates outer/middle ear filtering with a low-pass filter and the spectral behaviour of the cochlea with the Gammachirp auditory filterbank (GcFB). Its speech recognition performance is tested on speech signals corrupted by real-world noises. The evaluation results show that the proposed method gives better recognition rates than classic techniques such as Perceptual Linear Prediction (PLP), Linear Predictive Coding (LPC), Linear Prediction Cepstral Coefficients (LPCC) and Mel Frequency Cepstral Coefficients (MFCC). The recognition system used is based on Hidden Markov Models with continuous Gaussian Mixture densities (HMM-GM).
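
To make the cochlear stage more concrete, the sketch below samples the impulse response of a single gammachirp filter in its commonly cited form, g(t) = t^(n-1) exp(-2π b ERB(fr) t) cos(2π fr t + c ln t + φ). It is an illustration only, not the authors' implementation: the function names and the parameter values (n = 4, b = 1.019, c = -2.0, a 25 ms duration) are assumptions chosen for the example, and the outer/middle ear low-pass stage is not modelled here.

```python
import math


def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 + 0.108 * f_hz


def gammachirp_ir(fr_hz, fs_hz, duration_s=0.025, n=4, b=1.019, c=-2.0, phi=0.0):
    """Peak-normalised gammachirp impulse response (all defaults are assumed values)."""
    num_samples = round(duration_s * fs_hz)
    ir = []
    # Start at k = 1 because ln(t) is undefined at t = 0.
    for k in range(1, num_samples + 1):
        t = k / fs_hz
        # Gamma-distribution envelope whose decay rate scales with ERB(fr).
        envelope = t ** (n - 1) * math.exp(-2.0 * math.pi * b * erb_hz(fr_hz) * t)
        # The c * ln(t) phase term is the "chirp" that makes the filter asymmetric.
        phase = 2.0 * math.pi * fr_hz * t + c * math.log(t) + phi
        ir.append(envelope * math.cos(phase))
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]


# Example: one channel centred at 1 kHz for 16 kHz speech.
ir_1k = gammachirp_ir(fr_hz=1000.0, fs_hz=16000.0)
```

A filterbank in this spirit would repeat the computation for a set of centre frequencies and convolve the input signal with each response; setting c = 0 reduces the filter to a gammatone.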




Figure 6: The temporal representations and spectrograms of the noises used.

Mentions: The TIMIT database (Garofolo et al. 1990) is used for all simulated speech recognition experiments. The database consists of speech signals sampled at 16 kHz from 630 speakers (female and male) from 8 major dialect regions of the United States, each saying 10 sentences. We used isolated words extracted from this database: 9702 isolated words for the training phase of the experiments and 3525 for the test phase. To evaluate the performance of our method on isolated words in the presence of various types of background noise, noisy test sets were obtained by combining clean speech signals with suburban train, exhibition hall, street and car noises. These real-world noises were taken from the AURORA database (Hirsch and Pearce 2000). Five noise levels, corresponding to SNR values of 0 dB, 5 dB, 10 dB, 15 dB and 20 dB, were applied to each test set. The temporal representations and the spectrograms of all the noises used are shown in Figure 6.
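
The construction of the noisy test sets can be sketched as follows: the noise is scaled so that the ratio of clean-speech power to noise power matches the target SNR, then added to the clean signal. This is a minimal illustration under stated assumptions, not the authors' procedure. The helper names `mix_at_snr` and `measured_snr_db` are hypothetical, power is measured over the whole utterance, and a synthetic tone and Gaussian noise stand in for the TIMIT words and AURORA noises, which are not reproduced here.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech, scaled so the mixture has the requested SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0.0:
        raise ValueError("noise segment is silent")
    # SNR_dB = 10 log10(P_clean / (g^2 P_noise)), solved for the noise gain g.
    gain = math.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR of a mixture, treating (noisy - clean) as the noise component."""
    p_clean = sum(x * x for x in clean)
    p_resid = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(p_clean / p_resid)


SNR_LEVELS_DB = (0, 5, 10, 15, 20)  # the five levels listed in the text

# Synthetic stand-ins (assumptions): a 440 Hz tone and white Gaussian noise at 16 kHz.
fs = 16000
rng = random.Random(0)
clean = [math.sin(2.0 * math.pi * 440.0 * k / fs) for k in range(fs)]
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]

noisy_sets = {snr: mix_at_snr(clean, noise, snr) for snr in SNR_LEVELS_DB}
```

Measuring power over the whole signal is the simplest convention. Speech-active-level weighting is a common alternative, and the text does not say which one was used.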

