Using hierarchical time series clustering algorithm and wavelet classifier for biometric voice classification.

Fong S - J. Biomed. Biotechnol. (2012)

Bottom Line: Lately, voice classification has been found useful in phone monitoring, classifying speakers' gender, ethnicity, and emotional state, and so forth. In this paper, a collection of computational algorithms is proposed to support voice classification; the algorithms combine hierarchical clustering, dynamic time warp transform, discrete wavelet transform, and a decision tree. The proposed algorithms are relatively more transparent and interpretable than existing ones, even though many techniques such as Artificial Neural Networks, Support Vector Machines, and Hidden Markov Models (which inherently function like black boxes) have been applied to voice verification and voice identification.

View Article: PubMed Central - PubMed

Affiliation: Department of Computer and Information Science, University of Macau, Taipa, Macau. ccfong@umac.mo

ABSTRACT
Voice biometrics has a long history in biosecurity applications such as verification and identification based on characteristics of the human voice. Another application, voice classification, which plays an important role in grouping unlabelled voice samples, has however not been widely studied. Lately, voice classification has been found useful in phone monitoring, classifying speakers' gender, ethnicity, and emotional state, and so forth. In this paper, a collection of computational algorithms is proposed to support voice classification; the algorithms combine hierarchical clustering, dynamic time warp transform, discrete wavelet transform, and a decision tree. The proposed algorithms are relatively more transparent and interpretable than existing ones, even though many techniques such as Artificial Neural Networks, Support Vector Machines, and Hidden Markov Models (which inherently function like black boxes) have been applied to voice verification and voice identification. Two datasets, one generated synthetically and the other collected empirically from a past voice recognition experiment, are used to verify and demonstrate the effectiveness of our proposed voice classification algorithm.
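The paper itself does not include code, but the core combination the abstract names, hierarchical clustering over a dynamic time warp distance, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `dtw_distance`, `single_link_cluster`, and the merge threshold are hypothetical names and choices introduced here for clarity.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest warping path reaching this cell
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def single_link_cluster(series, threshold):
    """Naive agglomerative (single-linkage) clustering of a list of
    series, merging the closest pair of clusters under DTW distance
    until no pair is closer than the threshold."""
    clusters = [[i] for i in range(len(series))]
    while len(clusters) > 1:
        best = (np.inf, None, None)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = min(dtw_distance(series[i], series[j])
                        for i in clusters[x] for j in clusters[y])
                if d < best[0]:
                    best = (d, x, y)
        if best[0] > threshold:
            break
        _, x, y = best
        clusters[x] += clusters[y]
        del clusters[y]
    return clusters
```

For example, clustering two near-identical ramps and one very different series with a small threshold groups the first two together and leaves the third alone. DTW (rather than plain Euclidean distance) is what lets utterances of slightly different tempo still land in the same cluster.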

fig5: Visualization of time series plots that represent the voiceprints by three different speakers who uttered the same Japanese vowels.

Mentions: As shown in our model in Figure 3, the raw voice series are formatted and processed into records that have exactly 12 coefficients (attributes). Hierarchical time series clustering is applied over the processed data, so that each data point the clustering algorithm works with has identical attributes and scales for similarity measures. By plotting the first column of a consecutive block on the x-axis against the rest of the series values on the y-axis, we generate visualizations of the time series points with distinguishable shapes. Figure 5 shows three groups of voice series taken from the dataset blocks of three different speakers. Just by visual inspection, we can observe their differences in appearance. The four voice utterances on the top row sit at about three quarters along the x-axis; the cap of each data cluster is dominated by small square dots (each representing one of the coefficient values of the sample block), followed by dots of other shapes and diamond-shaped dots at the bottom. Though the four clusters on the top row are not exactly identical to one another, they roughly share a similar structure. In the middle row, the voice visualization by another speaker has the data near the middle of the x-axis, and the outlined structure has cross-shaped markers on the cap. The visualization on the bottom row has an obviously different formation from the other two. This shows that the voices of the three speakers differ essentially in their voice characteristics, and the differences can be spotted visually. Computationally, however, the differences in voice characteristics must be revealed by the clustering algorithm.
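The formatting step described above, turning a raw voice series into records of exactly 12 coefficients so every data point has identical dimensions, can be sketched as follows. This is an illustrative reconstruction only: the function name `to_blocks` and the trailing-block handling are assumptions, since the paper does not show its preprocessing code.

```python
import numpy as np

def to_blocks(series, n_coeffs=12):
    """Reshape a raw 1-D voice series into consecutive records of exactly
    n_coeffs attributes, discarding any incomplete trailing block so that
    every record the clustering algorithm sees has identical dimensions."""
    series = np.asarray(series, dtype=float)
    usable = (len(series) // n_coeffs) * n_coeffs
    return series[:usable].reshape(-1, n_coeffs)
```

A series of 30 samples, for instance, yields two 12-attribute records, with the 6 leftover samples dropped; plotting each record's first attribute against the remaining eleven then produces point clouds like those in Figure 5.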

