Cost-sensitive learning for emotion robust speaker recognition.

Li D, Yang Y, Dai W - ScientificWorldJournal (2014)

Bottom Line: This paper addresses the degradation of speaker recognition on emotional speech by introducing a cost-sensitive learning technique that reweights the probability of affective test utterances at the pitch envelope level, which effectively enhances robustness in emotion-dependent speaker recognition. Based on that technique, a new recognition system architecture and its components are proposed. Experiments on the Mandarin Affective Speech Corpus show an 8% improvement in identification rate over traditional speaker recognition.


Affiliation: School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China.

ABSTRACT
In the field of information security, voice is one of the most important biometric modalities. With the growth of voice communication over the Internet and telephone systems, large volumes of voice data are now accessible. In speaker recognition, a voiceprint can serve as a unique password with which a user proves his or her identity. However, speech carrying various emotions can cause an unacceptably high error rate and degrade the performance of a speaker recognition system. This paper addresses the problem by introducing a cost-sensitive learning technique that reweights the probability of affective test utterances at the pitch envelope level, which effectively enhances robustness in emotion-dependent speaker recognition. Based on that technique, a new recognition system architecture and its components are proposed. Experiments on the Mandarin Affective Speech Corpus show an 8% improvement in identification rate over traditional speaker recognition.


Figure 5: DET curves for the baseline, T-norm, ENORM, PFLSR, and CSSR-based speaker verification systems.

Mentions: We also apply the proposed method to the speaker verification task and compare it with other score normalization methods, as shown in Figure 5. Performance is measured by the detection error tradeoff (DET) curve and by the equal error rate (EER). The EER is the operating point on the DET curve at which the false-alarm and missed-detection rates are equal. Figure 5 shows the promise of the new approach: the evaluation results show that the CSSR technique outperforms the standard accumulated approach, T-norm, and ENORM for speaker verification on affective speech.
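As an illustration of the EER definition above, the following is a minimal Python sketch, not the authors' evaluation code. It assumes verification scores where higher means "more likely the claimed speaker", and the function name `equal_error_rate` and the example scores are hypothetical. It sweeps every observed score as a decision threshold, computes the missed-detection and false-alarm rates at each, and reports the point where the two are closest.

```python
import numpy as np


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER from two sets of verification scores.

    Assumption: a trial is accepted when its score >= threshold.
    Returns (eer, threshold) at the threshold where the missed-detection
    and false-alarm rates are closest to each other.
    """
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Candidate thresholds: every distinct observed score, ascending.
    thresholds = np.unique(np.concatenate([target, impostor]))

    # Missed detection: a true-speaker trial scored below the threshold.
    miss = np.array([(target < t).mean() for t in thresholds])
    # False alarm: an impostor trial scored at or above the threshold.
    false_alarm = np.array([(impostor >= t).mean() for t in thresholds])

    # With finite data the two rates rarely match exactly, so take the
    # closest crossing and average the two rates there.
    idx = int(np.argmin(np.abs(miss - false_alarm)))
    eer = (miss[idx] + false_alarm[idx]) / 2.0
    return float(eer), float(thresholds[idx])


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    eer, thr = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

Plotting `miss` against `false_alarm` over all thresholds (conventionally on normal-deviate axes) would give one DET curve of the kind compared in Figure 5; a lower EER corresponds to a curve closer to the origin.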

