Improving Speaker Recognition by Biometric Voice Deconstruction.

Mazaira-Fernandez LM, Álvarez-Marquina A, Gómez-Vilda P - Front Bioeng Biotechnol (2015)

Bottom Line: The present study benefits from advances made in recent years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers, combined with a set of features derived from deconstructing the voice into its glottal source and vocal tract estimates, will improve recognition rates over classical approaches. Experimental validation is carried out both on a database recorded under highly controlled acoustic conditions and on mobile phone network recordings made under non-controlled acoustic conditions.


Affiliation: Neuromorphic Voice Processing Laboratory, Center for Biomedical Technology, Universidad Politécnica de Madrid, Madrid, Spain.

ABSTRACT
Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have had to give way to alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from advances made in recent years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers, combined with a set of features derived from deconstructing the voice into its glottal source and vocal tract estimates, will improve recognition rates over classical approaches. A general description of the main hypothesis and of the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a database recorded under highly controlled acoustic conditions and on mobile phone network recordings made under non-controlled acoustic conditions.




Figure 9: EER achieved for male speakers with ZTNorm applied (left) and for female speakers with NoNorm applied (right), in a gender-dependent setup that incorporates different combinations of extra parameters.

Mentions: We continue the test by introducing what we call alternative parameters into the best GDC, as it is the one providing the most accurate recognition results. We tested all possible combinations of these parameters. We also analyzed the effect of score normalization techniques, i.e., ZTNorm, ZNorm, TNorm, and NoNorm (meaning that no score normalization is applied). Figure 9 shows the results for male speakers with ZTNorm (left) and for female speakers with NoNorm (right). These normalization techniques were selected because they gave the best results in terms of EER. For female speakers, applying any kind of normalization worsens the results, as the amount of data available for normalization is quite limited. Table 6 provides the most successful results achieved in terms of EER on the development set. In this scenario, where the quality of the recordings is quite poor with regard to background noise, the parameters F0 and F3 are more relevant for speaker characterization than E or ΔE.
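
The score normalization options and the EER metric mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the cohort arrays, and the simple threshold sweep used to approximate the EER are all assumptions made for illustration. ZNorm is taken here as standardizing a score with impostor-utterance statistics for the target model, TNorm as standardizing with the test utterance's scores against a cohort of models, and ZTNorm as applying one after the other.

```python
import numpy as np


def standardize(score, cohort_scores):
    """Shift and scale a score by the mean and standard deviation of a cohort."""
    cohort = np.asarray(cohort_scores, dtype=float)
    sigma = cohort.std()
    if sigma == 0.0:
        raise ValueError("cohort scores have zero variance")
    return (score - cohort.mean()) / sigma


def z_norm(score, impostor_utts_vs_model):
    """ZNorm (assumed form): cohort = impostor utterances scored against the target model."""
    return standardize(score, impostor_utts_vs_model)


def t_norm(score, test_utt_vs_cohort_models):
    """TNorm (assumed form): cohort = the test utterance scored against cohort models."""
    return standardize(score, test_utt_vs_cohort_models)


def zt_norm(score, impostor_utts_vs_model, znormed_test_vs_cohort_models):
    """ZTNorm (assumed form): ZNorm first, then TNorm using cohort scores that were Z-normalized too."""
    return t_norm(z_norm(score, impostor_utts_vs_model), znormed_test_vs_cohort_models)


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping the decision threshold over all observed scores."""
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    best_gap, eer = np.inf, 1.0
    for threshold in np.sort(np.concatenate([target, impostor])):
        far = np.mean(impostor >= threshold)  # false acceptance rate
        frr = np.mean(target < threshold)     # false rejection rate
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Toy scores, invented purely for illustration.
toy_eer = equal_error_rate([2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 1.5, 2.5])
toy_z = z_norm(3.0, [1.0, 2.0, 3.0])
```

Because each normalization estimates a mean and a standard deviation from a cohort, a small cohort gives noisy statistics, which is consistent with the remark above that normalization worsens the female results when little normalization data is available.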

