Formant-frequency variation and informational masking of speech by extraneous formants: evidence against dynamic and speech-specific acoustical constraints.

Roberts B, Summers RJ, Bailey PJ - J Exp Psychol Hum Percept Perform (2014)

Bottom Line: The frequency contours of F1 - F3 were each scaled to 50% of their natural depth, with little effect on intelligibility. Adding a competitor typically reduced intelligibility; this reduction depended on the depth of F2C variation, being greatest for 100%-depth, intermediate for 50%-depth, and least for 0%-depth (constant) F2Cs. Furthermore, triangle-wave competitors were as effective as their more speech-like counterparts, suggesting that the selection of formants from the ensemble also does not depend on speech-specific constraints.

Affiliation: Psychology, School of Life and Health Sciences.

ABSTRACT
How speech is separated perceptually from other speech remains poorly understood. Recent research indicates that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This study explored the effects of manipulating the depth and pattern of that variation. Three formants (F1+F2+F3) constituting synthetic analogues of natural sentences were distributed across the 2 ears, together with a competitor for F2 (F2C) that listeners must reject to optimize recognition (left = F1+F2C; right = F2+F3). The frequency contours of F1 - F3 were each scaled to 50% of their natural depth, with little effect on intelligibility. Competitors were created either by inverting the frequency contour of F2 about its geometric mean (a plausibly speech-like pattern) or using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Adding a competitor typically reduced intelligibility; this reduction depended on the depth of F2C variation, being greatest for 100%-depth, intermediate for 50%-depth, and least for 0%-depth (constant) F2Cs. This suggests that competitor impact depends on overall depth of frequency variation, not depth relative to that for the target formants. The absence of tuning (i.e., no minimum in intelligibility for the 50% case) suggests that the ability to reject an extraneous formant does not depend on similarity in the depth of formant-frequency variation. Furthermore, triangle-wave competitors were as effective as their more speech-like counterparts, suggesting that the selection of formants from the ensemble also does not depend on speech-specific constraints.
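The two competitor types described above can be sketched in a few lines. This is only an illustration under stated assumptions: the inversion is taken to happen on a log-frequency scale (so a value f maps to gm²/f), the `depth` argument and the function names are hypothetical, and the triangle-wave parameters (`peak_log_dev`, `rate_hz`) stand in for the "average rate and depth" matching, whose exact procedure is not given here.

```python
import math

def geometric_mean(freqs_hz):
    """Geometric mean of a formant track (all values must be > 0)."""
    return math.exp(sum(math.log(f) for f in freqs_hz) / len(freqs_hz))

def inverted_competitor(f2_hz, depth=1.0):
    """Invert an F2 contour about its geometric mean on a log scale.

    depth=1.0 gives the full inversion (gm**2 / f); depth=0.5 halves the
    excursions; depth=0.0 gives a constant track at the geometric mean.
    Assumption: inversion and depth scaling both act on log frequency.
    """
    gm = geometric_mean(f2_hz)
    return [gm * (gm / f) ** depth for f in f2_hz]

def triangle_competitor(center_hz, peak_log_dev, rate_hz, times_s):
    """Triangle-wave contour in log frequency around center_hz.

    peak_log_dev is the peak deviation in natural-log units and rate_hz is
    the number of triangle cycles per second (both hypothetical parameters).
    """
    contour = []
    for t in times_s:
        phase = (t * rate_hz) % 1.0
        tri = 4.0 * abs(phase - 0.5) - 1.0  # +1 at phase 0, -1 at phase 0.5
        contour.append(center_hz * math.exp(peak_log_dev * tri))
    return contour
```

Under these assumptions, the 100%-, 50%-, and 0%-depth F2Cs correspond to `depth` values of 1.0, 0.5, and 0.0, and the inverted contour keeps the same geometric mean as the original F2.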


Stimuli for Experiment 1: Schematic illustrating the use of scale factors to control the depth of formant-frequency variation in three-formant synthetic speech. Using the example sentence “The cat ran along,” the frequency contours of F1, F2, and F3 are shown for the cases where the scale factor was 100% (i.e., not adjusted; dashed line), 50% (solid line), and 0% (dotted line). Depth of formant-frequency variation was controlled by applying a common scale factor to a set of values for each formant, representing its frequency contour in terms of deviations from the geometric mean frequency on a log scale (Equation 1). Hence, for the 0% case, the frequency of each formant was set to be constant at the geometric mean frequency for that formant track. The full set of scale factors used ranged from 100% to 0%, in steps of 10%. Note that only the formant-frequency contours were adjusted in this way; formant amplitude contours (not shown here) were always presented without adjustment.
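The scaling described in the caption can be illustrated with a short sketch. Equation 1 itself is not reproduced in this text, so the form used below (each value's log deviation from the geometric mean multiplied by the scale factor) is an assumption based on the caption's wording, and the function names are hypothetical.

```python
import math

def geometric_mean(freqs_hz):
    """Geometric mean of a formant track (all values must be > 0)."""
    return math.exp(sum(math.log(f) for f in freqs_hz) / len(freqs_hz))

def scale_contour(freqs_hz, scale):
    """Scale the depth of a formant-frequency contour.

    Each value is expressed as a deviation from the track's geometric mean
    on a log scale, and that deviation is multiplied by `scale`
    (1.0 = 100%, unchanged; 0.0 = 0%, constant at the geometric mean).
    Assumed form, not taken from the paper's Equation 1.
    """
    log_gm = math.log(geometric_mean(freqs_hz))
    return [math.exp(log_gm + scale * (math.log(f) - log_gm)) for f in freqs_hz]

# The eleven scale factors from 100% down to 0% in 10% steps.
SCALE_FACTORS = [pct / 100.0 for pct in range(100, -1, -10)]
```

Because the scaling acts on log deviations, the geometric mean of a track is unchanged at every scale factor; only the excursions around it shrink. Amplitude contours are not touched by this function, consistent with the caption.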

Mentions: The stimuli comprised synthetic three-formant analogues of 44 sentences, presented diotically and without competitors. There were 11 conditions in the main experiment, which differed only in the magnitude of the common scale factor applied to the frequency contours of all three formants. Eleven versions of each sentence were created by changing the scale factor from 100% to 0% (constant at the geometric mean) in 10% steps; the amplitude contours of the formants were not changed by this manipulation. Figure 1 shows a schematic of the effect of scaling the formant-frequency contours of the stimuli. For each listener, the 44 sentences used were divided equally across the 11 conditions (i.e., four per condition), such that there were always 12 or 13 keywords per condition. Allocation of sentences was counterbalanced by rotation across each set of 11 listeners tested; hence, the experiment required a multiple of 11 listeners to produce a balanced dataset. The training session used 40 sentences, for which all the formants were scaled to 100% depth.
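The counterbalancing by rotation can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the 44 sentences are grouped into 11 fixed sets of four consecutive sentences, and that each successive listener shifts every set to the next condition. The actual grouping (which kept keyword counts at 12 or 13 per condition) is not specified here, and all names are hypothetical.

```python
N_SENTENCES = 44
N_CONDITIONS = 11
PER_CONDITION = N_SENTENCES // N_CONDITIONS  # four sentences per condition

def allocate_sentences(listener_index):
    """Map sentence index -> scale factor (in percent) for one listener.

    Sentences are grouped into fixed sets of four (an assumed grouping);
    the set-to-condition mapping rotates by one step per listener.
    """
    allocation = {}
    for sentence in range(N_SENTENCES):
        sentence_set = sentence // PER_CONDITION
        condition = (sentence_set + listener_index) % N_CONDITIONS
        allocation[sentence] = 100 - 10 * condition  # 100%, 90%, ..., 0%
    return allocation
```

With this rotation, every listener hears four sentences at each scale factor, and across any block of 11 consecutive listeners each sentence appears once at every scale factor, which is why a balanced dataset needs a multiple of 11 listeners.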

