Visual speech discrimination and identification of natural and synthetic consonant stimuli.

Files BT, Tjan BS, Jiang J, Bernstein LE - Front Psychol (2015)

Bottom Line: Discrimination and identification were superior with natural stimuli, which comprised more phonetic information. Overall reductions in d' with inverted stimuli, but a persistent pattern of larger d' for far than for near stimulus pairs, are interpreted as evidence that visual speech is represented by both its motion and configural attributes. The methods and results of this investigation open up avenues for understanding the neural and perceptual bases for visual and audiovisual speech perception and for the development of practical applications such as visual speech synthesis for lipreading/speechreading.


Affiliation: U.S. Army Research Laboratory, Human Research and Engineering Directorate, Aberdeen Proving Ground, MD, USA.

ABSTRACT
From phonetic features to connected discourse, every level of psycholinguistic structure, including prosody, can be perceived through viewing the talking face. Yet a longstanding notion in the literature is that visual speech perceptual categories comprise groups of phonemes (referred to as visemes), such as /p, b, m/ and /f, v/, whose internal structure is not informative to the visual speech perceiver. This conclusion has not, to our knowledge, been evaluated using a psychophysical discrimination paradigm. We hypothesized that perceivers can discriminate the phonemes within typical viseme groups, and that discrimination measured with d-prime (d') and response latency is related to visual stimulus dissimilarities between consonant segments. In Experiment 1, participants performed speeded discrimination for pairs of consonant-vowel spoken nonsense syllables that were predicted to be same, near, or far in their perceptual distances, and that were presented as natural or synthesized video. Near pairs were within-viseme consonants. Natural within-viseme stimulus pairs were discriminated significantly above chance (except for /k/-/h/). Sensitivity (d') increased and response times decreased with distance. Discrimination and identification were superior with natural stimuli, which comprised more phonetic information. We suggest that the notion of the viseme as a unitary perceptual category is incorrect. Experiment 2 probed the perceptual basis for visual speech discrimination by inverting the stimuli. Overall reductions in d' with inverted stimuli, but a persistent pattern of larger d' for far than for near stimulus pairs, are interpreted as evidence that visual speech is represented by both its motion and configural attributes. The methods and results of this investigation open up avenues for understanding the neural and perceptual bases for visual and audiovisual speech perception and for the development of practical applications such as visual speech synthesis for lipreading/speechreading.
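For context on the sensitivity measure used throughout, d' in signal detection theory is the difference of z-transformed hit and false-alarm rates, d' = z(H) − z(F). The sketch below illustrates this under the simple yes/no model; the function name, the log-linear correction, and the example counts are illustrative assumptions, and a same-different design such as the one used here can call for a different decision model than the one shown.

```python
# Illustrative d' computation from raw response counts (assumption: the
# simple yes/no formula d' = z(H) - z(F); a same-different task may
# instead require a differencing-model analysis).
from scipy.stats import norm

def dprime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d', with a log-linear correction (Hautus, 1995)
    so that perfect rates do not produce infinite z-scores."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical counts: 42 hits, 8 misses, 5 false alarms, 45 correct rejections
print(dprime(42, 8, 5, 45))  # approximately 2.2
```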



Figure 7: Experiment 2 mean d' sensitivity for inverted and upright stimuli. The left panel shows group mean d' averaged over all anchors, and the small panels show group mean d' separated out by triplet anchor. Error bars are 95% within-subjects confidence intervals.
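Within-subjects confidence intervals like those in the caption remove between-subject variability before the error bars are computed. The paper does not spell out its exact procedure here, so the sketch below assumes one common recipe, Cousineau's (2005) normalization with Morey's (2008) correction; the function name and array layout are hypothetical.

```python
# Within-subjects 95% CIs via Cousineau-Morey (an assumed recipe; the
# paper's exact method is not stated in this excerpt).
import numpy as np
from scipy.stats import t

def within_subject_ci(data, confidence=0.95):
    """data: subjects x conditions array of d' scores.
    Returns the per-condition CI half-widths."""
    n_subj, n_cond = data.shape
    # Center each subject on the grand mean to strip between-subject variance.
    normalized = data - data.mean(axis=1, keepdims=True) + data.mean()
    # Morey (2008) correction for the bias the normalization introduces.
    var = normalized.var(axis=0, ddof=1) * (n_cond / (n_cond - 1))
    sem = np.sqrt(var / n_subj)
    return t.ppf((1 + confidence) / 2, n_subj - 1) * sem

# Usage with hypothetical data: 12 subjects x 4 anchor conditions
rng = np.random.default_rng(0)
print(within_subject_ci(rng.normal(2.5, 0.5, size=(12, 4))))
```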

Mentions: Figure 7 summarizes the discrimination results and suggests that the pattern of discrimination across near and far stimulus pairs was invariant to orientation. A repeated measures ANOVA was carried out with within-subjects factors of stimulus distance (near, far), orientation (upright, inverted), and anchor syllable (/dɑ/, /dʒɑ/, /kɑ/, /nɑ/). Distance was a reliable main effect, F(1,11) = 399.5, η² = 0.525, p < 0.001, with d' for far pairs (mean d' = 3.67) greater than for near pairs (mean d' = 1.50). Anchor was a reliable main effect, F(2.13,23.43) = 11.21, ε̃ = 0.710, η² = 0.11, p < 0.001. However, orientation was also a reliable but very small main effect, F(1,11) = 4.85, η² = 0.015, p = 0.05, with higher d' for upright (mean d' = 2.77) than for inverted stimulus pairs (mean d' = 2.41).
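For readers wanting to reproduce this style of analysis, a minimal sketch of a three-way repeated-measures ANOVA (distance × orientation × anchor) using statsmodels follows. The file name and column names are assumptions, and statsmodels' AnovaRM reports uncorrected degrees of freedom, so a sphericity correction such as the ε̃ reported above would have to be computed separately.

```python
# Sketch of the three-way repeated-measures ANOVA described in the text.
# Data layout (assumed): one row per subject x condition cell in long
# format, with columns subject, distance ('near'/'far'),
# orientation ('upright'/'inverted'), anchor ('da'/'dja'/'ka'/'na'), dprime.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.read_csv("exp2_dprime_long.csv")  # hypothetical file

res = AnovaRM(df, depvar="dprime", subject="subject",
              within=["distance", "orientation", "anchor"]).fit()
print(res.anova_table)  # F, num/den df, and p per effect (uncorrected)
```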

