Cue integration in categorical tasks: insights from audio-visual speech perception.

Bejjanki VR, Clayards M, Knill DC, Aslin RN - PLoS ONE (2011)

Bottom Line: Our results show that human performance during audio-visual phonemic labeling is qualitatively consistent with the behavior of a Bayes-optimal observer. Furthermore, we show that in our task, the sensory variability affecting the visual modality during cue combination is not well estimated from single-cue performance, but can be estimated from multi-cue performance. The findings and computational principles described here represent a principled first step towards characterizing the mechanisms underlying human cue integration in categorical tasks.


Affiliation: Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York, United States of America. vrao@bcs.rochester.edu

ABSTRACT
Previous cue integration studies have examined continuous perceptual dimensions (e.g., size) and have shown that human cue integration is well described by a normative model in which cues are weighted in proportion to their sensory reliability, as estimated from single-cue performance. However, this normative model may not be applicable to categorical perceptual dimensions (e.g., phonemes). In tasks defined over categorical perceptual dimensions, optimal cue weights should depend not only on the sensory variance affecting the perception of each cue but also on the environmental variance inherent in each task-relevant category. Here, we present a computational and experimental investigation of cue integration in a categorical audio-visual (articulatory) speech perception task. Our results show that human performance during audio-visual phonemic labeling is qualitatively consistent with the behavior of a Bayes-optimal observer. Specifically, we show that the participants in our task are sensitive, on a trial-by-trial basis, to the sensory uncertainty associated with the auditory and visual cues during phonemic categorization. In addition, we show that while sensory uncertainty is a significant factor in determining cue weights, it is not the only one: participants' performance is consistent with an optimal model in which environmental, within-category variability also plays a role in determining cue weights. Furthermore, we show that in our task, the sensory variability affecting the visual modality during cue combination is not well estimated from single-cue performance, but can be estimated from multi-cue performance. The findings and computational principles described here represent a principled first step towards characterizing the mechanisms underlying human cue integration in categorical tasks.
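The contrast between the standard reliability-weighting model and its categorical extension can be made concrete. The sketch below is a hypothetical illustration, not the authors' code: the function names and all numeric values are assumptions. It computes the visual cue weight two ways, first from sensory variances alone (the continuous-dimension normative model), and then with the within-category (environmental) variance added to each cue's effective variance, as the abstract argues is required for categorical tasks.

```python
# Hypothetical illustration of the two weighting schemes discussed in the abstract.
# All numeric values are made-up placeholders, not estimates from the study.

def visual_weight_continuous(var_a, var_v):
    """Standard normative model: weight each cue by its sensory reliability (1/variance)."""
    rel_a, rel_v = 1.0 / var_a, 1.0 / var_v
    return rel_v / (rel_a + rel_v)

def visual_weight_categorical(var_a, var_v, cat_var_a, cat_var_v):
    """Categorical extension: each cue's effective variance is its sensory variance
    plus the environmental (within-category) variance along that cue's dimension."""
    eff_a = var_a + cat_var_a
    eff_v = var_v + cat_var_v
    return (1.0 / eff_v) / (1.0 / eff_a + 1.0 / eff_v)

# Example: equal sensory variances, but the category is assumed to be more variable
# along the auditory dimension, so the categorical model shifts weight toward vision.
print(visual_weight_continuous(1.0, 1.0))             # 0.5
print(visual_weight_categorical(1.0, 1.0, 2.0, 0.5))  # ~0.67, > 0.5
```

On this view, two cues with identical sensory noise need not receive equal weight: the cue whose dimension is less variable within the task-relevant category carries more information about category membership and should be weighted more heavily.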



pone-0019812-g009: A comparison of predicted and observed weights during audio-visual phonemic labeling. The y-axis represents the weight assigned to the visual modality (equivalently, 1 minus the weight assigned to the auditory modality). The x-axis represents the four blur levels, from no blur (Blur_0) to maximum blur (Blur_3). The blue bar, in each blur condition, represents the mean weight, across 6 participants (excluding the two outliers from Fig. 8), that should be assigned to the visual modality if participants' behavior is well described by the provisional normative model. This figure differs from Fig. 7 in that it shows the predicted weights from the provisional normative model derived by eliminating the data from the two outlier participants and by using the correct estimate of visual sensory uncertainty for each participant (see text). The red bar, in each blur condition, represents the mean weight, across the 6 participants, that was actually assigned to the visual modality during the bimodal task. The error bars represent the 95% confidence intervals for the respective means.

Mentions: Figure 9 shows a comparison between the observed weights and the predicted weights from the provisional normative model, derived by eliminating the data from the two outlier participants and by using the correct estimates of visual sensory uncertainty (i.e., the estimates of visual sensory uncertainty affecting multi-cue performance). Importantly, a two-way repeated-measures analysis of variance on the difference between the predicted weights (computed using each of the two prediction methods) and the observed weights confirmed that the predictive power of the provisional normative model was significantly improved by using the correct estimate of visual sensory uncertainty and by eliminating the outlier data [F(1,5) = 11.133, p = 0.021].
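As a minimal sketch of the kind of comparison described above, the snippet below runs a repeated-measures ANOVA on the prediction error (predicted minus observed visual weight), with prediction method and blur level as within-subject factors. The data layout, column names, file name, and the use of statsmodels are all assumptions for illustration; this is not the authors' analysis pipeline.

```python
# Hedged sketch: repeated-measures ANOVA on prediction error (predicted - observed
# visual weight). Assumes a long-format table with one row per participant x method
# x blur level; all names below are hypothetical.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Assumed columns:
#   participant : subject identifier (6 participants after excluding outliers)
#   method      : 'single_cue' or 'multi_cue' estimate of visual sensory uncertainty
#   blur        : 'Blur_0' .. 'Blur_3'
#   error       : predicted visual weight minus observed visual weight
df = pd.read_csv("prediction_errors.csv")  # hypothetical file

result = AnovaRM(df, depvar="error", subject="participant",
                 within=["method", "blur"]).fit()
print(result)  # the method main effect is analogous to the reported F(1,5) comparison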

