Active sensing in the categorization of visual patterns.

Yang SC, Lengyel M, Wolpert DM - Elife (2016)

Bottom Line: Interpreting visual scenes typically requires us to accumulate information from multiple locations in a scene. Using a novel gaze-contingent paradigm in a visual categorization task, we show that participants' scan paths follow an active sensing strategy that incorporates information already acquired about the scene and knowledge of the statistical structure of patterns. Our results suggest that participants select eye movements with the goal of maximizing information about abstract categories that require the integration of information from multiple locations.


Affiliation: Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge, United Kingdom.

ABSTRACT
Interpreting visual scenes typically requires us to accumulate information from multiple locations in a scene. Using a novel gaze-contingent paradigm in a visual categorization task, we show that participants' scan paths follow an active sensing strategy that incorporates information already acquired about the scene and knowledge of the statistical structure of patterns. Intriguingly, categorization performance was markedly improved when locations were revealed to participants by an optimal Bayesian active sensor algorithm. By using a combination of a Bayesian ideal observer and the active sensor algorithm, we estimate that a major portion of this apparent suboptimality of fixation locations arises from prior biases, perceptual noise and inaccuracies in eye movements, and the central process of selecting fixation locations is around 70% efficient in our task. Our results suggest that participants select eye movements with the goal of maximizing information about abstract categories that require the integration of information from multiple locations.



Figure 4—figure supplement 1: Trade-off between the two components making up total uncertainty underlying the maximum-entropy algorithm. Top row shows the entropy (in bits) of the predictive distributions for a grid of locations after one observation. Bottom row shows the corresponding variance decomposition (cf. Equation 1): unexplained (blue) corresponds to the "noise" entropy, explained (red) corresponds to the BAS score, and total (black) corresponds to the total uncertainty. The widths of the gray regions correspond to the three length scales with which the three image types are constructed. DOI: http://dx.doi.org/10.7554/eLife.12215.013
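
The additive split described in this caption can be written as a standard information-theoretic identity. The notation below is an assumption introduced here for illustration (the page does not reproduce Equation 1): c is the image type, D the observations so far, and z_x the pixel value that would be observed at a candidate location x.

```latex
\underbrace{H\left[z_x \mid D\right]}_{\text{total uncertainty}}
  \;=\;
\underbrace{I\left(c;\, z_x \mid D\right)}_{\text{explained (BAS score)}}
  \;+\;
\underbrace{\mathbb{E}_{P(c \mid D)}\!\left[\, H\left[z_x \mid c, D\right] \,\right]}_{\text{unexplained (``noise'' entropy)}}
```

Under this reading, the maximum-entropy variant ranks locations by the left-hand side alone, so it can favor a location whose uncertainty is mostly noise rather than information about the image type.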


Mentions: (A) The operation of BAS in a representative trial for saccades 1–8 and 14 (underlying image shown top left). For each fixation (left panels), BAS computes a score across the image (gray scale, Equation 1). This score indicates the expected informativeness of each putative fixation location given the current belief about the image type, expressed as a posterior distribution (inset, lower left). The posterior is in turn updated at each fixation by incorporating the new observation of the pixel value at the fixated location. Crosses show the fixation locations with maximal score for each saccade, green dots show past fixation locations chosen by the participant, and the yellow circle shows the current fixation location. Percentage values (bottom right) show the information percentile of the participant's fixation (the percentage of putative fixation locations with lower BAS scores than the one chosen by the participant). The histogram on the right shows the distribution of percentile values across all participants, trials and fixations.
(B) Predictions of the maximum-entropy variant (the first term in Equation 1), displayed as in (A). For saccades 1–3, the fixation locations with maximal score (crosses) are not shown because the maxima form a continuous region near the edge of the image rather than discrete points. Note that entropy can be maximal farther from (e.g., fixation 4) or nearer to (e.g., fixation 1) the edges of the image, depending on the trade-off between the two additive components defining it: the BAS score, which tends to be higher near revealing locations (panel A), and uncertainty due to the stochasticity of the stimulus and perception noise, which tends to be greater away from revealing locations. Figure 4—figure supplement 1 shows two illustrative examples of this trade-off.
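
The loop described above (score every location, observe one, update the posterior, compare the participant's choice with all alternatives) can be sketched in a few lines. This is a minimal illustration under strong simplifying assumptions, not the authors' implementation: pixel values are binary, locations are conditionally independent given the image type, and the likelihood table, the three toy image types and all function names are hypothetical.

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy (bits) of a probability vector; zero entries contribute nothing."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))


def entropy_decomposition(posterior, likelihood):
    """Return (total, noise, score) per location.

    posterior:  shape (C,), current belief over image types.
    likelihood: shape (C, L, V), P(value v | type c, location l) (assumed table).
    total = entropy of the predictive distribution (the maximum-entropy criterion),
    noise = posterior-weighted entropy given the type,
    score = total - noise, i.e. expected information gain about the type.
    """
    n_types, n_locs, _ = likelihood.shape
    predictive = np.einsum("c,clv->lv", posterior, likelihood)
    total = np.array([entropy_bits(predictive[l]) for l in range(n_locs)])
    per_type = np.array(
        [[entropy_bits(likelihood[c, l]) for l in range(n_locs)] for c in range(n_types)]
    )
    noise = posterior @ per_type
    return total, noise, total - noise


def update_posterior(posterior, likelihood, loc, value):
    """Bayes update of the belief over image types after observing `value` at `loc`."""
    unnorm = posterior * likelihood[:, loc, value]
    return unnorm / unnorm.sum()


def information_percentile(score, chosen):
    """Percentage of candidate locations scoring strictly lower than the chosen one."""
    return 100.0 * float(np.mean(score < score[chosen]))


# Toy setup (hypothetical numbers): 3 image types, 4 locations, binary pixel values.
p_on = np.array([
    [0.9, 0.5, 0.5, 0.5],   # type 0
    [0.1, 0.5, 0.5, 0.5],   # type 1
    [0.5, 0.9, 0.5, 0.5],   # type 2
])
likelihood = np.stack([1.0 - p_on, p_on], axis=-1)   # shape (3, 4, 2)
prior = np.full(3, 1.0 / 3.0)

total, noise, score = entropy_decomposition(prior, likelihood)
best = int(np.argmax(score))                          # location the sensor would pick
posterior = update_posterior(prior, likelihood, best, 1)
print("total:", total, "noise:", noise, "score:", score)
print("best location:", best, "posterior after observing 1:", posterior)
print("percentile of choosing location 1:", information_percentile(score, 1))
```

In this toy table, locations 0, 2 and 3 all have a predictive entropy of roughly 1 bit, yet only location 0 has a positive score, because at locations 2 and 3 all of the uncertainty is noise that does not depend on the image type. That is the distinction between the maximum-entropy variant and the score in panel A, shown in a deliberately reduced form.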

