A conceptual framework of computations in mid-level vision.

Kubilius J, Wagemans J, Op de Beeck HP - Front Comput Neurosci (2014)

Bottom Line: Such representations are computed in a largely pre-semantic (prior to categorization) and pre-attentive manner using multiple cues (orientation, color, polarity, variation in orientation, and so on), and explicitly retain configural relations between features. Second, we propose that such intermediate representations could be formed by a hierarchical computation of similarity between features in local image patches and pooling of highly similar units, re-estimated via recurrent loops according to task demands. Finally, we suggest using datasets composed of realistically rendered artificial objects and surfaces in order to better understand a model's behavior and its limitations.


Affiliation: Laboratory of Biological Psychology, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium; Laboratory of Experimental Psychology, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium.

ABSTRACT
If a picture is worth a thousand words, as the English idiom goes, what should those words (or, rather, descriptors) capture? What format of image representation would be sufficiently rich if we were to reconstruct the essence of images from their descriptors? In this paper, we set out to develop a conceptual framework that would be: (i) biologically plausible, in order to provide a better mechanistic understanding of our visual system; (ii) sufficiently robust to apply in practice to realistic images; and (iii) able to tap into the underlying structure of our visual world. We bring forward three key ideas. First, we argue that surface-based representations are constructed in the intermediate processing layers of the visual system based on feature inference from the input. Such representations are computed in a largely pre-semantic (prior to categorization) and pre-attentive manner using multiple cues (orientation, color, polarity, variation in orientation, and so on), and explicitly retain configural relations between features. The constructed surfaces may be partially overlapping to compensate for occlusions and are ordered in depth (figure-ground organization). Second, we propose that such intermediate representations could be formed by a hierarchical computation of similarity between features in local image patches and pooling of highly similar units, re-estimated via recurrent loops according to task demands. Finally, we suggest using datasets composed of realistically rendered artificial objects and surfaces in order to better understand a model's behavior and its limitations.
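
To make the second idea concrete, here is a minimal sketch of one similarity-and-pooling stage. It illustrates the described computation rather than the authors' implementation: the cosine-similarity measure, the pooling threshold, the random input features, and the name patch_similarity_pooling are all assumptions introduced for the example (Python/NumPy).

import numpy as np

def patch_similarity_pooling(feature_map, patch=3, sim_threshold=0.8):
    """One hypothetical stage of the hierarchy: compare each local
    feature vector with its neighbors and pool (average) only the
    highly similar ones, leaving dissimilar (configural) structure
    intact. Borders are left unpooled for brevity.

    feature_map: (H, W, C) array, e.g., orientation/color/polarity
    responses at each image location.
    """
    H, W, C = feature_map.shape
    r = patch // 2
    # Unit-normalize so the dot product below is a cosine similarity.
    unit = feature_map / (np.linalg.norm(feature_map, axis=-1, keepdims=True) + 1e-8)
    pooled = feature_map.copy()
    for y in range(r, H - r):
        for x in range(r, W - r):
            neigh = unit[y - r:y + r + 1, x - r:x + r + 1].reshape(-1, C)
            sims = neigh @ unit[y, x]                 # similarity to center unit
            mask = sims >= sim_threshold              # highly similar units only
            raw = feature_map[y - r:y + r + 1, x - r:x + r + 1].reshape(-1, C)
            pooled[y, x] = raw[mask].mean(axis=0)     # pool the similar features
    return pooled

# Stacking stages yields a hierarchy; a recurrent loop could re-run a
# stage with a task-dependent threshold (the proposed re-estimation).
feats = np.random.rand(32, 32, 8)
for _ in range(3):
    feats = patch_similarity_pooling(feats)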



Figure 1: Feature interpolation. (A) A second-order boundary stimulus, as used by von der Heydt et al. (1984). (B) A stimulus in which an illusory contour is perceived in the gap between the two parts of the white rectangle, as used by von der Heydt et al. (1984); the arrow indicates that the white rectangle was moving. (C) A stimulus in which a shape is defined entirely by second-order cues (that is, a difference in orientation), as used in many figure-ground segmentation studies (e.g., Lamme, 1995).

Mentions: A number of studies have shown that mid-level vision is heavily involved in feature inference. Consider, for example, the seminal series of studies by von der Heydt et al. (1984) and von der Heydt and Peterhans (1989), who compared neural responses to typical luminance-defined stimuli with responses to the same stimuli defined by cues other than luminance. In one of their conditions, the stimulus was composed of two regions of line segments, with one region shifted with respect to the other, forming an offset-defined discontinuity in the texture, which we refer to as a second-order edge (Figure 1A). Importantly, a simple edge-detecting V1 model would not be able to find such edges, so if some neurons in the visual cortex responded to such stimuli, it would mean that a higher-order computation is at work, one that is somehow capable of integrating information across the two regions of the image.
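
The failure of a purely linear edge detector on such stimuli, and the kind of higher-order computation that succeeds, can be sketched in a few lines. The second stage below follows the filter-rectify-filter scheme familiar from the texture-segregation literature, not the specific model of von der Heydt and colleagues; the synthetic stimulus, filter scales, and variable names are assumptions made for illustration (Python/NumPy/SciPy).

import numpy as np
from scipy.ndimage import gaussian_filter, sobel

# Synthetic offset-defined texture edge, loosely after Figure 1A:
# vertical stripes everywhere, but the lower half is shifted by half
# a period, so mean luminance is identical across the boundary.
x = np.arange(128)
stripes = (np.sin(2 * np.pi * x / 8) > 0).astype(float)
img = np.tile(stripes, (128, 1))
img[64:] = np.roll(img[64:], 4, axis=1)   # phase-shift the lower half

# First-order detection: a coarse-scale luminance edge detector.
# Blurring over several stripe periods turns both halves into the
# same uniform gray, so the linear filter sees no edge at row 64.
first_order = np.abs(sobel(gaussian_filter(img, sigma=8), axis=0))

# Filter-rectify-filter: a fine-scale filter responds to the local
# offsets, rectification (squaring) makes those responses sum rather
# than cancel, and coarse pooling exposes the second-order contour.
stage1 = sobel(img, axis=0)                # fine filter (vertical change)
energy = gaussian_filter(stage1 ** 2, 4)   # rectify + pool locally

print("first-order response near border:", first_order[60:68].max())
print("pooled energy at border (row 64):", energy[64].mean())
print("pooled energy inside a region (row 32):", energy[32].mean())

Running this prints a near-zero first-order response at the texture border but a clearly elevated energy there relative to the region interiors, mirroring the argument that integrating information across the two regions requires a computation beyond a single linear filtering stage.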

