Extraction of surface-related features in a recurrent model of V1-V2 interactions.

Weidenbacher U, Neumann H - PLoS ONE (2009)

Bottom Line: The approach is based on feedforward and feedback mechanisms found in visual cortical areas V1 and V2. Unlike previous proposals which treat localized junction configurations as 2D image features, we link them to mechanisms of apparent surface segregation. As a consequence, we demonstrate how junctions can change their perceptual representation depending on the scene context and the spatial configuration of boundary fragments.


Affiliation: Institute of Neural Information Processing, University of Ulm, Ulm, Germany. ulrich.weidenbacher@uni-ulm.de

ABSTRACT

Background: Humans can effortlessly segment surfaces and objects from two-dimensional (2D) images that are projections of the 3D world. The projection from 3D to 2D leads to partial occlusions of surfaces, depending on their position in depth and on the viewpoint. One way for the human visual system to infer monocular depth cues could be to extract and interpret occlusions. It has been suggested that the perception of contour junctions, in particular T-junctions, may be used as a cue for occlusion of opaque surfaces. Furthermore, X-junctions could be used to signal occlusion of transparent surfaces.

Methodology/principal findings: In this contribution, we propose a neural model that suggests how surface-related cues for occlusion can be extracted from a 2D luminance image. The approach is based on feedforward and feedback mechanisms found in visual cortical areas V1 and V2. In a first step, contours are completed over time by generating groupings of like-oriented contrasts. A few iterations of feedforward and feedback processing lead to a stable representation of completed contours and, at the same time, to a suppression of image noise. In a second step, contour junctions are localized and read out from the distributed representation of boundary groupings. Moreover, surface-related junctions are made explicit so that their interactions can be evaluated to generate surface segmentations in static images. In addition, we compare our extracted junction signals with a standard computer vision approach for junction detection to demonstrate that our approach outperforms simple feedforward computation-based approaches.

Conclusions/significance: A model is proposed that uses feedforward and feedback mechanisms to combine contextually relevant features in order to generate consistent boundary groupings of surfaces. Perceptually important junction configurations are robustly extracted from neural representations to signal cues for occlusion and transparency. Unlike previous proposals which treat localized junction configurations as 2D image features, we link them to mechanisms of apparent surface segregation. As a consequence, we demonstrate how junctions can change their perceptual representation depending on the scene context and the spatial configuration of boundary fragments.


Figure 4 (pone-0005909-g004): Response properties of different model cell populations for different structural configurations together with their most likely interpretation (cue type). Numbers denote the modality of the response distribution across cell pools located at the position marked with a red dot for each structure. A bar means that the cell population is not responsive to this structure. Note that each structure has a specific neural response profile across the different model cell populations, which can be used to extract separate saliency maps. For better understanding, the configuration of filters is sketched together with the underlying structure. Recall that V2 bipole sub-fields are connected multiplicatively (signalled by a “•”), leading to zero activity of the whole bipole cell if input from one sub-field is missing (symbolized by red crosses). V1 bipole sub-fields, on the other hand, are connected additively (signalled by a “○”), so that input from one sub-field is sufficient to create activity.
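The difference between the additive V1 and the multiplicative V2 sub-field connection described in the caption can be illustrated with a toy calculation. The sketch below is not the model's implementation: the function names `v1_bipole` and `v2_bipole` and the scalar sub-field inputs are assumptions made for illustration, and the spatial filters, nonlinearities and feedback of the actual model are omitted.

```python
def v1_bipole(left: float, right: float) -> float:
    """Additive combination (hypothetical helper): one active sub-field suffices."""
    return left + right


def v2_bipole(left: float, right: float) -> float:
    """Multiplicative combination (hypothetical helper): both sub-fields must be active."""
    return left * right


# Illustrative sub-field inputs, not values taken from the paper.
# Contour ending (e.g. the stem of a T): only one sub-field receives input.
stem = (0.8, 0.0)
# Continuous contour (e.g. the roof of a T): both sub-fields receive input.
roof = (0.8, 0.7)

for name, (left, right) in {"stem": stem, "roof": roof}.items():
    print(name, "V1:", v1_bipole(left, right), "V2:", v2_bipole(left, right))
```

With these inputs the V2 cell is silent at the stem, where one sub-field has no input, while the V1 cell stays active. This matches the qualitative pattern the caption attributes to the two connection types.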

Mentions: From the distributed representation of cell responses in model areas V1 and V2, several retinotopic maps can be extracted that signal perceptually relevant contour configurations. Unless stated otherwise, these maps are extracted by computing, at each position, the mean activity over all orientation responses. An alternative method for reading out salience values was suggested by Li [36], who chose to extract the maximum activity over all orientations at each position. In the following, we describe in detail how saliency maps for specific image structures, namely corners and junctions, can be extracted by combining activities from different model cell pools. In this paper, we define saliency maps as 2D maps that encode, at each position, the likelihood that a specific structure is present. A broader discussion of the concept of salience and salience maps can be found in [37]. Figure 4 sketches the structural configurations to give an overview of the output signaled by the different orientation-sensitive mechanisms of the proposed model. This summary indicates how the different visual structures of surface shape outlines and their ordinal depth structure might be selectively encoded neurally through the concert of responses generated by different (model) cell types. The conclusions are two-fold. First, the presence of, e.g., a T-junction (which most often coincides with an opaque surface occlusion [1]) is uniquely indicated by the response pattern of V1 and V2 cells at one spatial location. The T-junction is represented by an end-stop cell response at the end of the T-stem, V1 bipole cell responses in the orientations of both the T-stem (signaled by one active sub-field) and the roof, and finally a V2 bipole cell response in the orientation of the roof of the T (representing the occluding boundary). Second, we argue that no explicit detectors are needed to represent these local 2D structures: Figure 4 indicates that an explicit representation of the different junction types would necessitate a rich catalogue of cells with rather specific wiring patterns. Below we propose specific read-out mechanisms in order to visualize the information we suggest is important for surface-related analysis of the input structure.
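The two read-out rules mentioned above, the mean over orientations and the maximum over orientations attributed to Li [36], can be compared on a small example. This is a minimal sketch under stated assumptions: the function `read_out`, the nested-list layout `responses[k][y][x]` and the numbers are hypothetical and only illustrate the per-position reduction, not the model's actual data structures.

```python
from statistics import mean


def read_out(responses, mode="mean"):
    """Collapse an orientation-indexed response stack into one 2D map.

    responses[k][y][x] is the activity of orientation channel k at (x, y).
    mode="mean" averages over orientations; mode="max" takes the maximum.
    """
    reduce = mean if mode == "mean" else max
    height, width = len(responses[0]), len(responses[0][0])
    return [
        [reduce(channel[y][x] for channel in responses) for x in range(width)]
        for y in range(height)
    ]


# Two orientation channels on a 1 x 3 map (illustrative values only).
responses = [
    [[0.0, 0.75, 0.0]],  # channel 0, e.g. horizontal
    [[0.0, 0.25, 0.5]],  # channel 1, e.g. vertical
]

print(read_out(responses, "mean"))  # [[0.0, 0.5, 0.25]]
print(read_out(responses, "max"))   # [[0.0, 0.75, 0.5]]
```

In this toy case the mean is lower wherever only one orientation channel responds, whereas the maximum reports the strongest single orientation regardless of how many channels are active.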

