Where's Waldo? How perceptual, cognitive, and emotional brain processes cooperate during learning to categorize and find desired objects in a cluttered scene.

Chang HC, Grossberg S, Cao Y - Front Integr Neurosci (2014)

Bottom Line: What stream cognitive-emotional learning processes enable the focusing of motivated attention upon the invariant object categories of desired objects. A volitional signal can convert these primes into top-down activations that can, in turn, prime What stream view- and positionally-specific categories. These processes describe interactions among brain regions that include visual cortex, parietal cortex, inferotemporal cortex, prefrontal cortex (PFC), amygdala, basal ganglia (BG), and superior colliculus (SC).


Affiliation: Graduate Program in Cognitive and Neural Systems, Department of Mathematics, Center for Adaptive Systems, Center for Computational Neuroscience and Neural Technology, Boston University, Boston, MA, USA.

ABSTRACT
The Where's Waldo problem concerns how individuals can rapidly learn to search a scene to detect, attend, recognize, and look at a valued target object in it. This article develops the ARTSCAN Search neural model to clarify how brain mechanisms across the What and Where cortical streams are coordinated to solve the Where's Waldo problem. The What stream learns positionally-invariant object representations, whereas the Where stream controls positionally-selective spatial and action representations. The model overcomes deficiencies of these computationally complementary properties through What and Where stream interactions. Where stream processes of spatial attention and predictive eye movement control modulate What stream processes whereby multiple view- and positionally-specific object categories are learned and associatively linked to view- and positionally-invariant object categories through bottom-up and attentive top-down interactions. Gain fields control the coordinate transformations that enable spatial attention and predictive eye movements to carry out this role. What stream cognitive-emotional learning processes enable the focusing of motivated attention upon the invariant object categories of desired objects. What stream cognitive names or motivational drives can prime a view- and positionally-invariant object category of a desired target object. A volitional signal can convert these primes into top-down activations that can, in turn, prime What stream view- and positionally-specific categories. When such a positionally-specific category also receives bottom-up activation from a target, it can cause an attentional shift in the Where stream to the positional representation of the target, and an eye movement can then be elicited to foveate it. These processes describe interactions among brain regions that include visual cortex, parietal cortex, inferotemporal cortex, prefrontal cortex (PFC), amygdala, basal ganglia (BG), and superior colliculus (SC).
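The search sequence just described (a name or drive primes the invariant category; a volitional signal converts the prime into top-down activation; a specific category that also receives bottom-up input triggers an attentional shift and a saccade) can be summarized as a control flow. The Python sketch below captures only that flow, not the model's actual shunting-network dynamics; every class, method, and value in it (SpecificCategory, WhatStream.volitional_readout, the region numbers) is illustrative rather than taken from the paper.

```python
# Minimal schematic of the top-down Where's Waldo search sequence from the
# abstract. NOT the published model's dynamics; all names are illustrative.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SpecificCategory:
    """A view- and positionally-specific category (What stream)."""
    view: str
    position: int  # scene region (1-25) that this category codes


@dataclass
class WhatStream:
    # invariant category name -> its learned specific categories
    specific: dict[str, list[SpecificCategory]] = field(default_factory=dict)
    primed: set[str] = field(default_factory=set)

    def prime(self, invariant: str) -> None:
        # A cognitive name or motivational drive primes the invariant
        # object category of the desired target (subthreshold).
        self.primed.add(invariant)

    def volitional_readout(self, invariant: str, bottom_up: set[int]):
        # A volitional signal converts the prime into top-down activation of
        # the specific categories; a specific category wins only where that
        # top-down priming converges with bottom-up input from a target.
        if invariant not in self.primed:
            return None
        for cat in self.specific.get(invariant, []):
            if cat.position in bottom_up:
                return cat.position
        return None


class WhereStream:
    def shift_attention_and_saccade(self, position: int) -> None:
        # Attention shifts to the target's positional representation,
        # and an eye movement is elicited to foveate it.
        print(f"attend and foveate region {position}")


# Usage: prime "cellphone", then search a scene with objects in regions 8, 14.
what, where = WhatStream(), WhereStream()
what.specific["cellphone"] = [SpecificCategory("view1", 14)]
what.prime("cellphone")
pos = what.volitional_readout("cellphone", bottom_up={8, 14})
if pos is not None:
    where.shift_attention_and_saccade(pos)  # -> attend and foveate region 14
```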



Figure 8: Set of object stimuli for view- and positionally-invariant category learning. (A) Each object, taken from the Caltech 101 dataset, is shown at its relative size within a 100 × 100 pixel frame. (B) A simulated scene for the view-invariant object category learning simulations in section 8.1. The scenic input image is partitioned into 25 regions (solid lines), and objects are located in the central regions of the scene (regions 7, 8, 9, 12, 13, 14, 17, 18, and 19). Region 13 is the foveal region and the others are peripheral regions. (C) The bottom-up input representation after the cellphone becomes the attended object and is foveated. (D) The bottom-up input representation when the motorcycle becomes foveated after the soccer ball and cellphone have been learned. (E,F) A sequence of simulated scenes for the positionally- and view-invariant object category learning simulations in section 8.2. Each scenic input contains only one object, located in one of the central regions.
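To make the caption's layout concrete: assuming the 25 regions are numbered 1-25 in row-major order across the 500 × 500 scene, which is what the listed central regions 7-9, 12-14, and 17-19 imply, a short helper (hypothetical, not from the paper) recovers each region's pixel bounds and confirms that the foveal region sits at the scene's center:

```python
# Region layout implied by the Figure 8 caption, assuming regions are
# numbered 1-25 in row-major order over a 500 x 500 scene.
CENTRAL = {7, 8, 9, 12, 13, 14, 17, 18, 19}  # inner 3 x 3 block of the grid
FOVEAL = 13                                   # geometric center of the grid

def region_bounds(region: int, size: int = 100) -> tuple[int, int, int, int]:
    """Return (row0, col0, row1, col1) pixel bounds of a region (1-25)."""
    r, c = divmod(region - 1, 5)
    return (r * size, c * size, (r + 1) * size, (c + 1) * size)

assert region_bounds(FOVEAL) == (200, 200, 300, 300)  # fovea at scene center
```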

Mentions: The simulations of the ARTSCAN Search model demonstrate multiple sites of coordinated category, reinforcement, and cognitive learning, and the use of the learned connections to carry out both bottom-up and top-down Where’s Waldo searches. ARTSCAN Search simulations process 24 objects taken from natural images of the Caltech 101 database, with each object selected from a different category as a Where’s Waldo exemplar. Each object is resized to 100 × 100 pixels (Figure 8A) and placed against a homogeneous gray background with a luminance value of 0.5. The objects are grayscale, with luminance values between 0 and 1. For simplicity, input scenes are presented and simulated in Cartesian coordinates. A simulated scene is represented by 500 × 500 pixels and is divided into 25 regions of 100 × 100 pixels, with each region denoting one position capable of representing one object.
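As a worked example of this setup, the sketch below assembles such a scene: a 500 × 500 grayscale array on a uniform 0.5 background, tiled into 25 regions of 100 × 100 pixels, with one 100 × 100 object patch per occupied region. It is an illustrative reconstruction under those stated dimensions, not the authors' simulation code; make_scene and the random placeholder patches are assumptions standing in for the Caltech 101 crops.

```python
# Sketch of the simulated-scene setup described above: 500 x 500 grayscale
# scene, uniform 0.5 background, 25 regions of 100 x 100 pixels, one object
# patch (luminance in [0, 1]) per occupied region.
import numpy as np

SCENE, PATCH, BG = 500, 100, 0.5

def make_scene(objects: dict[int, np.ndarray]) -> np.ndarray:
    """objects maps a region number (1-25, row-major) to a 100 x 100 array."""
    scene = np.full((SCENE, SCENE), BG)
    for region, patch in objects.items():
        assert patch.shape == (PATCH, PATCH)
        assert 0.0 <= patch.min() <= patch.max() <= 1.0  # luminance range
        r, c = divmod(region - 1, 5)  # row-major region -> grid cell
        scene[r*PATCH:(r+1)*PATCH, c*PATCH:(c+1)*PATCH] = patch
    return scene

# Usage: place two random placeholder "objects" in central regions 8 and 14.
rng = np.random.default_rng(0)
scene = make_scene({8: rng.random((100, 100)), 14: rng.random((100, 100))})
print(scene.shape, scene.mean().round(3))  # (500, 500), near the 0.5 background
```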

