Self-Organization of Spatio-Temporal Hierarchy via Learning of Dynamic Visual Image Patterns on Action Sequences.

Jung M, Hwang J, Tani J - PLoS ONE (2015)

Bottom Line: Most biological and computational modeling studies have focused mainly on the spatial domain and have not addressed temporal-domain processing in the visual cortex. The model introduced here applies both spatial and temporal constraints on local neural activities, resulting in the self-organization of a spatio-temporal hierarchy necessary for the recognition of complex dynamic visual image patterns. An evaluation on the recognition of concatenated sequences of prototypical movement patterns indicated that the model has a remarkable capability for the contextual recognition of long-range dynamic visual image patterns.


Affiliation: Department of Electrical Engineering, KAIST, Daejeon, Republic of Korea.

ABSTRACT
It is well known that the visual cortex efficiently processes high-dimensional spatial information by using a hierarchical structure. Recently, computational models inspired by the spatial hierarchy of the visual cortex have shown remarkable performance in image recognition. Up to now, however, most biological and computational modeling studies have focused mainly on the spatial domain and have not addressed temporal-domain processing in the visual cortex. Several studies on the visual cortex and on other brain areas associated with motor control support the idea that the brain also uses its hierarchical structure as a processing mechanism for temporal information. Building on the success of previous computational models that use spatial hierarchy, and on the temporal hierarchy observed in the brain, the current report introduces a novel neural network model for the recognition of dynamic visual image patterns based solely on the learning of exemplars. The model is characterized by the application of both spatial and temporal constraints on local neural activities, resulting in the self-organization of a spatio-temporal hierarchy necessary for the recognition of complex dynamic visual image patterns. An evaluation on the Weizmann dataset, recognizing a set of prototypical human movement patterns, showed that the proposed model is significantly more robust than baseline models in recognizing dynamically occluded visual patterns. Furthermore, an evaluation test for the recognition of concatenated sequences of those prototypical movement patterns indicated that the model is endowed with a remarkable capability for the contextual recognition of long-range dynamic visual image patterns.


pone.0131214.g001 (Fig 1): Architecture of the MSTNN. The architecture consists of one input layer, three convolutional layers, two max-pooling layers, and one fully-connected output layer. Convolutional layers apply convolution operations with kernels to the previous layer (black solid lines). Max-pooling layers select maximum values within local windows of the previous convolutional layer (black dotted lines). Each layer has a set of parameters: layer dimensions (feature-map column size × feature-map row size × number of feature maps), kernel size, and max-pooling size. Only the convolutional layers have an additional time-constant parameter τ (red solid arrow), which plays a key role in this model; a higher convolutional layer has a larger time constant than a lower one. Layer 6 uses the softmax activation function for classification (N is the number of classes).
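
The time constant τ controls how quickly a unit's internal state tracks its bottom-up input. As a minimal sketch, assuming the standard discrete-time leaky-integrator (CTRNN-style) update, which the paper's exact formulation may refine, each convolutional unit i would evolve as

    u_i(t) = (1 − 1/τ) u_i(t−1) + (1/τ) Σ_j w_ij x_j(t),        y_i(t) = f(u_i(t)),

where the x_j(t) are the inputs within the unit's receptive field, the w_ij are the kernel weights, and f is a saturating activation. A small τ (2.0 in layer 1) lets units follow fast local changes; a large τ (100.0 in layer 5) forces units to integrate slowly over long stretches of the sequence, producing the fast-to-slow temporal hierarchy across layers.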

Mentions: The network consists of 7 layers: one input layer (layer 0), three convolutional layers (layers 1, 3, and 5), two max-pooling layers (layers 2 and 4), and one fully-connected layer (layer 6) (see Fig 1). Layer 0 is the input layer, which has a single feature map of size 48×54 containing the raw input image. Layer 1 is a convolutional layer with 6 feature maps of size 40×40; it convolves layer 0 with a kernel of size 9×15 (valid convolution: 48−9+1 = 40 and 54−15+1 = 40). These feature maps are encoded via the dynamic activities of 40×40×6 leaky-integrator neural units whose time constant τ is set to 2.0. Layer 2 is a max-pooling layer with 6 feature maps of size 20×20; it has the same number of feature maps as the preceding convolutional layer because the feature maps of the two layers are coupled one-to-one. Each feature map in layer 2 takes the maximum value within 2×2 local patches of the corresponding feature map in layer 1. Layer 3 is a convolutional layer with 50 feature maps of size 14×14 and a time constant τ of 5.0; it convolves layer 2 with a 7×7 kernel. Layer 4 is a max-pooling layer with 50 feature maps of size 7×7; each of its feature maps takes the maximum value within 2×2 local patches of the corresponding feature map in layer 3. Layer 5 is a convolutional layer with 100 feature maps of size 1×1 and a time constant τ of 100.0; it convolves layer 4 with a 7×7 kernel. Layer 6 is a fully-connected layer that generates the categorical outputs, encoded by a set of static neural units with the softmax activation function. The number of neurons in layer 6 equals the number of classes in the dataset, and each neuron is fully connected to all the neurons of the 100 feature maps in layer 5.
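
To make the layer-by-layer description above concrete, here is a minimal runnable sketch of the same stack in PyTorch. This is a reconstruction from the text, not the authors' code: the class names (LeakyConv2d, MSTNNSketch), the tanh activation, the zero initial state, and the exact leaky-update form are assumptions; only the layer counts, kernel sizes, pooling sizes, and time constants come from the description above. The pairing of the 48-pixel input dimension with the 9-pixel kernel dimension (and 54 with 15) is also assumed, since it is the pairing that yields the stated 40×40 maps.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LeakyConv2d(nn.Module):
    """Convolutional layer of leaky-integrator units with time constant tau.

    Assumed state update (CTRNN-style; the paper's exact form may differ):
        u(t) = (1 - 1/tau) * u(t-1) + (1/tau) * conv(x(t))
        y(t) = tanh(u(t))
    """
    def __init__(self, in_ch, out_ch, kernel_size, tau):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size)
        self.tau = tau
        self.u = None  # internal state; (re)initialized to zero per sequence

    def reset(self):
        self.u = None

    def forward(self, x):
        z = self.conv(x)
        if self.u is None:
            self.u = torch.zeros_like(z)
        self.u = (1.0 - 1.0 / self.tau) * self.u + (1.0 / self.tau) * z
        return torch.tanh(self.u)

class MSTNNSketch(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # Layer 1: 1 -> 6 maps, 9x15 kernel, tau = 2.0
        # (valid convolution: 48-9+1 = 40, 54-15+1 = 40 -> 40x40 maps)
        self.conv1 = LeakyConv2d(1, 6, (9, 15), tau=2.0)
        # Layer 3: 6 -> 50 maps, 7x7 kernel, tau = 5.0 (20x20 -> 14x14)
        self.conv2 = LeakyConv2d(6, 50, 7, tau=5.0)
        # Layer 5: 50 -> 100 maps, 7x7 kernel, tau = 100.0 (7x7 -> 1x1)
        self.conv3 = LeakyConv2d(50, 100, 7, tau=100.0)
        # Layer 6: fully connected softmax over the N classes
        self.fc = nn.Linear(100, num_classes)

    def reset(self):
        for layer in (self.conv1, self.conv2, self.conv3):
            layer.reset()

    def forward(self, frame):
        x = self.conv1(frame)       # layer 1: 48x54 -> 40x40
        x = F.max_pool2d(x, 2)      # layer 2: 40x40 -> 20x20
        x = self.conv2(x)           # layer 3: 20x20 -> 14x14
        x = F.max_pool2d(x, 2)      # layer 4: 14x14 -> 7x7
        x = self.conv3(x)           # layer 5: 7x7 -> 1x1
        x = x.flatten(1)            # (batch, 100)
        return F.softmax(self.fc(x), dim=1)  # layer 6: class probabilities

# Usage: feed the frames of one video in order; the leaky states carry
# temporal context across frames. Random data here, just to check shapes.
model = MSTNNSketch(num_classes=10)
model.reset()
with torch.no_grad():
    for t in range(16):
        frame = torch.randn(1, 1, 48, 54)  # one 48x54 single-channel frame
        probs = model(frame)
print(probs.shape)  # torch.Size([1, 10])

Feeding frames one at a time is the point of the design: the internal states u persist across frames, and the τ = 100.0 state of layer 5 changes far more slowly than the τ = 2.0 state of layer 1, which is what lets the highest layer accumulate the long-range context described in the abstract.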

