Deep supervised, but not unsupervised, models may explain IT cortical representation.

Khaligh-Razavi SM, Kriegeskorte N - PLoS Comput. Biol. (2014)

Bottom Line: Computational object-vision models, although continually improving, do not yet reach human performance. The deep convolutional network, which was trained by supervision with over a million category-labeled images, reached the highest categorization performance and also best explained IT, although it did not fully explain the IT data. Overall, our results suggest that explaining IT requires computational features trained through supervised learning to emphasize the behaviorally important categorical divisions prominently reflected in IT.

Affiliation: Medical Research Council, Cognition and Brain Sciences Unit, Cambridge, United Kingdom.

ABSTRACT
Inferior temporal (IT) cortex in human and nonhuman primates serves visual object recognition. Computational object-vision models, although continually improving, do not yet reach human performance. It is unclear to what extent the internal representations of computational models can explain the IT representation. Here we investigate a wide range of computational model representations (37 in total), testing their categorization performance and their ability to account for the IT representational geometry. The models include well-known neuroscientific object-recognition models (e.g. HMAX, VisNet) along with several models from computer vision (e.g. SIFT, GIST, self-similarity features, and a deep convolutional neural network). We compared the representational dissimilarity matrices (RDMs) of the model representations with the RDMs obtained from human IT (measured with fMRI) and monkey IT (measured with cell recording) for the same set of stimuli (not used in training the models). Better performing models were more similar to IT in that they showed greater clustering of representational patterns by category. In addition, better performing models also more strongly resembled IT in terms of their within-category representational dissimilarities. Representational geometries were significantly correlated between IT and many of the models. However, the categorical clustering observed in IT was largely unexplained by the unsupervised models. The deep convolutional network, which was trained by supervision with over a million category-labeled images, reached the highest categorization performance and also best explained IT, although it did not fully explain the IT data. Combining the features of this model with appropriate weights and adding linear combinations that maximize the margin between animate and inanimate objects and between faces and other objects yielded a representation that fully explained our IT data. 
Overall, our results suggest that explaining IT requires computational features trained through supervised learning to emphasize the behaviorally important categorical divisions prominently reflected in IT.
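The RDM comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. Dissimilarity between two stimuli is taken to be correlation distance (1 minus the Pearson correlation of their response patterns), and two RDMs are compared with Spearman's rank correlation over their upper-triangle entries. Both are common choices in representational similarity analysis, but this excerpt does not state which measures the study used, so treat them as assumptions.

```python
import numpy as np


def rdm_correlation_distance(patterns: np.ndarray) -> np.ndarray:
    """RDM whose entry (i, j) is 1 - Pearson r between the response
    patterns of stimuli i and j (rows of `patterns`).
    Assumption: correlation distance as the dissimilarity measure."""
    return 1.0 - np.corrcoef(patterns)


def upper_triangle(rdm: np.ndarray) -> np.ndarray:
    """Off-diagonal upper-triangle entries, i.e. each stimulus pair once."""
    i, j = np.triu_indices(rdm.shape[0], k=1)
    return rdm[i, j]


def average_ranks(x: np.ndarray) -> np.ndarray:
    """Ranks 1..n, with tied values sharing their average rank."""
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Replace each rank by the mean rank of its group of tied values.
    _, inverse = np.unique(x, return_inverse=True)
    inverse = inverse.ravel()
    sums = np.bincount(inverse, weights=ranks)
    counts = np.bincount(inverse)
    return (sums / counts)[inverse]


def compare_rdms(rdm_a: np.ndarray, rdm_b: np.ndarray) -> float:
    """Spearman rank correlation between the upper triangles of two RDMs.
    Assumption: Spearman's rho as the RDM-comparison statistic."""
    ra = average_ranks(upper_triangle(rdm_a))
    rb = average_ranks(upper_triangle(rdm_b))
    return float(np.corrcoef(ra, rb)[0, 1])


if __name__ == "__main__":
    # Illustrative random data only: 12 stimuli x 50 model units, and a
    # "brain" representation that is a noisy linear mix of the model units.
    rng = np.random.default_rng(0)
    model = rng.normal(size=(12, 50))
    brain = model @ rng.normal(size=(50, 30)) + 0.5 * rng.normal(size=(12, 30))
    print(compare_rdms(rdm_correlation_distance(model),
                       rdm_correlation_distance(brain)))
```

Ranking the dissimilarities makes the comparison insensitive to any monotonic rescaling of either RDM, which is convenient when fMRI, cell-recording, and model dissimilarities live on different scales.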


Figure 3. IT-like categorical structure is not apparent in any of the not-strongly-supervised models. Brain and model RDMs are shown in the left columns of each panel. We used a linear combination of category-cluster RDMs (Figure S5) to model the categorical structure (least-squares fit). The categories modeled were animate, inanimate, face, human face, non-human face, body, human body, non-human body, natural inanimates, and artificial inanimates. The fitted linear combination of category-cluster RDMs is shown in the middle columns. This descriptive visualization shows to what extent different categorical divisions are prominent in each RDM. The residual RDMs of the fits are shown in the right column. For statistical inference, see Figure 4.
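The least-squares fit in this caption can be sketched as follows. The actual category-cluster RDMs are defined in Figure S5, which is not reproduced here, so this sketch assumes each one is binary (0 for pairs of stimuli that both belong to the category, 1 for every other pair) and adds an intercept term. `category_cluster_rdm`, `fit_category_structure`, and the membership vectors are hypothetical names for illustration.

```python
import numpy as np


def category_cluster_rdm(members: np.ndarray) -> np.ndarray:
    """Hypothetical binary model RDM for one category: 0 for pairs of
    stimuli that both belong to the category, 1 for every other pair."""
    m = np.asarray(members, dtype=bool)
    rdm = 1.0 - np.outer(m, m).astype(float)
    np.fill_diagonal(rdm, 0.0)
    return rdm


def fit_category_structure(rdm, memberships):
    """Least-squares fit of `rdm` by an intercept plus a weighted sum of
    category-cluster RDMs. Returns (weights, fitted RDM, residual RDM);
    weights[0] is the intercept, then one weight per category."""
    n = rdm.shape[0]
    i, j = np.triu_indices(n, k=1)  # use each stimulus pair once
    predictors = [np.ones(len(i))]
    predictors += [category_cluster_rdm(m)[i, j] for m in memberships]
    X = np.column_stack(predictors)
    weights, *_ = np.linalg.lstsq(X, rdm[i, j], rcond=None)
    fitted = np.zeros((n, n))
    fitted[i, j] = X @ weights
    fitted = fitted + fitted.T  # mirror into a symmetric RDM, zero diagonal
    return weights, fitted, rdm - fitted


if __name__ == "__main__":
    # Hypothetical stimulus set: 8 images, the first 4 animate, the first 2 faces.
    idx = np.arange(8)
    animate, face = idx < 4, idx < 2
    rng = np.random.default_rng(0)
    noise = rng.random((8, 8)) * 0.1
    rdm = category_cluster_rdm(animate) + (noise + noise.T)
    np.fill_diagonal(rdm, 0.0)
    weights, fitted, residual = fit_category_structure(rdm, [animate, face])
    print(weights)
```

A large weight on a category's predictor indicates that stimuli in that category cluster together in the RDM; whatever the categorical predictors cannot capture ends up in the residual RDM, matching the right-hand column of the figure.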

Mentions: Descriptive category-clustering analysis as in Figure 3, but for the deep supervised network. We used a linear combination of category-cluster RDMs (Figure S5) to model the categorical structure. The fitted linear combination of category-cluster RDMs is shown in the middle columns. This descriptive visualization shows to what extent different categorical divisions are prominent in each layer of the deep supervised model. Some of the categorical divisions begin to emerge across the layers. However, remixing of the features (linear SVM readout) is required to emphasize the categorical divisions to a degree similar to IT. The final IT-geometry-supervised layer (a weighted combination of layers and SVM discriminants) has a categorical structure very similar to that of IT. Overfitting to the image set was avoided by cross-validation. For statistical inference, see Figure 9.
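The cross-validated weighted combination mentioned above can be sketched as below. It assumes the component RDMs (for example, one per layer or per SVM discriminant) have already been computed; here they are arbitrary placeholder arrays. Ordinary least squares stands in for whatever weight-fitting procedure the study used, and `pair_vectors` and `crossvalidated_combination` are hypothetical helpers. Folds are drawn over stimuli rather than stimulus pairs, so every held-out dissimilarity involves only images that played no part in fitting the weights.

```python
import numpy as np


def pair_vectors(rdms, target, idx):
    """Upper-triangle entries among the stimuli in `idx`: one column per
    component RDM, plus the corresponding target-RDM entries."""
    sub = np.ix_(idx, idx)
    i, j = np.triu_indices(len(idx), k=1)
    X = np.column_stack([r[sub][i, j] for r in rdms])
    return X, target[sub][i, j]


def crossvalidated_combination(rdms, target, n_folds=2, seed=0):
    """Fit one weight per component RDM on training stimuli, predict the
    dissimilarities among held-out stimuli, and return the test-set
    Pearson r between predicted and target dissimilarities per fold."""
    n = target.shape[0]
    perm = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(perm, n_folds)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        X_train, y_train = pair_vectors(rdms, target, train_idx)
        # Assumption: unconstrained least squares, no intercept.
        weights, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
        X_test, y_test = pair_vectors(rdms, target, test_idx)
        scores.append(float(np.corrcoef(X_test @ weights, y_test)[0, 1]))
    return scores


if __name__ == "__main__":
    # Placeholder component RDMs (symmetric, zero diagonal) and a noisy target.
    rng = np.random.default_rng(1)
    n = 24
    components = []
    for _ in range(4):
        a = rng.random((n, n))
        r = (a + a.T) / 2
        np.fill_diagonal(r, 0.0)
        components.append(r)
    noise = rng.random((n, n)) * 0.2
    target = 0.6 * components[0] + 0.3 * components[2] + (noise + noise.T)
    np.fill_diagonal(target, 0.0)
    print(crossvalidated_combination(components, target, n_folds=3))
```

Because the weights are fitted only on dissimilarities among training images, a high held-out correlation reflects structure that generalizes to new stimuli rather than overfitting to the image set.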