Basic level scene understanding: categories, attributes and structures.

Xiao J, Hays J, Russell BC, Patterson G, Ehinger KA, Torralba A, Oliva A - Front Psychol (2013)

Bottom Line: This database allows us to systematically study the space of scenes and to establish a benchmark for scene and object recognition. We augment the categorical SUN database with 102 scene attributes for every image and explore attribute recognition. Finally, we present an integrated system to extract the 3D structure of the scene and objects depicted in an image.

View Article: PubMed Central - PubMed

Affiliation: Computer Science, Princeton University, Princeton, NJ, USA.

ABSTRACT
A longstanding goal of computer vision is to build a system that can automatically understand a 3D scene from a single image. This requires extracting semantic concepts and 3D information from 2D images, which can depict an enormous variety of environments that comprise our visual world. This paper summarizes our recent efforts toward these goals. First, we describe the richly annotated SUN database, a collection of images spanning 908 different scene categories with object, attribute, and geometric labels for many scenes. This database allows us to systematically study the space of scenes and to establish a benchmark for scene and object recognition. We augment the categorical SUN database with 102 scene attributes for every image and explore attribute recognition. Finally, we present an integrated system to extract the 3D structure of the scene and objects depicted in an image.

No MeSH data available.


License: open-access

Figure 6: Attributes in the SUN Attribute database. The area of each word is proportional to the frequency of that attribute.
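The caption states that each word's area, not its height, is proportional to the attribute's frequency. Because a rendered word's area grows roughly with the square of its font size, such a layout scales font size with the square root of frequency. The sketch below illustrates that scaling rule; it is a hypothetical illustration, not the authors' figure-generation code, and the attribute frequencies used are made up.

```python
import math

def font_sizes(freqs, max_size=72.0):
    """Scale font size with sqrt(frequency) so a word's rendered AREA
    tracks its frequency (area grows ~quadratically with font size).
    `freqs` maps attribute name -> occurrence count."""
    peak = max(freqs.values())
    return {word: max_size * math.sqrt(count / peak)
            for word, count in freqs.items()}

# Illustrative counts only: "vegetation" 4x as frequent as "rusty",
# so its font size is 2x (and its area therefore ~4x).
sizes = font_sizes({"vegetation": 400, "rusty": 100})
```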

Our first task is to establish a taxonomy of scene attributes for further study. We use a simple, crowd-sourced “splitting task” (Oliva and Torralba, 2001) in which we show Amazon Mechanical Turk (AMT) workers two groups of scenes and ask them to list attributes that are present in one group but not the other. The images that make up these groups are “typical” (Ehinger et al., 2011) scenes from random categories of the SUN database. From the thousands of attributes reported by participants, we manually collapse nearly synonymous responses (e.g., dirt and soil) into single attributes. We omit object-presence attributes because the SUN database already has dense object labels for many scenes. In the end, we arrive at a taxonomy of 38 material attributes (e.g., cement, vegetation), 11 surface properties (e.g., rusty), 36 functions or affordances (e.g., playing, cooking), and 17 spatial envelope attributes (e.g., enclosed, symmetric). See Figure 6 for the full list.
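The consolidation step described above, mapping free-form worker responses to canonical attribute names via a curated synonym table and then counting them, can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the synonym pairs and the `canonical`/`collapse` helpers are hypothetical, not the authors' actual pipeline.

```python
from collections import Counter

# Hypothetical, manually curated synonym map: near-synonymous responses
# (e.g., "dirt" and "soil", as in the text) collapse to one canonical name.
SYNONYMS = {
    "soil": "dirt",
    "plants": "vegetation",
    "concrete": "cement",
}

def canonical(response):
    """Normalize a raw AMT response and map it to its canonical attribute."""
    r = response.strip().lower()
    return SYNONYMS.get(r, r)

def collapse(responses):
    """Collapse near-synonymous responses and count each canonical attribute."""
    return Counter(canonical(r) for r in responses)

# "dirt"/"Soil" and "vegetation"/"plants" each merge into a single attribute.
counts = collapse(["dirt", "Soil", "vegetation", "plants", "rusty"])
```

In the paper's actual process the merging was done manually over thousands of responses; a table like `SYNONYMS` simply makes the collapsing rule concrete.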

