Basic level scene understanding: categories, attributes and structures.

Xiao J, Hays J, Russell BC, Patterson G, Ehinger KA, Torralba A, Oliva A - Front Psychol (2013)

Bottom Line: This database allows us to systematically study the space of scenes and to establish a benchmark for scene and object recognition. We augment the categorical SUN database with 102 scene attributes for every image and explore attribute recognition. Finally, we present an integrated system to extract the 3D structure of the scene and objects depicted in an image.

View Article: PubMed Central - PubMed

Affiliation: Computer Science, Princeton University, Princeton, NJ, USA.

ABSTRACT
A longstanding goal of computer vision is to build a system that can automatically understand a 3D scene from a single image. This requires extracting semantic concepts and 3D information from 2D images, which can depict an enormous variety of environments that comprise our visual world. This paper summarizes our recent efforts toward these goals. First, we describe the richly annotated SUN database, a collection of annotated images spanning 908 different scene categories with object, attribute, and geometric labels for many scenes. This database allows us to systematically study the space of scenes and to establish a benchmark for scene and object recognition. We augment the categorical SUN database with 102 scene attributes for every image and explore attribute recognition. Finally, we present an integrated system to extract the 3D structure of the scene and objects depicted in an image.

No MeSH data available.

Figure 11: (A–R) Illustration of various rules we design to describe both the image evidence and context compatibility. All these rules are encoded in the structural SVM feature function f(x, y).

Mentions: Unified 3D scene parsing for basic level scene understanding. For each image, we generate a pool of hypotheses. For each hypothesis, we construct a feature vector f encoding both image features and scores from evaluating various contextual rules on the hypothesized scene structure (Figure 11). We choose the hypothesis that maximizes the objective function wᵀf as the result of 3D scene parsing. As by-products of our 3D parsing result, we obtain information that has traditionally been considered in isolation, such as the 2D location of objects, their depth, and 3D surface orientation.
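The hypothesis-scoring step described above can be sketched as follows. Each scene hypothesis y is scored by a linear objective wᵀf(x, y) and the highest-scoring hypothesis is returned as the 3D parse. The feature function here is a hypothetical stand-in for the paper's structural SVM features (image-evidence terms concatenated with context-compatibility scores); the hypothesis dictionaries and weight vector are illustrative, not from the paper.

```python
import numpy as np

def feature_function(image, hypothesis):
    """Hypothetical f(x, y): concatenate image-evidence scores with
    context-compatibility scores for the hypothesized scene structure."""
    image_evidence = np.asarray(hypothesis["evidence"], dtype=float)
    context_scores = np.asarray(hypothesis["context"], dtype=float)
    return np.concatenate([image_evidence, context_scores])

def parse_scene(image, hypotheses, w):
    """Return the hypothesis from the pool maximizing wᵀ f(x, y)."""
    scores = [w @ feature_function(image, y) for y in hypotheses]
    return hypotheses[int(np.argmax(scores))]

# Toy usage: two made-up hypotheses and an assumed learned weight vector w.
hypotheses = [
    {"name": "bedroom", "evidence": [0.9, 0.2], "context": [0.8]},
    {"name": "office",  "evidence": [0.4, 0.7], "context": [0.3]},
]
w = np.array([1.0, 0.5, 2.0])
best = parse_scene(None, hypotheses, w)
print(best["name"])  # bedroom (score 2.6 vs 1.35)
```

In the paper the weights w are learned with a structural SVM, and the hypothesis pool and contextual rules are far richer; this sketch only shows the argmax selection over a fixed pool.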
