Visual Saliency Models for Text Detection in Real World.

Gao R, Uchida S, Shahab A, Shafait F, Frinken V - PLoS ONE (2014)

Bottom Line: In the first stage, Itti's model is used to calculate the saliency map, and Otsu's global thresholding algorithm is applied to extract the salient region of interest. In the second stage, Itti's model is applied to the salient region to calculate the final saliency map. An experimental evaluation demonstrates that the proposed model outperforms Itti's model in terms of captured scene texts.


Affiliation: Department of Advanced Information Technology, Kyushu University, Fukuoka, Fukuoka, Japan.

ABSTRACT
This paper evaluates the degree of saliency of text in natural scenes using visual saliency models. A large-scale scene image database with pixel-level ground truth is created for this purpose. Using this database and five state-of-the-art models, visual saliency maps that represent the degree of saliency of the objects are calculated. The receiver operating characteristic (ROC) curve is employed to evaluate the saliency of scene text as calculated by the visual saliency models. A visualization of the distribution of scene texts and non-texts is given in the space constructed by three kinds of saliency maps, calculated using Itti's visual saliency model with intensity, color, and orientation features. This visualization indicates that text characters are more salient than their non-text neighbors and can be separated from the background; scene text can therefore be extracted from scene images. With this in mind, a new visual saliency architecture, named the hierarchical visual saliency model, is proposed. The hierarchical visual saliency model is based on Itti's model and consists of two stages. In the first stage, Itti's model is used to calculate the saliency map, and Otsu's global thresholding algorithm is applied to extract the salient region of interest. In the second stage, Itti's model is applied to the salient region to calculate the final saliency map. An experimental evaluation demonstrates that the proposed model outperforms Itti's model in terms of captured scene texts.
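The two-stage pipeline described in the abstract can be sketched in code. The following is a simplified stand-in rather than the authors' implementation: a center-surround box-filter difference substitutes for Itti's full intensity/color/orientation pyramid, and only NumPy is assumed. The Otsu step, however, follows the standard between-class-variance formulation named in the abstract.

```python
import numpy as np

def otsu_threshold(gray_u8):
    """Otsu's global threshold for an 8-bit image: pick the histogram bin
    that maximizes the between-class variance."""
    hist = np.bincount(gray_u8.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # class-0 probability mass up to bin k
    mu = np.cumsum(prob * np.arange(256))  # class-0 first moment up to bin k
    denom = omega * (1.0 - omega)
    denom[denom == 0.0] = np.nan           # skip degenerate (one-class) splits
    sigma_b2 = (mu[-1] * omega - mu) ** 2 / denom
    return int(np.nanargmax(sigma_b2))

def box_mean(img, r):
    """Mean over a (2r+1)x(2r+1) box at every pixel, via an integral image."""
    pad = np.pad(img, ((r + 1, r), (r + 1, r)), mode='edge')
    ii = pad.cumsum(axis=0).cumsum(axis=1)
    k = 2 * r + 1
    return (ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]) / (k * k)

def hierarchical_saliency(gray_u8, r=4):
    """Two-stage pipeline in the spirit of the hierarchical model:
    stage 1 computes a saliency map and Otsu-thresholds it into a salient
    region; stage 2 recomputes saliency inside that region only."""
    img = gray_u8.astype(float)
    # Stage 1: crude center-surround saliency, then Otsu thresholding.
    s1 = np.abs(img - box_mean(img, r))
    s1_u8 = (255.0 * s1 / (s1.max() + 1e-9)).astype(np.uint8)
    t = otsu_threshold(s1_u8)
    mask = s1_u8 > t
    # Stage 2: re-run the saliency operator on the salient region's
    # bounding box, so contrast is measured against local context only.
    s2 = np.zeros_like(img)
    if mask.any():
        ys, xs = np.nonzero(mask)
        y0, y1 = ys.min(), ys.max() + 1
        x0, x1 = xs.min(), xs.max() + 1
        crop = img[y0:y1, x0:x1]
        s2[y0:y1, x0:x1] = np.abs(crop - box_mean(crop, r))
    return s1, t, mask, s2
```

On a toy image with a bright square on a dark background, stage 1's mask isolates the high-contrast patch, and stage 2 rescores only that neighborhood.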

No MeSH data available.


pone-0114539-g005: Examples of five state-of-the-art saliency maps. (a) Input images. (b) Ground-truth images; rectangles mark "don't care" texts. (c) Saliency maps from Itti's visual saliency model. (d) Saliency maps from Harel's graph-based visual saliency model. (e) Saliency maps from Torralba's visual saliency model. (f) Saliency maps from the fast saliency model. (g) Saliency maps from the frequency-tuned model. For more examples, please refer to Fig. 15. (Copyrights of these figures are listed in the Acknowledgments.)

Mentions: Three experiments are presented in this section. The first experiment (Fig. 5) computes saliency maps using the five state-of-the-art visual saliency models, with the aim of qualitatively evaluating how salient scene texts are. The second experiment (Fig. 6(a)) is an ROC-based performance evaluation of Itti's visual saliency model with different features, conducted to investigate how salient scene texts are with respect to each low-level feature. Finally, the last experiment (Fig. 6(b)) is an ROC-based performance evaluation using all five state-of-the-art visual saliency models.
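The ROC-based evaluation used in these experiments can be reproduced at the pixel level along the following lines. This is a generic sketch (NumPy assumed, with an optional ignore mask standing in for the "don't care" rectangles), not the authors' evaluation code: each pixel's saliency value is treated as a score, every rank position acts as a threshold, and true/false positive rates against the ground-truth text mask trace out the curve.

```python
import numpy as np

def pixel_roc(saliency, gt_text, ignore=None):
    """Pixel-level ROC for a saliency map against a binary text mask.

    saliency : float array of per-pixel saliency scores
    gt_text  : bool array, True where the ground truth marks text
    ignore   : optional bool array, True for "don't care" pixels
    Returns (fpr, tpr, auc)."""
    scores = saliency.ravel().astype(float)
    labels = gt_text.ravel().astype(bool)
    if ignore is not None:
        keep = ~ignore.ravel()
        scores, labels = scores[keep], labels[keep]
    order = np.argsort(-scores, kind='stable')  # most salient pixels first
    labels = labels[order]
    tp = np.cumsum(labels, dtype=float)         # text pixels above threshold
    fp = np.cumsum(~labels, dtype=float)        # non-text pixels above it
    P, N = labels.sum(), (~labels).sum()
    tpr = np.concatenate(([0.0], tp / max(P, 1)))
    fpr = np.concatenate(([0.0], fp / max(N, 1)))
    auc = np.trapz(tpr, fpr)                    # area under the ROC curve
    return fpr, tpr, auc
```

A saliency map that ranks every text pixel above every non-text pixel yields an AUC of 1; an inverted map yields 0. (Tied scores are handled per rank here, which is a slight simplification of a threshold-sweep ROC.)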

