Categorizing biomedicine images using novel image features and sparse coding representation.

Sheng J, Xu S, Luo X - BMC Med Genomics (2013)

Bottom Line: A series of experimental results was obtained. Different features, including conventional image features and our proposed novel features, yield different categorization performance, and the results are reported. Compared with conventional image features that do not exploit characteristics regarding text positions and distributions inside images embedded in biomedical publications, our proposed image features coupled with the sparse representation (SR) based model exhibit superior performance for classifying biomedical images, as demonstrated in our comparative benchmark study.

ABSTRACT

Background: Images embedded in biomedical publications carry rich information that often concisely summarizes the key hypotheses adopted, methods employed, or results obtained in a published study. They therefore offer valuable clues for understanding the main content of a biomedical publication. Prior studies have pointed out the potential of mining images embedded in biomedical publications for automatically understanding and retrieving the source documents associated with such images. Within the broad area of biomedical image processing, categorizing biomedical images is a fundamental step for building many advanced image analysis, retrieval, and mining applications. As in any automatic categorization effort, discriminative image features provide the most crucial aid in the process.

Method: We observe that many images embedded in biomedical publications carry versatile annotation text. Based on the locations of, and the spatial relationships between, these text elements in an image, we propose novel image features for image categorization that quantitatively characterize the spatial positions and distributions of text elements inside a biomedical image. We further adopt a sparse coding representation (SCR) based technique to categorize images embedded in biomedical publications using these newly proposed image features.
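The sparse-representation classification step can be sketched as follows. This is a minimal illustration, assuming the widely used sparse representation classification scheme (code a test feature vector as a sparse combination of training feature vectors, then pick the class whose coefficients alone reconstruct it with the smallest residual) and using orthogonal matching pursuit as the sparse solver. The abstract does not state which solver, sparsity level, or normalization the authors used, and the names `omp`, `src_classify`, and `n_nonzero` are hypothetical.

```python
import numpy as np


def omp(D, y, n_nonzero):
    """Orthogonal matching pursuit: greedily select up to n_nonzero columns
    (atoms) of D and least-squares fit y on the selected atoms."""
    x = np.zeros(D.shape[1])
    support, coef = [], np.zeros(0)
    residual = y.copy()
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) < 1e-12:  # y already fully explained
            break
        # Atom most correlated with what is still unexplained.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:  # no new atom helps; stop early
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    if support:
        x[support] = coef
    return x


def src_classify(train_X, train_y, test_x, n_nonzero=5):
    """Assign test_x to the class whose training atoms best reconstruct it.

    train_X: (n_samples, n_features) feature matrix; train_y: class labels.
    Assumes no all-zero feature vectors (they cannot be normalized)."""
    D = np.asarray(train_X, dtype=float).T            # one column per training sample
    D = D / np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm atoms
    y = np.asarray(test_x, dtype=float)
    y = y / np.linalg.norm(y)
    x = omp(D, y, n_nonzero)                          # sparse code over all classes
    labels = np.asarray(train_y)
    residuals = {}
    for c in np.unique(labels):
        # Keep only the coefficients belonging to class c.
        x_c = np.where(labels == c, x, 0.0)
        residuals[c] = np.linalg.norm(y - D @ x_c)
    return min(residuals, key=residuals.get)
```

In this sketch each row of `train_X` would hold the feature vector of one labeled sub-image; unlike an SVM, no explicit decision boundary is trained, since classification happens entirely through the class-wise reconstruction residuals.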

Results: We randomly selected 990 images in JPG format for our experiments; 310 images were used as training samples and the remaining 680 as test cases. We first segmented the 310 sample images following our proposed procedure, which produced a total of 1035 sub-images. We then manually labeled all of these sub-images according to the two-level hierarchical image taxonomy proposed in [1]. Of the annotated sub-images, 316 are microscopy images, 126 are gel electrophoresis images, 135 are line charts, 156 are bar charts, 52 are spot charts, 25 are tables, 70 are flow charts, and the remaining 155 are of the type "others". A series of experiments was conducted. First, the categorization result for each image is presented, together with performance indexes such as precision, recall, and F-score. Second, the categorization performance obtained with conventional image features is compared with that obtained with our proposed novel features. Third, we compare the accuracy of a support vector machine (SVM) classifier with that of our proposed sparse representation classification method. Finally, our proposed approach is compared with three peer classification methods, and the experimental results show markedly improved performance.
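The per-class indexes named above can be computed from true and predicted labels as in the sketch below. It uses the standard definitions (precision = TP/(TP+FP), recall = TP/(TP+FN), F-score as their harmonic mean); the abstract does not say whether the authors report per-class or averaged scores, and `per_class_prf` and `LABEL_COUNTS` are hypothetical names. The label counts are the ones reported above, and the check confirms they sum to the 1035 sub-images.

```python
from collections import Counter

# Sub-image label counts reported above (from the 310 segmented sample images).
LABEL_COUNTS = {
    "microscopy": 316, "gel electrophoresis": 126, "line charts": 135,
    "bar charts": 156, "spot charts": 52, "tables": 25,
    "flow charts": 70, "others": 155,
}
assert sum(LABEL_COUNTS.values()) == 1035  # matches the 1035 sub-images


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f_score)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        pred_pos = tp[label] + fp[label]
        true_pos = tp[label] + fn[label]
        prec = tp[label] / pred_pos if pred_pos else 0.0
        rec = tp[label] / true_pos if true_pos else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[label] = (prec, rec, f)
    return scores
```

With only 25 tables against 316 microscopy sub-images, per-class scores are more informative than a single overall accuracy, because small classes can otherwise be masked by large ones.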

Conclusions: Compared with conventional image features that do not exploit characteristics regarding text positions and distributions inside images embedded in biomedical publications, our proposed image features, coupled with the sparse representation (SR) based model, exhibit superior performance for classifying biomedical images, as demonstrated in our comparative benchmark study.


Figure 3: Eight examples of image classes used in this paper. The image classes and sub-classes in our image taxonomy are organized as a two-level class hierarchy. On the top level, images are categorized into the classes of flow charts, experimental images, graph images, mix images, and others. On the bottom level, images are further divided into eight categories: the class of experimental images is divided into microscopy and gel electrophoresis images, and the class of graph images into line charts, bar charts, spot charts, and tables.
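The two-level hierarchy in Figure 3 can be represented as a small tree, as sketched below. This encoding is hypothetical: the bottom-level labels are the eight categories used in the Results, and because the caption lists no bottom-level classes for "mix images", the sketch assumes such compound figures are split into sub-images before labeling.

```python
# Hypothetical encoding of the Figure 3 taxonomy: top-level class -> bottom-level classes.
# "mix images" has no bottom-level class here; this sketch assumes compound
# figures are segmented into sub-images before they are labeled.
TAXONOMY = {
    "flow charts": ["flow charts"],
    "experimental images": ["microscopy", "gel electrophoresis"],
    "graph images": ["line charts", "bar charts", "spot charts", "tables"],
    "mix images": [],
    "others": ["others"],
}

# Invert the tree so each bottom-level label maps to its parent class.
PARENT = {leaf: top for top, leaves in TAXONOMY.items() for leaf in leaves}


def top_level(label):
    """Return the top-level class of a bottom-level label."""
    return PARENT[label]
```

Keeping the inverse map lets a classifier predict bottom-level labels and still report results at the coarser level.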

Mentions: Many biomedical images contain highly complex textural patterns or backgrounds; in addition, visual objects displayed in a biomedical image can have low contrast (see (a), (c), and (d) in Figure 3 for examples). These visual characteristics pose major challenges for image content understanding and categorization using traditional pixel-, texture-, or edge-based image features. Fortunately, as mentioned at the beginning of this paper, biomedical images possess a salient content composition property that distinguishes them from images in other application domains, such as personal photos taken with digital cameras: the majority of biomedical images carry abundant embedded text, which is introduced either to annotate other visual objects in the image or as a primary content element in its own right. This compositional characteristic suggests a new opportunity for understanding the content of biomedical images: by quantitatively exploring the spatial distribution of text inside a biomedical image, one may gain a much higher-level understanding of the image, such as its content type. To exploit this new type of image feature for content characterization, we first need to detect the presence and locations of text regions inside a biomedical image. In this work, we deploy the algorithm by Xu et al. [8] for image text region detection and localization. Based on the spatial distribution of the detected text regions, we can then extract the aforementioned novel image features for categorizing biomedical images.
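Once text regions have been located, spatial-distribution descriptors can be derived from their bounding boxes. The sketch below is illustrative only: it does not reproduce the detector by Xu et al. [8] (it assumes the detected boxes are given), and its seven descriptors (`text_layout_features`, `TextBox`, and the `margin` band are all hypothetical) are not the paper's exact feature definitions, which this excerpt does not specify.

```python
from dataclasses import dataclass
import math


@dataclass
class TextBox:
    """Axis-aligned text region in pixel coordinates (top-left origin)."""
    x: float
    y: float
    w: float
    h: float


def text_layout_features(boxes, img_w, img_h, margin=0.15):
    """Hypothetical layout descriptors for detected text regions:
    [count, text-area fraction, mean cx, mean cy, std cx, std cy, margin fraction].
    Centroids are normalized to [0, 1]; overlapping boxes are double-counted in area."""
    n = len(boxes)
    if n == 0:
        return [0.0] * 7
    area = sum(b.w * b.h for b in boxes) / (img_w * img_h)
    cx = [(b.x + b.w / 2) / img_w for b in boxes]
    cy = [(b.y + b.h / 2) / img_h for b in boxes]
    mx, my = sum(cx) / n, sum(cy) / n
    sx = math.sqrt(sum((c - mx) ** 2 for c in cx) / n)
    sy = math.sqrt(sum((c - my) ** 2 for c in cy) / n)
    # Share of text whose centroid lies in the outer band of the image,
    # e.g. where chart axis labels may sit.
    in_margin = sum(
        1 for u, v in zip(cx, cy)
        if min(u, v) < margin or max(u, v) > 1 - margin
    )
    return [float(n), area, mx, my, sx, sy, in_margin / n]
```

Normalizing by image size keeps the descriptors comparable across images of different resolutions, so the resulting vectors could be fed directly to a classifier such as the sparse-representation scheme described in the Method.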

