An ant colony optimization based feature selection for web page classification.

Saraç E, Özel SA - ScientificWorldJournal (2014)

Bottom Line: In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods.


Affiliation: Department of Computer Engineering, Çukurova University, Balcali, Sarıçam, 01330 Adana, Turkey.

ABSTRACT
The growing popularity of the web has added a huge amount of information to it, and as a result of this explosive growth, automated web page classification systems are needed to improve search engine performance. Web pages have a large number of features, such as HTML/XML tags, URLs, hyperlinks, and text content, that should be considered during automated classification. The aim of this study is to reduce the number of features used, in order to improve the runtime and accuracy of web page classification. We used an ant colony optimization (ACO) algorithm to select the best features, and then applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We ran our experiments on the WebKB and Conference datasets and showed that using ACO for feature selection improves both the accuracy and the runtime performance of classification. We also showed that the proposed ACO-based algorithm selects better features than the well-known information gain and chi-square feature selection methods.
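To make the selection step in the abstract more concrete, here is a minimal, generic sketch of ant colony feature selection. It is not the authors' implementation: the function name `aco_select`, its parameters (`n_ants`, `n_iter`, `rho`), the pheromone update rule, and the toy fitness function are all assumptions made for illustration. In a real setting the fitness would typically be the accuracy of a classifier trained on the chosen features.

```python
import random

def aco_select(n_features, k, score, n_ants=10, n_iter=20, rho=0.2, seed=0):
    """Pick k of n_features with a simple ant colony search (illustrative only).

    score(subset) -> float in [0, 1], higher is better. This is a hypothetical
    fitness, e.g. classifier accuracy on the chosen features.
    """
    rng = random.Random(seed)
    pheromone = [1.0] * n_features  # one trail value per feature
    best_subset, best_score = None, float("-inf")
    for _ in range(n_iter):
        for _ in range(n_ants):
            # Each ant draws k distinct features, biased by pheromone.
            candidates = list(range(n_features))
            subset = []
            for _ in range(k):
                weights = [pheromone[f] for f in candidates]
                f = rng.choices(candidates, weights=weights, k=1)[0]
                candidates.remove(f)
                subset.append(f)
            s = score(subset)
            if s > best_score:
                best_subset, best_score = sorted(subset), s
        # Evaporate all trails, then reinforce the best subset found so far.
        pheromone = [(1.0 - rho) * p for p in pheromone]
        for f in best_subset:
            pheromone[f] += best_score
    return best_subset, best_score

# Toy fitness (assumption): features 2, 5, and 7 are the informative ones.
informative = {2, 5, 7}

def toy_score(subset):
    return len(informative & set(subset)) / len(informative)

subset, value = aco_select(n_features=10, k=3, score=toy_score)
print(subset, value)
```

The fixed seed makes the run repeatable. Evaporation (`rho`) keeps early choices from dominating, while the deposit on the best subset gradually steers later ants toward features that have scored well.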


fig4: Distribution of the ACO selected tags for project class.

Mentions: In this section, we investigate, for each class, the distribution of tags in the ACO-selected feature subset obtained with the tagged terms method. Figures 3, 4, 5, 6, and 7 show the tag distributions of the ACO-selected features for the five classes. Because URLs are very dominant features for the WebKB dataset, as can be seen from Table 4, we did not include features from URLs in this experiment.
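As a small illustration of how such a tag distribution could be computed, the sketch below counts which tag each selected feature came from. The `tag@term` naming of features, the separator, and the sample feature list are assumptions for the example and are not taken from the paper.

```python
from collections import Counter

def tag_distribution(selected_features, sep="@"):
    """Return the share of selected features contributed by each tag.

    Assumes each feature is named "<tag><sep><term>" (hypothetical format).
    """
    tags = [feature.split(sep, 1)[0] for feature in selected_features]
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: count / total for tag, count in counts.items()}

# Hypothetical selected features for one class.
selected = ["title@project", "h1@research", "title@lab", "a@group"]
dist = tag_distribution(selected)
print(dist)
```

Running this once per class on that class's selected subset would give the per-tag shares that a figure like the one captioned above summarizes.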

