Prediction of O-glycosylation Sites Using Random Forest and GA-Tuned PSO Technique.

Hassan H, Badr A, Abdelhalim MB - Bioinform Biol Insights (2015)

Bottom Line: A need for even better prediction tools remains. Our proposed genetic algorithm-based approach has shown better performance than existing predictors in terms of area under the receiver operating characteristic curve. In addition, we implemented a glycosylation predictor tool based on that approach, and we demonstrated that this tool could successfully identify candidate glycosylation sites in a case study protein.


Affiliation: Department of Computer Science, College of Computing and Information Technology, Arab Academy for Science and Technology and Maritime Transport (AASTMT), Cairo, Egypt.

ABSTRACT
O-glycosylation is one of the main types of mammalian protein glycosylation; it occurs at specific serine (S) or threonine (T) sites. Several O-glycosylation site predictors have been developed, but better prediction tools are still needed. One challenge in training the classifiers is that the available datasets are highly imbalanced, which makes the classification accuracy for the minority class unsatisfactory. In our previous work, we proposed a classification approach based on particle swarm optimization (PSO) and random forest (RF) that addresses the imbalanced dataset problem. The PSO parameter settings used in training affect the classification accuracy. Thus, in this paper, we optimize the PSO parameters with a genetic algorithm in order to increase the classification accuracy. Our proposed genetic algorithm-based approach has shown better performance than existing predictors in terms of area under the receiver operating characteristic curve. In addition, we implemented a glycosylation predictor tool based on that approach, and we demonstrated that this tool could successfully identify candidate glycosylation sites in a case study protein.
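
The imbalance problem mentioned in the abstract can be illustrated with a small sketch. The 95:5 class ratio, the trivial "always negative" predictor, and the helper names below are hypothetical and are not taken from the paper; the point is only that overall accuracy can look high while the minority class is never detected, which is one reason to report the area under the ROC curve instead.

```python
def accuracy(preds, labels):
    # Fraction of samples whose predicted label matches the true label.
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def recall(preds, labels, positive=1):
    # Fraction of true positives (minority class) that were predicted positive.
    hits = sum(1 for p, y in zip(preds, labels) if y == positive and p == positive)
    total = sum(1 for y in labels if y == positive)
    return hits / total

def auc(scores, labels):
    # Area under the ROC curve via pairwise comparison: the probability that a
    # positive sample outscores a negative one, counting ties as one half.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# HYPOTHETICAL data: 5 glycosylated sites (label 1) and 95 non-glycosylated sites (label 0).
labels = [1] * 5 + [0] * 95
scores = [0.0] * 100   # a predictor that gives every site the same score
preds = [0] * 100      # and therefore always predicts the majority class

print(accuracy(preds, labels))  # high, although no minority site is found
print(recall(preds, labels))    # minority-class recall
print(auc(scores, labels))      # no better than chance
```

Here the predictor is right on every majority sample and wrong on every minority sample, so accuracy alone hides the failure that recall and AUC expose.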




Figure 3: GA-based PSO optimization.

Mentions: In our previous study [10], we used PSO to undersample the O-glycosylation sites dataset by selecting the most important samples from the majority class. Properly setting the PSO parameters can improve the algorithm's performance and, consequently, the classification accuracy. In this paper, we use a genetic algorithm (GA) to optimize the PSO parameters. As shown in Figure 3, the PSO-based undersampling technique (the dashed part) is integrated with the GA. Each GA solution (chromosome) encodes one candidate set of values for the PSO parameters used in the step that selects samples from the majority class. The GA arrives at the optimized PSO parameters through a series of iterative GA operations (crossover and mutation). The fitness of each GA solution is the classification accuracy of the RF classifier on the resulting balanced dataset. The GA terminates once the maximum number of generations is exceeded.
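
The loop described above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the chromosome is taken to hold three PSO parameters (inertia `w`, cognitive `c1`, social `c2`) with made-up bounds, PSO picks majority samples through a random-key encoding, truncation selection is used in the GA, and a nearest-centroid classifier on toy 1-D data stands in for the random forest so that the example needs only the standard library. All function and variable names are hypothetical.

```python
import random

# ASSUMPTION: the chromosome holds (inertia w, cognitive c1, social c2);
# the source does not list the tuned PSO parameters or their ranges.
BOUNDS = [(0.1, 1.0), (0.5, 2.5), (0.5, 2.5)]

def decode(position, k):
    # ASSUMPTION (random-key encoding): keep the k majority samples with the largest keys.
    return sorted(range(len(position)), key=lambda i: -position[i])[:k]

def pso_undersample(n_major, k, evaluate, w, c1, c2, rng,
                    n_particles=6, n_iters=8, vmax=0.5):
    # PSO search for the k majority samples that give the best evaluate() score.
    pos = [[rng.random() for _ in range(n_major)] for _ in range(n_particles)]
    vel = [[0.0] * n_major for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [evaluate(decode(p, k)) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_major):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax, min(vmax, v))  # clamp the velocity
                pos[i][d] += vel[i][d]
            fit = evaluate(decode(pos[i], k))
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest_fit, decode(gbest, k)

def ga_tune_pso(n_major, k, evaluate, rng, pop_size=6, max_generations=5, mutation_rate=0.2):
    # GA over PSO parameters; fitness is the score reached by PSO undersampling.
    def fitness(ch):
        return pso_undersample(n_major, k, evaluate, ch[0], ch[1], ch[2], rng)[0]
    population = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(max_generations):  # stop after the maximum number of generations
        scored = sorted(((fitness(ch), ch) for ch in population),
                        key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0][0], scored[0][1][:]
        parents = [ch for _, ch in scored[: pop_size // 2]]  # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(BOUNDS))        # one-point crossover
            child = a[:cut] + b[cut:]
            for j, (lo, hi) in enumerate(BOUNDS):      # mutation: resample a gene
                if rng.random() < mutation_rate:
                    child[j] = rng.uniform(lo, hi)
            children.append(child)
        population = children
    return best, best_fit

def make_evaluate(major, minor, val_x, val_y):
    # STAND-IN for the RF classifier: nearest-centroid accuracy on a validation set.
    mu_minor = sum(minor) / len(minor)
    def evaluate(selected):
        mu_major = sum(major[i] for i in selected) / len(selected)
        preds = [1 if abs(x - mu_minor) < abs(x - mu_major) else 0 for x in val_x]
        return sum(p == y for p, y in zip(preds, val_y)) / len(val_y)
    return evaluate

# HYPOTHETICAL toy data: 40 majority and 8 minority samples on one feature.
rng = random.Random(0)
major = [rng.gauss(0.0, 1.0) for _ in range(40)]
minor = [rng.gauss(2.0, 1.0) for _ in range(8)]
val_x = [rng.gauss(0.0, 1.0) for _ in range(20)] + [rng.gauss(2.0, 1.0) for _ in range(20)]
val_y = [0] * 20 + [1] * 20
evaluate = make_evaluate(major, minor, val_x, val_y)

best_params, best_fit = ga_tune_pso(len(major), len(minor), evaluate, rng)
print("best (w, c1, c2):", best_params, "fitness:", best_fit)
```

Swapping `make_evaluate` for a function that trains a random forest on the selected majority samples plus all minority samples, and returns its accuracy, would bring the sketch closer to the setup the paragraph describes.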

