VirulentPred: a SVM based prediction method for virulent proteins in bacterial pathogens.

Garg A, Gupta D - BMC Bioinformatics (2008)

Bottom Line: The results from the first layer (SVM scores and PSI-BLAST results) were cascaded to a second-layer SVM classifier to train and generate the final classifier. The cascade SVM classifier achieved an accuracy of 81.8% and covered 86% of the area in the Receiver Operating Characteristic (ROC) plot, better than any of the first-layer SVM classifiers based on single or multiple sequence features. Together with experimentally verified virulent proteins, several putative, non-annotated and hypothetical protein sequences were predicted by the method to be high-scoring virulent proteins.


Affiliation: Structural and Computational Biology Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), Aruna Asaf Ali Marg, New Delhi 110067, India. aarti@icgeb.res.in

ABSTRACT

Background: Prediction of bacterial virulent protein sequences has implications for identification and characterization of novel virulence-associated factors, finding novel drug/vaccine targets against proteins indispensable to pathogenicity, and understanding the complex virulence mechanism in pathogens.

Results: In the present study we propose a bacterial virulent protein prediction method based on a bi-layer cascade Support Vector Machine (SVM). The first-layer SVM classifiers were trained and optimized on individual protein sequence features: amino acid composition, dipeptide composition (occurrences of the possible pairs of the i-th and (i+1)-th amino acid residues), higher-order dipeptide composition (pairs of the i-th and (i+2)-th residues) and Position Specific Scoring Matrices (PSSM) generated by Position Specific Iterated BLAST (PSI-BLAST). In addition, a similarity-search module was developed using a dataset of virulent and non-virulent proteins as the BLAST database. Five-fold cross-validation was used to evaluate the prediction strategies in this study. The results from the first layer (SVM scores and PSI-BLAST results) were cascaded to the second-layer SVM classifier to train and generate the final classifier. The cascade SVM classifier achieved an accuracy of 81.8% and covered 86% of the area in the Receiver Operating Characteristic (ROC) plot, better than any of the first-layer SVM classifiers based on single or multiple sequence features.
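The composition features described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the restriction to the 20 standard residues, and the normalization to fractions (rather than, say, percentages) are assumptions made for illustration. A gap of 1 gives the ordinary dipeptide composition (residues i and i+1), and a gap of 2 gives the higher-order variant (residues i and i+2).

```python
from itertools import product

# The 20 standard amino acids, in a fixed order that defines feature positions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order ("AA", "AC", ..., "YY").
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return 20 fractions: the share of each standard residue in seq.

    Non-standard characters are ignored (an assumption of this sketch).
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / total for a in AMINO_ACIDS]


def gapped_dipeptide_composition(seq, gap=1):
    """Return 400 fractions for residue pairs at positions (i, i + gap).

    gap=1 is the ordinary dipeptide composition; gap=2 pairs residue i
    with residue i+2. Pairs containing a non-standard character are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - gap):
        pair = seq[i] + seq[i + gap]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]
```

Each function returns a fixed-length vector (20 or 400 values), so sequences of different lengths map to inputs of the same dimension for a first-layer classifier.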

Conclusion: VirulentPred is an SVM-based method to predict bacterial virulent protein sequences, which can be used to screen for virulent proteins in proteomes. Together with experimentally verified virulent proteins, several putative, non-annotated and hypothetical protein sequences were predicted by the method to be high-scoring virulent proteins. VirulentPred is freely accessible as a web server at http://bioinfo.icgeb.res.in/virulent/.


Figure 5: Schema of the bi-layer cascade SVM module. The SVM classifier was the most efficient classifier developed in the study.

Mentions: The classification efficiency of machine learning techniques is diminished by noise in large and complex datasets. In certain cases, however, this problem may be overcome by a layered SVM [24]. To explore the effectiveness of this strategy for the training dataset used in the study, we generated a bi-layered cascade SVM classifier. The first layer of the cascade SVM consists of classifiers based on the individual protein features discussed earlier (Figure 5). The second layer was trained with the binary scores of the output generated by the 5 best classifiers in the first layer. The second-layer SVM was trained with a vector of 7 dimensions (1 for amino acid composition (AAC), 1 for dipeptide composition, 1 for higher-order dipeptide composition, 1 for PSSM and 3 for PSI-BLAST results). Hence, the second-layer SVM learns from the first-layer classifiers and the PSI-BLAST results to generate the final cascade SVM classifier.
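The 7-dimensional second-layer input described above can be sketched as follows. This is a hypothetical illustration, not the published code: the text does not say how the three PSI-BLAST dimensions are encoded, so the one-hot scheme over three outcomes (hit to a virulent protein, hit to a non-virulent protein, no hit) is an assumption, as are the function and category names. The four first-layer scores are passed through unchanged.

```python
# Hypothetical outcome labels for the PSI-BLAST similarity search.
# The source only states that PSI-BLAST contributes 3 dimensions.
PSI_BLAST_OUTCOMES = ("virulent", "non_virulent", "no_hit")


def psi_blast_one_hot(outcome):
    """Encode a PSI-BLAST outcome as 3 values (assumed one-hot scheme)."""
    if outcome not in PSI_BLAST_OUTCOMES:
        raise ValueError(f"unknown PSI-BLAST outcome: {outcome!r}")
    return [1.0 if outcome == o else 0.0 for o in PSI_BLAST_OUTCOMES]


def second_layer_vector(aac_score, dpc_score, hdpc_score, pssm_score, outcome):
    """Build the 7-dimensional input for the second-layer classifier.

    Dimensions 1-4: one score from each first-layer SVM (amino acid
    composition, dipeptide, higher-order dipeptide, PSSM).
    Dimensions 5-7: the encoded PSI-BLAST result.
    """
    scores = [float(aac_score), float(dpc_score),
              float(hdpc_score), float(pssm_score)]
    return scores + psi_blast_one_hot(outcome)
```

For example, `second_layer_vector(0.8, -0.2, 0.1, 1.3, "virulent")` yields four scores followed by the three PSI-BLAST values; a vector of this form, one per protein, would then be the training input to the second-layer SVM.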

