VirulentPred: a SVM based prediction method for virulent proteins in bacterial pathogens.

Garg A, Gupta D - BMC Bioinformatics (2008)

Bottom Line: The results from the first layer (SVM scores and PSI-BLAST result) were cascaded to the second layer SVM classifier to train and generate the final classifier. The cascade SVM classifier achieved an accuracy of 81.8%, covering 86% of the area under the Receiver Operator Characteristic (ROC) curve, better than any of the first-layer SVM classifiers based on single or multiple sequence features. Together with experimentally verified virulent proteins, several putative, non-annotated and hypothetical protein sequences were predicted to be high-scoring virulent proteins by the prediction method.


Affiliation: Structural and Computational Biology Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), Aruna Asaf Ali Marg, New Delhi 110067, India. aarti@icgeb.res.in

ABSTRACT

Background: Prediction of bacterial virulent protein sequences has implications for identification and characterization of novel virulence-associated factors, finding novel drug/vaccine targets against proteins indispensable to pathogenicity, and understanding the complex virulence mechanism in pathogens.

Results: In the present study we propose a bacterial virulent protein prediction method based on a bi-layer cascade Support Vector Machine (SVM). The first layer SVM classifiers were trained and optimized with different individual protein sequence features: amino acid composition, dipeptide composition (occurrences of the possible pairs of the ith and (i+1)th amino acid residues), higher order dipeptide composition (pairs of the ith and (i+2)th residues) and Position Specific Scoring Matrices (PSSM) generated by Position Specific Iterated BLAST (PSI-BLAST). In addition, a similarity-search based module was developed using a dataset of virulent and non-virulent proteins as the BLAST database. Five-fold cross-validation was used to evaluate the various prediction strategies in this study. The results from the first layer (SVM scores and PSI-BLAST result) were cascaded to the second layer SVM classifier to train and generate the final classifier. The cascade SVM classifier achieved an accuracy of 81.8%, covering 86% of the area under the Receiver Operator Characteristic (ROC) curve, better than any of the first-layer SVM classifiers based on single or multiple sequence features.

Conclusion: VirulentPred is an SVM-based method to predict bacterial virulent protein sequences, which can be used to screen for virulent proteins in proteomes. Together with experimentally verified virulent proteins, several putative, non-annotated and hypothetical protein sequences were predicted to be high-scoring virulent proteins by the prediction method. VirulentPred is available as a freely accessible World Wide Web server at http://bioinfo.icgeb.res.in/virulent/.
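
The dipeptide and higher order dipeptide features described in the Results can be illustrated with a minimal Python sketch. It is not the authors' code: the function name gapped_dipeptide_composition, the alphabetical residue order, the normalization of counts to fractions, and the skipping of non-standard residues are all assumptions made for illustration. With gap=0 it counts pairs of the ith and (i+1)th residues; with gap=1 it counts pairs of the ith and (i+2)th residues.

```python
from itertools import product

# Assumed ordering of the 20 standard amino acids (alphabetical by one-letter code).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def gapped_dipeptide_composition(sequence, gap=0):
    """Return a 400-dimensional vector of residue-pair frequencies.

    gap=0 pairs residue i with residue i+1 (dipeptide composition);
    gap=1 pairs residue i with residue i+2 (higher order dipeptide composition).
    Dividing by the number of counted pairs is an assumption of this sketch.
    """
    pairs = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    step = gap + 1
    total = 0
    for i in range(len(sequence) - step):
        pair = sequence[i] + sequence[i + step]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(pairs)
    return [counts[p] / total for p in pairs]
```

Either way the result has 20 x 20 = 400 entries regardless of sequence length, which is what makes it usable as a fixed-length SVM input.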


Figure 4: Conversion of PSSM into training vectors. The steps used to convert PSSM profiles generated by PSI-BLAST into a training vector of 400 dimensions.

Mentions: To make an SVM input of fixed length, we summed all the rows in the PSSM corresponding to the same amino acid in the sequence, and then divided each element by the length of the sequence. The steps used to generate an input of 400 dimensions are shown in Figure 4.
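
The row-summing step can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the authors' implementation: the function name pssm_to_vector is hypothetical, the PSSM is assumed to be already parsed into one row of 20 scores per sequence position, the column order is assumed to follow the PSI-BLAST ASCII matrix output (ARNDCQEGHILKMFPSTWYV), and any score scaling applied before this step is not shown.

```python
# Assumed row/column order, matching the PSI-BLAST ASCII PSSM output.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def pssm_to_vector(sequence, pssm):
    """Collapse an L x 20 PSSM into a fixed-length 400-dimensional vector.

    Rows belonging to the same residue type are summed (giving a 20 x 20
    matrix), each element is divided by the sequence length, and the matrix
    is flattened row by row.
    """
    if not sequence or len(sequence) != len(pssm):
        raise ValueError("sequence and PSSM must be non-empty and equal in length")
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    sums = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(sequence, pssm):
        r = index.get(residue)
        if r is None:
            continue  # assumption: rows for non-standard residues are ignored
        for c, score in enumerate(row):
            sums[r][c] += score
    length = len(sequence)
    return [value / length for row in sums for value in row]
```

For a three-residue sequence "AAR", the two alanine rows are added together, the arginine row stands alone, and the remaining 18 rows of the 20 x 20 matrix stay at zero before division by 3.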

