VirulentPred: a SVM based prediction method for virulent proteins in bacterial pathogens.

Garg A, Gupta D - BMC Bioinformatics (2008)

Bottom Line: The results from the first layer (SVM scores and PSI-BLAST result) were cascaded to the second layer SVM classifier to train and generate the final classifier. The cascade SVM classifier achieved an accuracy of 81.8%, covering 86% of the area under the Receiver Operator Characteristic (ROC) curve, better than any of the first-layer SVM classifiers based on single or multiple sequence features. Together with experimentally verified virulent proteins, several putative, non-annotated and hypothetical protein sequences have been predicted to be high-scoring virulent proteins by the prediction method.


Affiliation: Structural and Computational Biology Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), Aruna Asaf Ali Marg, New Delhi 110067, India. aarti@icgeb.res.in

ABSTRACT

Background: Prediction of bacterial virulent protein sequences has implications for identification and characterization of novel virulence-associated factors, finding novel drug/vaccine targets against proteins indispensable to pathogenicity, and understanding the complex virulence mechanism in pathogens.

Results: In the present study we propose a bacterial virulent protein prediction method based on a bi-layer cascade Support Vector Machine (SVM). The first-layer SVM classifiers were trained and optimized with different individual protein sequence features: amino acid composition, dipeptide composition (occurrences of the possible pairs of the ith and (i+1)th amino acid residues), higher-order dipeptide composition (pairs of the ith and (i+2)th residues) and Position Specific Scoring Matrices (PSSM) generated by Position Specific Iterated BLAST (PSI-BLAST). In addition, a similarity-search based module was developed using a dataset of virulent and non-virulent proteins as the BLAST database. Five-fold cross-validation was used to evaluate the various prediction strategies in this study. The results from the first layer (SVM scores and PSI-BLAST result) were cascaded to the second-layer SVM classifier to train and generate the final classifier. The cascade SVM classifier achieved an accuracy of 81.8%, covering 86% of the area under the Receiver Operator Characteristic (ROC) curve, better than any of the first-layer SVM classifiers based on single or multiple sequence features.
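The composition features named above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the 20-letter residue alphabet ordering, the normalization to fractions, and the choice to skip non-standard residues are all assumptions made for illustration. A gap of 1 gives the ordinary dipeptide composition (residues i and i+1); a gap of 2 gives the higher-order variant (residues i and i+2).

```python
from itertools import product

# Assumed alphabet: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq):
    """Return 20 fractions, one per standard residue (hypothetical helper)."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)  # non-standard residues are ignored (an assumption)
    return [c / total if total else 0.0 for c in counts]


def gapped_dipeptide_composition(seq, gap=1):
    """Return 400 fractions for residue pairs (i, i+gap) (hypothetical helper).

    gap=1 pairs residue i with i+1; gap=2 pairs residue i with i+2.
    """
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - gap):
        pair = seq[i] + seq[i + gap]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]
```

Each function returns a fixed-length vector (20 or 400 values) regardless of sequence length, which is what makes such features usable as input to an SVM.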

Conclusion: VirulentPred is an SVM-based method for predicting bacterial virulent protein sequences, which can be used to screen proteomes for virulent proteins. Together with experimentally verified virulent proteins, several putative, non-annotated and hypothetical protein sequences have been predicted to be high-scoring virulent proteins by the prediction method. VirulentPred is freely accessible as a World Wide Web server at http://bioinfo.icgeb.res.in/virulent/.


Figure 3: VirulentPred predictions for different proteomes. The plot depicts the number of proteins predicted to be virulent (at a higher threshold value, ≥1) in proteomes of 7 different bacteria.

Mentions: Using the higher threshold of SVM score, we started our search with the protein sequences of one of the smallest members of the kingdom Monera, Mycoplasma genitalium, a parasitic bacterium that colonizes the genital and respiratory tracts of primates. Mycoplasma genitalium is of special interest to developmental biologists because it has the smallest genome of any organism apart from viruses. Of the 485 protein sequences of Mycoplasma genitalium, VirulentPred classified 295 (60.8% of the total proteome) as virulent on the basis of SVM-predicted scores at a threshold of 0.0. At a threshold of ≥1.0, however, 29.5% of sequences were predicted as virulent. We also checked the performance of our method on the proteomes of Chlamydia trachomatis (458 sequences), Rickettsia prowazekii (549 sequences), Helicobacter pylori (575 sequences), and Treponema pallidum (608 sequences). The prediction summary for the proteomes of the 5 pathogens is shown in Figure 3. In addition, we tested VirulentPred on the complete proteomes of two non-pathogenic bacteria, Mycobacterium smegmatis (72) and Listeria innocua (402), to establish the reliability of the method. The outputs show that the chance of falsely predicting a protein as virulent is very low at the higher threshold, which increases the reliability of the predictions.
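The thresholding described in this paragraph amounts to counting the proteins whose SVM score reaches a cutoff. The sketch below is illustrative only: the function name and the example scores are hypothetical, and treating a score exactly equal to the threshold as virulent is an assumption. The final line simply re-checks the reported 60.8% figure from the counts given in the text (295 of 485).

```python
def fraction_predicted_virulent(scores, threshold=0.0):
    """Return (count, percent) of scores at or above the threshold.

    Hypothetical helper; 'scores' stands for per-protein SVM scores.
    """
    if not scores:
        return 0, 0.0
    count = sum(1 for s in scores if s >= threshold)
    return count, 100.0 * count / len(scores)


# Made-up scores for four proteins, evaluated at both thresholds.
example_scores = [-0.5, 0.2, 1.3, 0.9]
print(fraction_predicted_virulent(example_scores, 0.0))  # (3, 75.0)
print(fraction_predicted_virulent(example_scores, 1.0))  # (1, 25.0)

# Arithmetic check of the reported proportion: 295 of 485 sequences.
print(round(100.0 * 295 / 485, 1))  # 60.8
```

Raising the threshold trades coverage for confidence: fewer proteins are flagged, but each flagged protein has a larger margin from the decision boundary.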

