Support Vector Machine-based method for predicting subcellular localization of mycobacterial proteins using evolutionary information and motifs.

Rashid M, Saha S, Raghava GP - BMC Bioinformatics (2007)

Bottom Line: The performance of our method was compared with that of existing methods developed for predicting subcellular locations of Gram-positive bacterial proteins. The method also predicts membrane-attached proteins, a very important class of proteins. It will be useful in annotating newly sequenced or hypothetical mycobacterial proteins.

Affiliation: Bioinformatics Centre, Institute of Microbial Technology, Sector-39A, Chandigarh, India. mamoon@imtech.res.in

ABSTRACT

Background: In the past, a number of methods have been developed for predicting the subcellular location of eukaryotic, prokaryotic (Gram-negative and Gram-positive bacteria) and human proteins, but no method has been developed for mycobacterial proteins, which may represent a repertoire of potent immunogens of this dreaded pathogen. In this study, an attempt has been made to develop a method for predicting the subcellular location of mycobacterial proteins.

Results: The models were trained and tested on 852 mycobacterial proteins and evaluated using five-fold cross-validation. First, an SVM (Support Vector Machine) model was developed using amino acid composition; it achieved an overall accuracy of 82.51% with an average accuracy (mean of class-wise accuracy) of 68.47%. To utilize evolutionary information, an SVM model was developed using PSSM (Position-Specific Scoring Matrix) profiles obtained from PSI-BLAST (Position-Specific Iterated BLAST); it achieved an overall accuracy of 86.62% with an average accuracy of 73.71%. In addition, HMM (Hidden Markov Model), MEME/MAST (Multiple EM for Motif Elicitation/Motif Alignment and Search Tool) and hybrid models that combined two or more models were also developed. We achieved a maximum overall accuracy of 86.8% with an average accuracy of 89.00% using a combination of the PSSM-based SVM model and MEME/MAST. The performance of our method was compared with that of existing methods developed for predicting subcellular locations of Gram-positive bacterial proteins.

Conclusion: A highly accurate method has been developed for predicting the subcellular location of mycobacterial proteins. The method also predicts membrane-attached proteins, a very important class of proteins. It will be useful in annotating newly sequenced or hypothetical mycobacterial proteins. Based on the above study, a freely accessible web server, TBpred (http://www.imtech.res.in/raghava/tbpred/), has been developed.
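
The abstract reports two figures for each model: overall accuracy and average accuracy, the latter defined as the mean of class-wise accuracy. The two can differ noticeably when classes are unevenly sized. The following is a minimal sketch of that distinction, not the authors' evaluation code; the function name, the class labels and the toy data are all illustrative assumptions.

```python
from collections import defaultdict


def overall_and_average_accuracy(true_labels, predicted_labels):
    """Return (overall accuracy, mean of class-wise accuracies, per-class dict).

    All values are fractions in [0, 1]. Classes are taken from true_labels.
    """
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")

    total = defaultdict(int)    # number of proteins per true class
    correct = defaultdict(int)  # correctly predicted proteins per true class
    for truth, pred in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    overall = sum(correct.values()) / len(true_labels)
    class_wise = {c: correct[c] / total[c] for c in total}
    average = sum(class_wise.values()) / len(class_wise)
    return overall, average, class_wise


# Hypothetical toy data: a large class predicted well, a small class poorly.
truth = ["cytoplasmic"] * 8 + ["membrane-attached"] * 2
preds = ["cytoplasmic"] * 8 + ["membrane-attached", "cytoplasmic"]

overall, average, class_wise = overall_and_average_accuracy(truth, preds)
print(f"overall={overall:.2f} average={average:.2f}")
```

In this toy case nine of ten predictions are correct, yet the small class is only half right, so the mean of class-wise accuracy is lower than the overall accuracy.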

Figure 1: Plot of average accuracy against reliability index (RI) and percent coverage for the PSSM-based SVM module. The Y-axis shows average accuracy; the X-axis shows RI (lower axis) and percent coverage (upper axis). For example, about 62% of sequences having RI ≥ 3 are predicted with 95% accuracy.

Mentions: To provide confidence in a prediction, we computed a reliability index (RI), a measure of the level of certainty in that prediction. Figure 1 shows the average prediction accuracy for sequences with a reliability index greater than or equal to a given value n, where n = 1, 2, 3, 4 and 5. About 62% of the sequences with RI ≥ 3 are predicted with 95% accuracy by our PSSM-based SVM module. The RI plots of the amino acid composition-based and dipeptide composition-based SVM modules are available in Additional File 1, Figure S1 and Figure S2, respectively.
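
The kind of summary plotted in Figure 1 can be sketched as follows: for each threshold n, select the sequences with RI ≥ n, then report what percentage of all sequences they cover and how accurately they are predicted. This is a hedged illustration only. The text above does not say how RI is derived from the SVM output, so the sketch takes RI values as given; it also uses the plain fraction of correct predictions rather than a class-wise mean. The function name and the toy data are hypothetical.

```python
def accuracy_by_reliability(ri_values, is_correct, thresholds=(1, 2, 3, 4, 5)):
    """For each threshold n, return (n, percent coverage, percent accuracy).

    Coverage is the share of all sequences with RI >= n; accuracy is the
    share of those selected sequences that were predicted correctly
    (None if no sequence reaches the threshold).
    """
    if not ri_values or len(ri_values) != len(is_correct):
        raise ValueError("inputs must be non-empty and of equal length")

    n_total = len(ri_values)
    rows = []
    for n in thresholds:
        selected = [ok for ri, ok in zip(ri_values, is_correct) if ri >= n]
        coverage = 100.0 * len(selected) / n_total
        accuracy = 100.0 * sum(selected) / len(selected) if selected else None
        rows.append((n, coverage, accuracy))
    return rows


# Hypothetical toy data: one RI value and one correctness flag per sequence.
ri = [1, 2, 3, 3, 4, 5, 5, 2, 1, 4]
correct = [False, True, True, False, True, True, True, False, True, True]

rows = accuracy_by_reliability(ri, correct)
for n, coverage, accuracy in rows:
    print(f"RI >= {n}: coverage {coverage:.0f}%, accuracy {accuracy:.1f}%")
```

Raising the threshold trades coverage for accuracy, which is the relationship the two X-axes of Figure 1 express.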

