Detecting small plant peptides using SPADA (Small Peptide Alignment Discovery Application).

Zhou P, Silverstein KA, Gao L, Walton JD, Nallu S, Guhlin J, Young ND - BMC Bioinformatics (2013)

Bottom Line: SPADA is able to incorporate information from profile alignments into the model prediction process and makes use of it to score different candidate models. SPADA achieves high sensitivity and specificity in predicting small plant peptides such as the cysteine-rich peptide families. A systematic application of SPADA to other classes of small peptides by research communities will greatly improve the genome annotation of different protein families in public genome databases.


Affiliation: Department of Plant Pathology, University of Minnesota, St. Paul, Minnesota 55108, USA. neviny@umn.edu.

ABSTRACT

Background: Small peptides encoded as one- or two-exon genes in plants have recently been shown to affect multiple aspects of plant development, reproduction and defense responses. However, popular similarity search tools and gene prediction techniques generally fail to identify most members belonging to this class of genes. This is largely due to the high sequence divergence among family members and the limited availability of experimentally verified small peptides to use as training sets for homology search and ab initio prediction. Consequently, there is an urgent need for both experimental and computational studies in order to further advance the accurate prediction of small peptides.

Results: We present here a homology-based gene prediction program to accurately predict small peptides at the genome level. Given a high-quality profile alignment, SPADA identifies and annotates nearly all family members in tested genomes with better performance than all general-purpose gene prediction programs surveyed. We find numerous mis-annotations in the current Arabidopsis thaliana and Medicago truncatula genome databases using SPADA, most of which have RNA-Seq expression support. We also show that SPADA works well on other classes of small secreted peptides in plants (e.g., self-incompatibility protein homologues) as well as non-secreted peptides outside the plant kingdom (e.g., the alpha-amanitin toxin gene family in the mushroom, Amanita bisporigera).

Conclusions: SPADA is a free software tool that accurately identifies and predicts the gene structure for short peptides with one or two exons. SPADA is able to incorporate information from profile alignments into the model prediction process and makes use of it to score different candidate models. SPADA achieves high sensitivity and specificity in predicting small plant peptides such as the cysteine-rich peptide families. A systematic application of SPADA to other classes of small peptides by research communities will greatly improve the genome annotation of different protein families in public genome databases.
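The idea of scoring competing candidate models against the profile alignment can be illustrated with a minimal sketch. This is not SPADA's actual implementation: the `CandidateModel` class, its field names, and the `select_best_models` helper are hypothetical, and the sketch simply assumes that each candidate model already carries an E-value obtained by comparing its translated peptide with the family profile. The 0.001 cutoff mirrors the default search E-value threshold quoted in the Figure 2 caption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class CandidateModel:
    """Hypothetical record for one candidate gene model (illustration only)."""
    locus: str      # identifier of the genomic hit the model belongs to
    source: str     # prediction component that proposed the model
    evalue: float   # assumed E-value of the model's peptide against the profile


def select_best_models(
    candidates: Iterable[CandidateModel],
    evalue_threshold: float = 1e-3,
) -> Dict[str, CandidateModel]:
    """Keep, for each locus, the candidate with the lowest E-value.

    Candidates whose E-value exceeds the threshold are discarded, so a locus
    with no sufficiently good model is absent from the result.
    """
    best: Dict[str, CandidateModel] = {}
    for model in candidates:
        if model.evalue > evalue_threshold:
            continue  # filtered out as a likely false model
        current = best.get(model.locus)
        if current is None or model.evalue < current.evalue:
            best[model.locus] = model
    return best
```

In this toy formulation, adding another prediction component only adds more candidates per locus; the selection step stays the same.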


Figure 2: Performance comparison of different gene prediction components. Search E-value threshold is set to 0.001 by default.

Mentions: We then compared the performance of SPADA running different model prediction components: GeneID, Augustus ("de novo" mode as well as "evidence" mode), GlimmerHMM, GeneMark, and GeneWise+SplicePredictor, as well as "SPADA" (the combination of "Augustus_evidence" and "GeneWise+SplicePredictor") and "All" (the combination of all six individual components) (Figure 2; Additional file 4: Figure S1). The high specificities observed for all components are likely due to the model evaluation and selection step, where most false models are filtered out. Prediction sensitivities, on the other hand, differ substantially among components. In both genomes tested, "Augustus_evidence" and "GeneWise+SplicePredictor" gave the highest sensitivities among the six individual components. The default SPADA pipeline (denoted as "SPADA" in the figure) runs these two components and achieved even higher sensitivity. Running all six individual components (denoted as "All" in the figure) gave the highest sensitivity, suggesting that search accuracy can still be improved by including more heterogeneous prediction programs in the pipeline. However, the gain in sensitivity from running all six components is marginal compared to running just two of them ("Augustus_evidence" and "GeneWise+SplicePredictor"), suggesting that a plateau in search accuracy could soon be reached and that adding more prediction programs to the pipeline may not help much.
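The comparison above can be made concrete with a small sketch of how sensitivity and specificity might be computed for individual components and for their combinations. Everything here is illustrative: the function names and the gene identifiers are hypothetical, and the sketch assumes the convention common in gene prediction, where sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP); the excerpt itself does not state which definition was used.

```python
from typing import Dict, Iterable, Set, Tuple


def sensitivity_specificity(predicted: Set[str], truth: Set[str]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a set of predicted gene IDs.

    Assumed definitions (gene-prediction convention):
      sensitivity = TP / (TP + FN), specificity = TP / (TP + FP).
    """
    tp = len(predicted & truth)
    sensitivity = tp / len(truth) if truth else 0.0
    specificity = tp / len(predicted) if predicted else 0.0
    return sensitivity, specificity


def combine(components: Dict[str, Set[str]], names: Iterable[str]) -> Set[str]:
    """Union of the predictions made by the named components."""
    combined: Set[str] = set()
    for name in names:
        combined |= components[name]
    return combined


# Toy data (entirely made up) to show the plateau effect described above.
truth = {"g1", "g2", "g3", "g4", "g5"}
components = {
    "Augustus_evidence": {"g1", "g2", "g3"},
    "GeneWise+SplicePredictor": {"g2", "g3", "g4"},
    "GeneID": {"g1", "g5", "x1"},  # "x1" is a false positive
}

default_pipeline = combine(components, ["Augustus_evidence", "GeneWise+SplicePredictor"])
all_components = combine(components, components.keys())

for label, predicted in [("SPADA", default_pipeline), ("All", all_components)]:
    sens, spec = sensitivity_specificity(predicted, truth)
    print(f"{label}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

With data like these, taking the union of components can only raise sensitivity, while specificity depends on how many false models each added component contributes.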

