In-depth performance evaluation of PFP and ESG sequence-based function prediction methods in CAFA 2011 experiment.

Chitale M, Khan IK, Kihara D - BMC Bioinformatics (2013)

Bottom Line: The meeting of CAFA was held as a Special Interest Group (SIG) meeting at the Intelligent Systems in Molecular Biology (ISMB) conference in 2011. Successful and unsuccessful predictions by PFP and ESG are also discussed in comparison with BLAST. Since PFP and ESG are based on sequence database search results, our analyses are not only useful for PFP and ESG users but will also shed light on the relationship of the sequence similarity space and functions that can be inferred from the sequences.


Affiliation: Department of Computer Science, Purdue University, 305 N. University Street, West Lafayette, Indiana 47907, USA.

ABSTRACT

Background: Many Automatic Function Prediction (AFP) methods were developed to cope with the rapid growth in the number of gene sequences available from high-throughput sequencing experiments. To support the development of AFP methods, it is essential to have community-wide experiments for evaluating the performance of existing AFP methods. Critical Assessment of Function Annotation (CAFA) is one such community experiment. The meeting of CAFA was held as a Special Interest Group (SIG) meeting at the Intelligent Systems in Molecular Biology (ISMB) conference in 2011. Here, we perform a detailed analysis of two sequence-based function prediction methods, PFP and ESG, which were developed in our lab, using the predictions submitted to CAFA.

Results: We evaluate PFP and ESG using four different measures in comparison with BLAST, Prior, and GOtcha. In addition to the predictions submitted to CAFA, we further investigate the performance of a different scoring function for rank-ordering PFP predictions, as well as PFP/ESG predictions enriched with Priors, which simply add frequently occurring Gene Ontology terms to the predictions. Prediction accuracies of each method were also evaluated separately for different functional categories. Successful and unsuccessful predictions by PFP and ESG are also discussed in comparison with BLAST.

Conclusion: The in-depth analysis discussed here will complement the overall assessment by the CAFA organizers. Since PFP and ESG are based on sequence database search results, our analyses are not only useful for PFP and ESG users but will also shed light on the relationship of the sequence similarity space and functions that can be inferred from the sequences.



Figure 1: Performance of PFP (confidence score), PFP predictions sorted by the raw score (PFP_RAW), ESG, PRIOR, BLAST, and GOtcha. A, Precision-recall plot for the BP domain. B, ROC for the BP domain. C, Precision-recall plot for the MF domain. D, ROC for the MF domain.

Mentions: Figure 1 shows the precision-recall curves and the ROC curves of PFP with the raw score compared with the other methods. For the BP domain, we observe that PFP with the raw score (PFP_RAW in the plots) has slightly higher precision for a given recall value than PFP predictions ranked by the confidence score (PFP). PFP with the raw score clearly performs better than PFP with the confidence score in the ROC curve (Figure 1B), particularly in the low false positive range (x-axis). Similar behavior of the PFP raw score is observed for predictions in the MF domain (Figures 1C and 1D). These results indicate that the confidence score of PFP, which is computed in two steps from the raw score via the p-score distribution (see Methods), was not very successful in ranking predicted GO terms, especially at top ranks (low false positive regions). Thus, the derivation of the confidence score needs to be reexamined and probably revised.
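
To illustrate why the choice of ranking score changes both kinds of curve, the following minimal Python sketch computes precision, recall, and false positive rate at each score threshold for one ranked list of predicted GO terms. It is not the CAFA evaluation protocol and not the PFP scoring code: the function name `pr_and_roc_points`, the toy scores, and the counts `n_true` and `n_false` are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# (score, is_correct) for one predicted GO term
Prediction = Tuple[float, bool]


def pr_and_roc_points(
    scored: List[Prediction], n_true: int, n_false: int
) -> List[Tuple[float, float, float, float]]:
    """Sweep a threshold down the ranked list.

    n_true  = number of true annotations for the target (assumed known)
    n_false = number of candidate terms that are not true annotations
    Returns (threshold, precision, recall, false_positive_rate) tuples.
    """
    ranked = sorted(scored, key=lambda p: p[0], reverse=True)
    points = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Accept every prediction tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        recall = tp / n_true
        fpr = fp / n_false
        points.append((threshold, precision, recall, fpr))
    return points


# Hypothetical toy data: the same four predicted terms, scored two ways.
# The first scoring puts a correct term on top, the second an incorrect one.
raw_scored = [(0.9, True), (0.8, True), (0.5, False), (0.3, True)]
conf_scored = [(0.9, False), (0.8, True), (0.5, True), (0.3, True)]

raw_points = pr_and_roc_points(raw_scored, n_true=3, n_false=5)
conf_points = pr_and_roc_points(conf_scored, n_true=3, n_false=5)

for name, pts in (("first scoring", raw_points), ("second scoring", conf_points)):
    print(name)
    for threshold, precision, recall, fpr in pts:
        print(f"  t={threshold:.1f}  P={precision:.2f}  R={recall:.2f}  FPR={fpr:.2f}")
```

Both scorings contain the same predictions, so the curves end at the same point once every term is accepted. They differ only at the top of the ranking, which is the low false positive region where the text reports the largest gap between the raw score and the confidence score.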

