Three-level prediction of protein function by combining profile-sequence search, profile-profile search, and domain co-occurrence networks.

Wang Z, Cao R, Cheng J - BMC Bioinformatics (2013)

Bottom Line: Predicting protein function from sequence is useful for biochemical experiment design, mutagenesis analysis, protein engineering, protein design, biological pathway analysis, drug design, disease diagnosis, and genome annotation, as a vast number of protein sequences with unknown function are routinely generated by DNA, RNA and protein sequencing in the genomic era. However, despite significant progress in the last several years, the accuracy of protein function prediction still needs to improve before it can be used effectively in practice, particularly when little or no homology exists between a target protein and proteins with annotated function. These results show that our approach can combine the complementary strengths of the most widely used BLAST-based function prediction methods, the more sensitive profile-profile homology detection methods that are rarely used in function prediction, and non-homology-based domain co-occurrence networks, effectively extending the power of function prediction from high homology, to low homology, to no homology (ab initio cases).


Affiliation: Department of Computer Science, University of Missouri, Columbia, Missouri 65211, USA.

ABSTRACT
Predicting protein function from sequence is useful for biochemical experiment design, mutagenesis analysis, protein engineering, protein design, biological pathway analysis, drug design, disease diagnosis, and genome annotation, as a vast number of protein sequences with unknown function are routinely generated by DNA, RNA and protein sequencing in the genomic era. However, despite significant progress in the last several years, the accuracy of protein function prediction still needs to improve before it can be used effectively in practice, particularly when little or no homology exists between a target protein and proteins with annotated function. Here, we developed a method that integrates profile-sequence alignment, profile-profile alignment, and Domain Co-Occurrence Networks (DCN) to predict protein function at different levels of complexity, ranging from obvious homology, to remote homology, to no homology. We tested the method blindly in the 2011 Critical Assessment of Function Annotation (CAFA). Our experiments demonstrated that the three-level prediction method effectively increased the recall of function prediction while maintaining a reasonable precision. In particular, our method can predict function terms defined by the Gene Ontology more accurately than three standard baseline methods in most situations, handle multi-domain proteins naturally, and make ab initio function predictions when no homology exists. These results show that our approach can combine the complementary strengths of the most widely used BLAST-based function prediction methods, the more sensitive profile-profile homology detection methods that are rarely used in function prediction, and non-homology-based domain co-occurrence networks, effectively extending the power of function prediction from high homology, to low homology, to no homology (ab initio cases).


Figure 3: Precision and recall when progressively considering predictions with confidence scores in the ranges [0, 1], [0.3, 1], and [0.6, 1] for our predictor 1. The predictions in the three ranges were made by three different methods: PSI-BLAST search, HHSearch search, and Domain Co-Occurrence Networks. Their precision and recall curves are drawn in three different colors, showing that each higher level of prediction gradually increased recall at the expense of precision.

Mentions: To further analyze the contributions made by profile-sequence alignment (PSI-BLAST), profile-profile alignment (HHSearch), and domain co-occurrence networks (DCN), we plotted a precision-recall curve of predictor 1 in Figure 3 to show how precision and recall change when progressively considering predictions resulting from PSI-BLAST search at level 1, from both PSI-BLAST and HHSearch searches at levels 1 and 2, and from all three methods at levels 1, 2 and 3. Figure 3 shows that profile-profile alignment (HHSearch) extended the recall of profile-sequence alignment (PSI-BLAST) from 0.57 to 0.64, and that DCN further increased the recall to 0.69. The results demonstrate that the three levels of prediction are complementary and can be combined effectively to increase the sensitivity of protein function prediction. In particular, the DCN method may contribute valuable function predictions when all homology search methods fail to find useful hits, even though the prediction precision in this ab initio situation may be low.
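The progressive analysis described above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the record layout, the function name `precision_recall`, the placeholder GO term labels, and the toy data are all assumptions made for illustration, and the metric here is a simple pooled (micro-averaged) precision and recall over protein-term pairs, which may differ from the averaging used in CAFA.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical record: (protein_id, go_term, level), where level is
# 1 = PSI-BLAST, 2 = HHSearch, 3 = DCN. This layout is an assumption.
Prediction = Tuple[str, str, int]


def precision_recall(
    predictions: List[Prediction],
    truth: Dict[str, Set[str]],
    max_level: int,
) -> Tuple[float, float]:
    """Pooled precision and recall using only predictions up to max_level."""
    predicted = {(p, t) for p, t, level in predictions if level <= max_level}
    true_pairs = {(p, t) for p, terms in truth.items() for t in terms}
    true_positives = len(predicted & true_pairs)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(true_pairs) if true_pairs else 0.0
    return precision, recall


# Toy data with placeholder term names (not real GO identifiers).
TOY_TRUTH: Dict[str, Set[str]] = {
    "P1": {"GO:A", "GO:B"},
    "P2": {"GO:C"},
    "P3": {"GO:D"},
}
TOY_PREDICTIONS: List[Prediction] = [
    ("P1", "GO:A", 1), ("P2", "GO:C", 1),                     # level 1 hits
    ("P1", "GO:B", 2), ("P2", "GO:Y", 2),                     # level 2 adds one hit, one miss
    ("P3", "GO:D", 3), ("P3", "GO:Z", 3), ("P3", "GO:W", 3),  # level 3: no-homology case
]

for level in (1, 2, 3):
    p, r = precision_recall(TOY_PREDICTIONS, TOY_TRUTH, level)
    print(f"levels 1..{level}: precision={p:.2f} recall={r:.2f}")
```

In this toy set, each added level recovers annotations the earlier levels missed while also admitting more false positives, which mirrors the qualitative trend reported for Figure 3.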

