Limits...
Predicting gene ontology functions from protein's regional surface structures.

Liu ZP, Wu LY, Wang Y, Chen L, Zhang XS - BMC Bioinformatics (2007)

Bottom Line: The proposed method revealed strong relationship between small surface patterns (or pockets) and GO functions, which can be further used to identify active sites or functional motifs.The high quality performance of the prediction method together with the statistics also indicates that pockets play essential roles in biological interactions or the GO functions.Moreover, in addition to pockets, the proposed network framework can also be used for adopting other protein spatial surface patterns to predict the protein functions.

View Article: PubMed Central - HTML - PubMed

Affiliation: Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100080, China. zhiping.liu@hotmail.com

ABSTRACT

Background: Annotation of protein functions is an important task in the post-genomic era. Most early approaches for this task exploit only the sequence or global structure information. However, protein surfaces are believed to be crucial to protein functions because they are the main interfaces to facilitate biological interactions. Recently, several databases related to structural surfaces, such as pockets and cavities, have been constructed with a comprehensive library of identified surface structures. For example, CASTp provides identification and measurements of surface accessible pockets as well as interior inaccessible cavities.

Results: A novel method was proposed to predict the Gene Ontology (GO) functions of proteins from the pocket similarity network, which is constructed according to the structure similarities of pockets. The statistics of the networks were presented to explore the relationship between the similar pockets and GO functions of proteins. Cross-validation experiments were conducted to evaluate the performance of the proposed method. Results and codes are available at: http://zhangroup.aporc.org/bioinfo/PSN/.

Conclusion: The computational results demonstrate that the proposed method based on the pocket similarity network is effective and efficient for predicting GO functions of proteins in terms of both computational complexity and prediction accuracy. The proposed method revealed strong relationship between small surface patterns (or pockets) and GO functions, which can be further used to identify active sites or functional motifs. The high quality performance of the prediction method together with the statistics also indicates that pockets play essential roles in biological interactions or the GO functions. Moreover, in addition to pockets, the proposed network framework can also be used for adopting other protein spatial surface patterns to predict the protein functions.

Show MeSH

Related in: MedlinePlus

The prediction results with selected GO terms. The recall-precision graphs of the prediction results with selected specific GO terms by the pocket similarity network (pvSOAR cRMSD p-value 10-2). (a) The RP curves of the prediction results when the informative GO terms are selected based on GO term probability. (b) The RP curves of the prediction results when the informative GO terms are selected based on the depth level. The red dash line gives the original prediction results without filter.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2233648&req=5

Figure 6: The prediction results with selected GO terms. The recall-precision graphs of the prediction results with selected specific GO terms by the pocket similarity network (pvSOAR cRMSD p-value 10-2). (a) The RP curves of the prediction results when the informative GO terms are selected based on GO term probability. (b) The RP curves of the prediction results when the informative GO terms are selected based on the depth level. The red dash line gives the original prediction results without filter.

Mentions: in which anno(c) is the number of proteins annotated with the term c, and children(c) is the set of child nodes of term c. The probability of the term c is then defined as p(c) = freq(c) = freq(root), where freq(root) is the frequency of the root term [36]. The probability is monotonically increasing when moving up on a path from a leaf to the root. A GO term with smaller probability should be more specific and informative. The probabilities of all GO terms occurring in the dataset are calculated. The mean values of the probabilities in three ontologies are 0.044 (C), 0.020 (P) and 0.015 (F) respectively. Then we discard GO terms with probabilities larger than a given value (from 0.05 to 0.005) in the prediction to reduce the possible bias caused by unspecific GO terms with high frequency. The recall-precision graphs of prediction results with selected GO terms are drawn in Figure 6(a). We also consider the depth levels of GO terms. The recall-precision graphs of prediction results with the GO terms under certain depth level are drawn in Figure 6(b). The results show that the prediction method is effective for specific GO terms and the results are not artificially biased.


Predicting gene ontology functions from protein's regional surface structures.

Liu ZP, Wu LY, Wang Y, Chen L, Zhang XS - BMC Bioinformatics (2007)

The prediction results with selected GO terms. The recall-precision graphs of the prediction results with selected specific GO terms by the pocket similarity network (pvSOAR cRMSD p-value 10-2). (a) The RP curves of the prediction results when the informative GO terms are selected based on GO term probability. (b) The RP curves of the prediction results when the informative GO terms are selected based on the depth level. The red dash line gives the original prediction results without filter.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2233648&req=5

Figure 6: The prediction results with selected GO terms. The recall-precision graphs of the prediction results with selected specific GO terms by the pocket similarity network (pvSOAR cRMSD p-value 10-2). (a) The RP curves of the prediction results when the informative GO terms are selected based on GO term probability. (b) The RP curves of the prediction results when the informative GO terms are selected based on the depth level. The red dash line gives the original prediction results without filter.
Mentions: in which anno(c) is the number of proteins annotated with the term c, and children(c) is the set of child nodes of term c. The probability of the term c is then defined as p(c) = freq(c) = freq(root), where freq(root) is the frequency of the root term [36]. The probability is monotonically increasing when moving up on a path from a leaf to the root. A GO term with smaller probability should be more specific and informative. The probabilities of all GO terms occurring in the dataset are calculated. The mean values of the probabilities in three ontologies are 0.044 (C), 0.020 (P) and 0.015 (F) respectively. Then we discard GO terms with probabilities larger than a given value (from 0.05 to 0.005) in the prediction to reduce the possible bias caused by unspecific GO terms with high frequency. The recall-precision graphs of prediction results with selected GO terms are drawn in Figure 6(a). We also consider the depth levels of GO terms. The recall-precision graphs of prediction results with the GO terms under certain depth level are drawn in Figure 6(b). The results show that the prediction method is effective for specific GO terms and the results are not artificially biased.

Bottom Line: The proposed method revealed strong relationship between small surface patterns (or pockets) and GO functions, which can be further used to identify active sites or functional motifs.The high quality performance of the prediction method together with the statistics also indicates that pockets play essential roles in biological interactions or the GO functions.Moreover, in addition to pockets, the proposed network framework can also be used for adopting other protein spatial surface patterns to predict the protein functions.

View Article: PubMed Central - HTML - PubMed

Affiliation: Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100080, China. zhiping.liu@hotmail.com

ABSTRACT

Background: Annotation of protein functions is an important task in the post-genomic era. Most early approaches for this task exploit only the sequence or global structure information. However, protein surfaces are believed to be crucial to protein functions because they are the main interfaces to facilitate biological interactions. Recently, several databases related to structural surfaces, such as pockets and cavities, have been constructed with a comprehensive library of identified surface structures. For example, CASTp provides identification and measurements of surface accessible pockets as well as interior inaccessible cavities.

Results: A novel method was proposed to predict the Gene Ontology (GO) functions of proteins from the pocket similarity network, which is constructed according to the structure similarities of pockets. The statistics of the networks were presented to explore the relationship between the similar pockets and GO functions of proteins. Cross-validation experiments were conducted to evaluate the performance of the proposed method. Results and codes are available at: http://zhangroup.aporc.org/bioinfo/PSN/.

Conclusion: The computational results demonstrate that the proposed method based on the pocket similarity network is effective and efficient for predicting GO functions of proteins in terms of both computational complexity and prediction accuracy. The proposed method revealed strong relationship between small surface patterns (or pockets) and GO functions, which can be further used to identify active sites or functional motifs. The high quality performance of the prediction method together with the statistics also indicates that pockets play essential roles in biological interactions or the GO functions. Moreover, in addition to pockets, the proposed network framework can also be used for adopting other protein spatial surface patterns to predict the protein functions.

Show MeSH
Related in: MedlinePlus