Evaluation of GO-based functional similarity measures using S. cerevisiae protein interaction and expression profile data.

Xu T, Du L, Zhou Y - BMC Bioinformatics (2008)

Bottom Line: Researchers interested in analysing the expression patterns of functionally related genes usually hope to improve the accuracy of their results beyond the boundaries of currently available experimental data. This study demonstrated the reliability of current approaches that elevate the similarity of GO terms to the similarity of proteins. Suggestions for further improvements in functional similarity analysis are also provided.


Affiliation: Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center, Shanghai, PR China. xutao@chgc.sh.cn

ABSTRACT

Background: Researchers interested in analysing the expression patterns of functionally related genes usually hope to improve the accuracy of their results beyond the boundaries of currently available experimental data. Gene ontology (GO) data provides a novel way to measure the functional relationship between gene products. Many approaches have been reported for calculating the similarities between two GO terms, known as semantic similarities. However, biologists are more interested in the relationship between gene products than in the scores linking the GO terms. To highlight the relationships among genes, recent studies have focused on functional similarities.

Results: In this study, we evaluated five functional similarity methods using both protein-protein interaction (PPI) and expression data of S. cerevisiae. The receiver operating characteristics (ROC) and correlation coefficient analysis of these methods showed that the maximum method outperformed the other methods. Statistical comparison of multiple- and single-term annotated proteins in biological process ontology indicated that genes with multiple GO terms may be more reliable for separating true positives from noise.

Conclusion: This study demonstrated the reliability of current approaches that elevate the similarity of GO terms to the similarity of proteins. Suggestions for further improvements in functional similarity analysis are also provided.

Figure 2: ROC curves of PPI evaluations. ROC evaluations of functional similarity measures based on the S. cerevisiae PPI dataset derived from DIP are shown. The evaluation was done in (a) biological process (BP), (b) molecular function (MF), (c) cellular component (CC), and (d) ALL (root ontology). Since the Schlicker method requires all three ontologies, it is applicable only in ALL.

Mentions: Unexpectedly, the Max method consistently showed the best performance, although the performances of all measures were barely distinguishable in MF ontology (Fig. 2). The AUC values in Table 1 provide more details. The Max and Schlicker methods were adequate in BP ontology and were followed by the Wang, Ave and Tao methods. Since the tested functional similarity measures would give different results only when the genes were annotated by multiple GO terms (refer to Fig. 1), the number of genes annotated by a single GO term was investigated. In contrast to 34.6% in BP and 45.1% in CC, the proportion of genes annotated by a single GO identifier in MF was as high as 74% (Table 2). Interestingly, the gene numbers were distributed differently in each ontology. In BP ontology, the gene numbers varied only slightly among the single, double, triple and higher annotations. CC and MF tended to assign fewer annotation terms to genes, and this bias was clearer in MF than in CC. Most of these single annotations belonged to particular GO categories. For example, 43% of MF single annotations were in 'catalytic activity' and 27% in 'binding', 57.6% of CC single annotations were in 'organelle' and 28.1% in 'macromolecular complex', and 58% of BP single annotations were in 'metabolic process' and 23.3% in 'localization'. These results imply that genes involved in catalytic and binding activities were mostly of the 'one gene for one activity' type. Most genes localized in organelles and macromolecular complexes have unique locations in the cell. Some genes in metabolic and localization processes are unique to the particular process.
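
The point that the measures can diverge only for genes with multiple GO terms can be illustrated with a minimal sketch. It assumes that the Max method takes the largest of all pairwise term-to-term semantic similarity scores between two genes and that the Ave method takes their mean; the function names and the score values are hypothetical and are not taken from the paper.

```python
from statistics import mean

def max_similarity(term_sims):
    """Assumed Max method: the best score over all GO term pairs."""
    return max(max(row) for row in term_sims)

def ave_similarity(term_sims):
    """Assumed Ave method: the mean score over all GO term pairs."""
    return mean(score for row in term_sims for score in row)

# Rows are the GO terms of gene A, columns the GO terms of gene B.
# All scores below are made-up illustrative values.
single_term = [[0.5]]            # each gene carries exactly one GO term
multi_term = [[1.0, 0.25],
              [0.25, 0.5]]       # each gene carries two GO terms

# With one term per gene there is only one pair, so both measures agree.
print(max_similarity(single_term) == ave_similarity(single_term))

# With several terms per gene the two measures can differ.
print(max_similarity(multi_term), ave_similarity(multi_term))
```

Under these assumptions, an ontology in which most genes carry a single term (as reported for MF) leaves little room for the measures to differ, which is consistent with the barely distinguishable ROC curves noted for MF.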

