Information content-based Gene Ontology functional similarity measures: which one to use for a given biological data type?

Mazandu GK, Mulder NJ - PLoS ONE (2014)

Bottom Line: However, it is not clear whether a specific functional similarity measure associated with a given approach is the most appropriate, given a biological data set or an application, i.e., achieving the best performance compared to other functional similarity measures for the biological application under consideration. We have conducted a performance evaluation of a number of different functional similarity measures using different types of biological data in order to infer the best functional similarity measure for each different term IC and semantic similarity approach. The comparisons of different protein functional similarity measures should help researchers choose the most appropriate measure for the biological application under consideration.


Affiliation: Computational Biology Group, Department of Clinical Laboratory Sciences, IDM, University of Cape Town Faculty of Health Sciences, Cape Town, South Africa; African Institute for Mathematical Sciences (AIMS), Cape Town, South Africa, and Cape Coast, Ghana.

ABSTRACT
The current increase in Gene Ontology (GO) annotations of proteins in the existing genome databases and their use in different analyses have fostered the improvement of several biomedical and biological applications. To integrate this functional data into different analyses, several protein functional similarity measures based on GO term information content (IC) have been proposed and evaluated, especially in the context of annotation-based measures. In the case of topology-based measures, each approach was set with a specific functional similarity measure depending on its conception and applications for which it was designed. However, it is not clear whether a specific functional similarity measure associated with a given approach is the most appropriate, given a biological data set or an application, i.e., achieving the best performance compared to other functional similarity measures for the biological application under consideration. We show that, in general, a specific functional similarity measure often used with a given term IC or term semantic similarity approach is not always the best for different biological data and applications. We have conducted a performance evaluation of a number of different functional similarity measures using different types of biological data in order to infer the best functional similarity measure for each different term IC and semantic similarity approach. The comparisons of different protein functional similarity measures should help researchers choose the most appropriate measure for the biological application under consideration.
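To make the notion of an IC-based functional similarity measure concrete, here is a minimal sketch of one such measure, SimGIC, under stated assumptions. It uses the common annotation-based definition IC(t) = -log p(t) and treats SimGIC as an IC-weighted Jaccard index over two sets of GO terms. The function names, the toy term identifiers and the counts are hypothetical illustrations, not the authors' implementation or data, and the term sets are assumed to already include ancestor terms.

```python
import math


def information_content(term_counts, total):
    """Annotation-based IC: IC(t) = -log(p(t)), with p(t) = count(t) / total.

    term_counts maps a term to the number of annotations at or below it
    (hypothetical toy numbers here). Terms with zero count are skipped.
    """
    return {t: -math.log(c / total) for t, c in term_counts.items() if c > 0}


def sim_gic(terms_a, terms_b, ic):
    """SimGIC: summed IC of shared terms divided by summed IC of all terms."""
    denom = sum(ic[t] for t in terms_a | terms_b)
    if denom == 0:
        # Only zero-IC terms (e.g. the root) are present: define similarity as 0.
        return 0.0
    return sum(ic[t] for t in terms_a & terms_b) / denom


# Hypothetical toy ontology: "root" annotates everything, so its IC is 0.
counts = {"root": 100, "t1": 50, "t2": 25, "t3": 10}
ic = information_content(counts, total=100)

protein_a = {"root", "t1", "t2"}
protein_b = {"root", "t1", "t3"}
score = sim_gic(protein_a, protein_b, ic)
print(f"SimGIC(a, b) = {score:.3f}")
```

In this toy case only the shared term `t1` contributes to the numerator, so the score is low even though the two proteins share two of their three terms; this is the intended effect of weighting by information content rather than counting terms.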

Figure 1. Performance evaluation in terms of Pearson's correlation values. These different Pearson's correlation values with Enzyme Commission (EC), Pfam and Sequence similarity are obtained from the CESSM online tool. For x-axis labels, the prefixes R, N, L, Li, S, X, A, Z, W, and U represent the approaches and stand for Resnik, Nunivers, Lin, Li, Relevance, XGraSM, Annotation-based, Zhang, Wang and GO-universal, respectively. The suffixes GIC, UIC and DIC represent SimGIC, SimUIC and SimDIC measures, respectively. In cases where the prefix X is used, it is immediately followed by the approach prefix. Refer to Tables 2 and 3 for the description of these different measures.


Mentions: We used a dataset of proteins with known relationships downloaded from the CESSM online tool. The GO annotations of the proteins in the dataset were retrieved from the GOA-UniProtKB dataset. The CESSM tool makes it possible to compare different functional similarity measures using Pearson's correlation with sequence, Pfam domain and EC similarity. We ran the CESSM online tool and the results are shown in Figure 1 for the BP, MF and CC ontologies. Except for the Resnik approach, these results show that in general there is a good correlation between EC, Pfam domain, sequence similarity and functional similarity measures for BP, MF and CC, especially when using measures other than Max and Avg. For EC in particular, the MF ontology tends to display higher levels of correlation. This is unsurprising as EC numbers are very specific for a particular function, so there should be good correlation in MF terms.
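The evaluation described above rests on Pearson's correlation between functional similarity scores and a reference similarity (EC, Pfam or sequence) over the same protein pairs. The sketch below shows that computation only; it does not reproduce the CESSM tool, and the function name and the score lists are hypothetical toy values rather than results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for the same four protein pairs (not data from the paper).
functional_sim = [0.10, 0.40, 0.50, 0.90]  # e.g. output of one GO-based measure
ec_sim = [0.00, 0.25, 0.50, 1.00]          # e.g. a reference EC similarity

r = pearson(functional_sim, ec_sim)
print(f"Pearson r = {r:.3f}")
```

Repeating this for each measure and each reference similarity gives one correlation value per combination, which is the kind of quantity plotted per measure in Figure 1.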

