An integrative approach for measuring semantic similarities using gene ontology.

Peng J, Li H, Jiang Q, Wang Y, Chen J - BMC Syst Biol (2014)

Bottom Line: Gene Ontology (GO) provides rich information and a convenient way to study gene functional similarity, which has been successfully used in various applications. Experimental results show that InteGO2 significantly improves the performance of gene similarity in human, Arabidopsis and yeast on both the molecular function and biological process GO categories. InteGO2 computes gene-to-gene similarities more accurately than the existing measures tested and is highly robust.


ABSTRACT

Background: Gene Ontology (GO) provides rich information and a convenient way to study gene functional similarity, which has been successfully used in various applications. However, existing GO-based similarity measures are limited because each measure considers only a subset of the information in GO. An appropriate integration of the existing measures that takes more of the information in GO into account is needed.

Results: We propose a novel integrative measure called InteGO2, which automatically selects appropriate seed measures and then integrates them using a metaheuristic search method. Experimental results show that InteGO2 significantly improves the performance of gene similarity in human, Arabidopsis and yeast on both the molecular function and biological process GO categories.

Conclusions: InteGO2 computes gene-to-gene similarities more accurately than the existing measures tested and is highly robust. The supplementary document and software are available at http://mlg.hit.edu.cn:8082/.


Figure 2: Illustrative example of three types of seed measure group. m1, m2, m3, ..., m8 are eight candidate measures. The values on the number axis are their RankSim values. Panels (a), (b) and (c) illustrate high, low and mixed seed measure groups, respectively.

Mentions: In this paper, we present a solution to this problem based on a single principle: the final ranked score should be the score on which all the seed measures agree. To this end, we propose a grouping algorithm that selects the most appropriate seed measures for each gene pair. Let RankSim(g1, g2, m1), RankSim(g1, g2, m2), …, RankSim(g1, g2, mn) be the ranked similarity scores of the n candidate measures for genes g1 and g2, where each mx ∈ Sall. Placing these scores on a number axis, we group the candidate measures agglomeratively by their distances on the axis, forming a dendrogram D(g1g2). We then gradually reduce the distance threshold d in D(g1g2), iteratively finding and removing isolated measures until a core group of measures remains; this group is called the seed measure group (see examples in Figure 2). Mathematically, a seed measure group is the largest group with at least c measures, where c is a pre-defined value (c = 3 in our settings; more detail about the choice of c is shown in Additional file 1), in which the distance between measures is not larger than a pre-defined threshold (more detail about the choice of this threshold is shown in Additional file 2). For g1g2, only the measures in the seed measure group are considered seed measures and are saved in Sseed.
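The grouping step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes single-linkage grouping on the number axis, which for one-dimensional values amounts to sorting the scores and cutting at the widest gaps; the text does not state the linkage. The names `select_seed_measures`, `split_at_gaps` and `delta`, and the value 0.15, are hypothetical: the symbol and value of the distance bound do not survive in the text above. The rule for stopping when a further cut would leave fewer than c measures is also an assumption.

```python
from typing import Dict, List, Tuple

Scored = Tuple[str, float]  # (measure name, RankSim value)


def split_at_gaps(group: List[Scored], min_gap: float) -> List[List[Scored]]:
    """Cut a value-sorted group wherever neighbouring values are >= min_gap apart."""
    parts = [[group[0]]]
    for prev, cur in zip(group, group[1:]):
        if cur[1] - prev[1] >= min_gap:
            parts.append([cur])   # gap is wide enough: start a new part
        else:
            parts[-1].append(cur)
    return parts


def select_seed_measures(rank_sim: Dict[str, float], c: int = 3,
                         delta: float = 0.15) -> List[str]:
    """Pick a core group of mutually close measures for one gene pair.

    rank_sim maps each candidate measure to its RankSim value for the pair.
    c is the minimum group size (c = 3 in the text); delta is an assumed
    bound on the spread of values inside the group.
    """
    group = sorted(rank_sim.items(), key=lambda kv: kv[1])
    # Lower the threshold step by step: each pass cuts at the widest
    # remaining gap, which detaches the most isolated measures.
    while len(group) >= max(c, 2) and group[-1][1] - group[0][1] > delta:
        widest = max(b[1] - a[1] for a, b in zip(group, group[1:]))
        largest = max(split_at_gaps(group, widest), key=len)
        if len(largest) < c:
            break                 # assumed rule: never drop below c measures
        group = largest
    return [name for name, _ in group]


if __name__ == "__main__":
    # Made-up RankSim values for eight candidate measures m1..m8.
    scores = {"m1": 0.05, "m2": 0.50, "m3": 0.55, "m4": 0.58,
              "m5": 0.62, "m6": 0.95, "m7": 0.30, "m8": 0.60}
    print(select_seed_measures(scores))
```

In this sketch, measures far from the main cluster are dropped first, and the loop ends once the remaining values lie within `delta` of one another. When no group of at least c measures satisfies the bound, the current group is returned unchanged; the source does not say how that case is handled.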

