A relation based measure of semantic similarity for Gene Ontology annotations.

Sheehan B, Quigley A, Gaudin B, Dobson S - BMC Bioinformatics (2008)

Bottom Line: Existing approaches that extend the semantic similarity of terms to annotations introduce assumptions that do not necessarily reflect how terms relate to each other. The set of associated constraints also provides a set of principles that any improvement on our method should seek to satisfy. As a result, our measure better describes the information contained in annotations associated with gene products and is therefore better suited to characterizing and classifying gene products through their annotations.


Affiliation: Systems Research Group, School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland. brendan.sheehan@ucd.ie

ABSTRACT

Background: Various measures of semantic similarity of terms in bio-ontologies such as the Gene Ontology (GO) have been used to compare gene products. Such measures of similarity have been used to annotate uncharacterized gene products and to group gene products into functional groups. There are various ways to measure semantic similarity: using the topological structure of the ontology, the instances (gene products) associated with terms, or a mixture of both. We focus on an instance-level definition of semantic similarity while using the information contained in the ontology, both in the graphical structure of the ontology and in the semantics of relations between terms, to provide constraints on our instance-level description. Semantic similarity of terms is extended to annotations by various approaches, either through aggregation operations such as min, max and average or through an extrapolative method. These approaches introduce assumptions about how semantic similarity of terms relates to the semantic similarity of annotations that do not necessarily reflect how terms relate to each other.
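The aggregation approaches mentioned above (min, max and average over pairwise term similarities) can be sketched as follows. This is only an illustration of the aggregation idea, not the authors' SSA algorithm; the function names, the term identifiers and the similarity table are hypothetical and made up for the example.

```python
from itertools import product
from statistics import mean

def aggregate_similarity(terms_a, terms_b, term_sim, how="max"):
    """Combine pairwise term similarities into one annotation-level score."""
    scores = [term_sim(a, b) for a, b in product(terms_a, terms_b)]
    if not scores:
        raise ValueError("both annotations need at least one term")
    reducers = {"min": min, "max": max, "average": mean}
    return reducers[how](scores)

# Hypothetical symmetric term-level similarities (not real GO values).
_TOY_SIM = {
    ("t1", "t3"): 0.2,
    ("t1", "t4"): 0.9,
    ("t2", "t3"): 0.4,
    ("t2", "t4"): 0.1,
}

def toy_term_sim(a, b):
    """Look up a similarity in either order; unknown pairs score 0.0."""
    return _TOY_SIM.get((a, b), _TOY_SIM.get((b, a), 0.0))

annotation_a = ["t1", "t2"]
annotation_b = ["t3", "t4"]
for how in ("min", "max", "average"):
    print(how, aggregate_similarity(annotation_a, annotation_b, toy_term_sim, how))
```

The three reducers give different scores for the same pair of annotations, which illustrates why the choice of aggregation is itself an assumption about how term similarity carries over to annotations.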

Results: We exploit the semantics of relations in the GO to construct an algorithm called SSA that provides the basis of a framework that naturally extends instance-based methods of semantic similarity of terms, such as Resnik's measure, to describe annotations and not just terms. Our measure attempts to correctly interpret how terms combine via their relationships in the ontological hierarchy. SSA uses these relationships to identify the most specific common ancestors between terms. We outline the set of cases in which terms can combine and associate partial order constraints with each case that order the specificity of terms. These cases form the basis for the SSA algorithm. The set of associated constraints also provides a set of principles that any improvement on our method should seek to satisfy.
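For orientation, Resnik's term-level measure scores two terms by the information content, -log p(t), of their most informative common ancestor, where p(t) is the fraction of annotated instances at t or any of its descendants. A minimal sketch on a toy hierarchy follows. It shows only the standard term-level measure, not SSA or its relation-specific cases; the ontology, term names and annotation counts are hypothetical.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (not real GO terms).
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
}

# Hypothetical number of gene products annotated directly to each term.
# The toy hierarchy is a tree, so summing counts does not double count;
# on a real DAG one would count distinct gene products instead.
DIRECT_COUNTS = {
    "root": 0,
    "binding": 2,
    "catalysis": 4,
    "dna_binding": 1,
    "rna_binding": 1,
}

def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def term_probability(term):
    """Fraction of annotated instances at the term or any descendant."""
    total = sum(DIRECT_COUNTS.values())
    covered = sum(n for t, n in DIRECT_COUNTS.items() if term in ancestors(t))
    return covered / total

def information_content(term):
    """IC(t) = -log p(t), written as log(1/p) so the root gives exactly 0.0."""
    return math.log(1.0 / term_probability(term))

def resnik(term_a, term_b):
    """Information content of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(information_content(t) for t in common)

print(resnik("dna_binding", "rna_binding"))  # shared ancestor "binding"
print(resnik("dna_binding", "catalysis"))    # only the root is shared
```

In this toy data the two binding terms share the ancestor "binding", which covers half of the annotated instances, whereas "dna_binding" and "catalysis" share only the root and so score zero.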

Conclusion: We derive a measure of semantic similarity between annotations that exploits all available information without introducing assumptions about the nature of the ontology or data. We preserve the principles underlying instance-based methods of semantic similarity of terms at the annotation level. As a result, our measure better describes the information contained in annotations associated with gene products and is therefore better suited to characterizing and classifying gene products through their annotations.


Figure 19: Standard Deviation of Pathway Similarity Values of Annotations Consisting only of Molecular Function Terms Using SSAResnik. Standard deviation of SSAResnik similarity values of gene products inside and outside a pathway.

Mentions: Figure 3 shows the results of a comparison of SSAResnik with Wang's method and MaxResnik on measuring the average annotation similarity, using all terms, of gene products inside and outside a pathway [data for Figures 3 to 21 are found in Additional file 1]. The first 35 pathways are insufficiently annotated to produce meaningful results. Similarity values for SSAResnik and MaxResnik were normalized to allow direct comparison between them. All measures behave similarly: the similarity values returned by Wang's method tend to increase as the values returned by SSAResnik increase. All measures tend to settle to an average similarity value when genes inside and outside a pathway are compared. Wang's method returns a higher value on average, with values ranging between 0.5 and 0.6 as internal gene similarity increases. SSAResnik and MaxResnik return values between 0.3 and 0.4 for the average similarity of genes inside a pathway with genes outside a pathway as the similarity of genes within a pathway increases. If pathways are identified by the difference between the average similarity of gene products inside and outside a cluster, then SSAResnik and MaxResnik have greater discriminatory power. SSAResnik and MaxResnik behave identically for most pathways when all terms are considered.
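The inside versus outside comparison described above can be sketched as follows, assuming some pairwise gene product similarity function is already available. The function name, gene identifiers and similarity values are hypothetical, and the normalization step mentioned in the text is not reproduced because its details are not given here.

```python
from itertools import combinations, product
from statistics import mean

def pathway_discrimination(inside, outside, sim):
    """Return (internal average, inside-vs-outside average, difference)."""
    internal = mean(sim(a, b) for a, b in combinations(inside, 2))
    external = mean(sim(a, b) for a, b in product(inside, outside))
    return internal, external, internal - external

# Hypothetical grouping of gene products, used only to fake a similarity.
_GROUP = {"g1": "A", "g2": "A", "g3": "A", "g4": "B", "g5": "B"}

def toy_gene_sim(a, b):
    """Made-up similarity: higher when two gene products share a group."""
    return 0.75 if _GROUP[a] == _GROUP[b] else 0.25

internal, external, gap = pathway_discrimination(
    ["g1", "g2", "g3"], ["g4", "g5"], toy_gene_sim
)
print(internal, external, gap)
```

A larger gap between the internal and the inside-vs-outside averages corresponds to what the text calls greater discriminatory power.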

