Conceptualization of molecular findings by mining gene annotations.

Chen V, Lu X - BMC Proc (2013)

Bottom Line: In this study, we utilized the hierarchical organization of the GO to derive a more abstract representation of the major biological processes of a list of genes based on their annotations. We found that the best discriminative power was achieved by using a combination of the information-content-based measure as the information-loss metric, and the graph-based statistics derived from a Steiner tree connecting genes in an augmented GO graph. Our methods provide an objective and quantitative approach to capturing the major directions of gene functions in a context-specific fashion.


ABSTRACT

Background: The Gene Ontology (GO) is an ontology representing molecular biology concepts related to genes and their products. Current annotations from the GO Consortium tend to be highly specific, and contemporary genome-scale studies often return a long list of genes of potential interest, such as genes that are differentially expressed in a cancer tumor relative to normal tissue. It is therefore challenging to reveal, at a conceptual level, the major functional themes in which these genes are involved. There is a need for tools that reveal such themes by mining and representing semantic information in an objective and quantitative manner.

Methods: In this study, we utilized the hierarchical organization of the GO to derive a more abstract representation of the major biological processes of a list of genes based on their annotations. We cast the task as follows: given a list of genes, identify non-disjoint, functionally coherent subsets, such that the functions of the genes in a subset are summarized by an informative GO term that accurately captures the semantic information of the original annotations.

Results: We evaluated different metrics for assessing information loss when merging GO terms, and different statistical schemes to assess the functional coherence of a set of genes. We found that the best discriminative power was achieved by using a combination of the information-content-based measure as the information-loss metric, and the graph-based statistics derived from a Steiner tree connecting genes in an augmented GO graph.
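As a rough illustration of an information-content-based loss metric, the sketch below assumes the common definition IC(t) = -log2 p(t), where p(t) is the fraction of genes annotated to term t or to any of its descendants, and takes the loss from merging a specific term into an ancestor to be the difference of their IC values. The abstract does not give the exact formula used in the study, so both definitions, the toy ontology, the gene names, and the function names are assumptions made for illustration only.

```python
import math

# Hypothetical toy ontology: term -> list of parent terms (is_a edges).
PARENTS = {
    "GO:root": [],
    "GO:metabolism": ["GO:root"],
    "GO:lipid_metabolism": ["GO:metabolism"],
    "GO:sugar_metabolism": ["GO:metabolism"],
    "GO:signaling": ["GO:root"],
}

# Hypothetical direct annotations: gene -> most specific terms.
ANNOTATIONS = {
    "geneA": ["GO:lipid_metabolism"],
    "geneB": ["GO:lipid_metabolism"],
    "geneC": ["GO:sugar_metabolism"],
    "geneD": ["GO:signaling"],
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def term_probabilities():
    """p(t): fraction of genes annotated to t or to any descendant of t."""
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term)
        for term in covered:
            counts[term] += 1
    total = len(ANNOTATIONS)
    return {term: count / total for term, count in counts.items()}


def information_content(term, probs):
    """Assumed definition: IC(t) = -log2 p(t). Requires p(t) > 0."""
    return -math.log2(probs[term])


def information_loss(specific, general, probs):
    """Assumed loss from replacing `specific` by its ancestor `general`."""
    if general not in ancestors(specific):
        raise ValueError(f"{general} is not an ancestor of {specific}")
    return information_content(specific, probs) - information_content(general, probs)


probs = term_probabilities()
for term in ("GO:lipid_metabolism", "GO:sugar_metabolism"):
    print(term, "->", "GO:metabolism", information_loss(term, "GO:metabolism", probs))
```

Under these assumptions, merging a rarely used term into a broad ancestor costs more than merging a frequently used one, which is the kind of trade-off an information-loss metric is meant to quantify.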

Conclusions: Our methods provide an objective and quantitative approach to capturing the major directions of gene functions in a context-specific fashion.


Figure 1: Conceptual Overview of Research. A-C. The ontological structure of the GO, protein annotations, and biomedical literature associated with genes were collected. D. The above information was combined to create an integrated graph (GOGenePubmed) that reflects the relationship among genes, their annotations, and the semantic relationships between GO terms. E. Based on this graph, statistical schemes were designed and simulation experiments were performed to establish statistical models for assessing the functional coherence of gene sets. F-G. When provided with a gene list from experiment (F), the program can be used to search for coherent subsets among the list (G).
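The integrated graph in the caption and the Steiner-tree statistic named in the Results can be illustrated with a small sketch. It builds an undirected graph over GO terms and genes only (the literature layer of GOGenePubmed is omitted), then connects a gene set with a greedy shortest-path heuristic that approximates a Steiner tree. The heuristic, the use of the tree's edge count as a coherence score, the toy data, and all function names are assumptions for illustration; the source does not specify which Steiner-tree algorithm or statistic the authors used.

```python
from collections import deque

# Hypothetical toy data: (child term, parent term) edges and gene annotations.
TERM_EDGES = [
    ("lipid", "metabolism"),
    ("sugar", "metabolism"),
    ("metabolism", "root"),
    ("signaling", "root"),
]
GENE_ANNOTATIONS = {
    "geneA": ["lipid"],
    "geneB": ["lipid"],
    "geneC": ["sugar"],
    "geneD": ["signaling"],
}


def build_graph(term_edges, gene_annotations):
    """Undirected adjacency map joining GO terms to each other and to genes."""
    adj = {}

    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for child, parent in term_edges:
        add_edge(child, parent)
    for gene, terms in gene_annotations.items():
        for term in terms:
            add_edge(gene, term)
    return adj


def shortest_path_to_tree(adj, source, tree_nodes):
    """BFS from `source`; return the path ending at the nearest tree node."""
    prev = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node in tree_nodes:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path  # ordered from the tree node back to `source`
        for neighbor in sorted(adj[node]):
            if neighbor not in prev:
                prev[neighbor] = node
                queue.append(neighbor)
    return None


def approximate_steiner_tree(adj, terminals):
    """Greedy heuristic: repeatedly attach the closest remaining terminal."""
    terminals = list(terminals)
    tree_nodes = {terminals[0]}
    tree_edges = set()
    remaining = set(terminals[1:]) - tree_nodes
    while remaining:
        best = None
        for terminal in sorted(remaining):
            path = shortest_path_to_tree(adj, terminal, tree_nodes)
            if path is not None and (best is None or len(path) < len(best)):
                best = path
        if best is None:
            raise ValueError("terminals are not connected in the graph")
        for u, v in zip(best, best[1:]):
            tree_edges.add(frozenset((u, v)))
        tree_nodes.update(best)
        remaining -= tree_nodes
    return tree_nodes, tree_edges


adj = build_graph(TERM_EDGES, GENE_ANNOTATIONS)
for gene_set in (["geneA", "geneB", "geneC"], ["geneA", "geneC", "geneD"]):
    _, edges = approximate_steiner_tree(adj, gene_set)
    print(gene_set, "tree edges:", len(edges))
```

On this toy graph the metabolism-only gene set is connected by a 5-edge tree, while the set that mixes metabolism and signaling needs 7 edges. A smaller connecting tree is read here as a sign of greater functional coherence; turning such a count into a statistical score would additionally require a null distribution, for example from the simulation experiments mentioned in the caption.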

Mentions: In this study, we investigated a framework that utilizes the structure and semantic information of the GO to reveal major functional themes in a dynamic and gene-set-specific manner. We systematically studied different information-theory-based metrics to assess information loss when searching for suitable representations to summarize functional themes of gene sets. We further evaluated different statistical schemes to assess the functional coherence of a gene set summarized by a GO term. The conceptual overview of our research is shown in Figure 1.

