Conceptualization of molecular findings by mining gene annotations.

Chen V, Lu X - BMC Proc (2013)

Bottom Line: In this study, we utilized the hierarchical organization of the GO to derive a more abstract representation of the major biological processes of a list of genes based on their annotations. We found that the best discriminative power was achieved by using a combination of the information-content-based measure as the information-loss metric, and the graph-based statistics derived from a Steiner tree connecting genes in an augmented GO graph. Our methods provide an objective and quantitative approach to capturing the major directions of gene functions in a context-specific fashion.


ABSTRACT

Background: The Gene Ontology (GO) is an ontology representing molecular biology concepts related to genes and their products. Current annotations from the GO Consortium tend to be highly specific, and contemporary genome-scale studies often return a long list of genes of potential interest, such as genes that are differentially expressed in a tumor relative to normal tissue. It is therefore challenging to reveal, at a conceptual level, the major functional themes in which the genes are involved. There is a need for tools capable of revealing such themes by mining and representing semantic information in an objective and quantitative manner.

Methods: In this study, we utilized the hierarchical organization of the GO to derive a more abstract representation of the major biological processes of a list of genes based on their annotations. We cast the task as follows: given a list of genes, identify non-disjoint, functionally coherent subsets, such that the functions of the genes in a subset are summarized by an informative GO term that accurately captures the semantic information of the original annotations.
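The task formulation above can be illustrated with a minimal sketch. It assumes a toy ontology and toy annotations (all term and gene names below are hypothetical, not taken from the study) and simply groups genes under every term that covers their annotations. It does not reproduce the authors' method; it only shows why the resulting subsets are non-disjoint.

```python
from collections import defaultdict

# Hypothetical toy ontology: each term maps to its direct parents (a DAG).
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "signaling": ["root"],
    "lipid_metabolism": ["metabolism"],
    "kinase_signaling": ["signaling"],
}

# Hypothetical gene annotations (gene -> directly annotated terms).
ANNOTATIONS = {
    "geneA": ["lipid_metabolism"],
    "geneB": ["lipid_metabolism", "kinase_signaling"],
    "geneC": ["kinase_signaling"],
}


def ancestors_inclusive(term, parents):
    """Return the term together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def genes_by_term(annotations, parents):
    """Map each term to the genes annotated to it or to any descendant."""
    groups = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for covering in ancestors_inclusive(term, parents):
                groups[covering].add(gene)
    return dict(groups)


groups = genes_by_term(ANNOTATIONS, PARENTS)
# geneB carries two annotations, so it appears in more than one subset.
print(sorted(groups["metabolism"]), sorted(groups["signaling"]))
```

A more abstract term covers more genes but says less about each of them, which is the trade-off the information-loss metrics are meant to quantify.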

Results: We evaluated different metrics for assessing information loss when merging GO terms, and different statistical schemes to assess the functional coherence of a set of genes. We found that the best discriminative power was achieved by using a combination of the information-content-based measure as the information-loss metric, and the graph-based statistics derived from a Steiner tree connecting genes in an augmented GO graph.
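As a rough illustration of an information-content-based loss, the sketch below uses the common definition IC(t) = -log2 p(t), where p(t) is the fraction of genes annotated to term t or its descendants. The abstract does not give the exact formula used in the study, so this definition, the function names, and the counts are all assumptions.

```python
import math

# Hypothetical counts: genes annotated to each term or any of its descendants.
GENE_COUNTS = {"root": 1000, "metabolism": 200, "lipid_metabolism": 20}
TOTAL_GENES = GENE_COUNTS["root"]


def information_content(term, counts, total):
    """IC(t) = -log2 of the fraction of genes annotated to t (assumed definition)."""
    return -math.log2(counts[term] / total)


def information_loss(term, ancestor, counts, total):
    """Information given up when a specific term is replaced by an ancestor."""
    return (information_content(term, counts, total)
            - information_content(ancestor, counts, total))


loss = information_loss("lipid_metabolism", "metabolism", GENE_COUNTS, TOTAL_GENES)
print(round(loss, 4))
```

Under this definition the loss depends only on the ratio of the two gene counts, so merging into a much broader ancestor costs more than merging into a slightly broader one.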

Conclusions: Our methods provide an objective and quantitative approach to capturing the major directions of gene functions in a context-specific fashion.


Figure 6: Average within-module PPI ratios for summarizing terms. Plots of the average within-module PPI ratio for the summarizing terms obtained with different merging thresholds and with the AmiGO GO Slim tool. Whiskers denote the standard error.

Mentions: To support the notion that the statistical model developed in this study effectively measures functional coherence, we assessed the functional relatedness of the proteins in each subset returned by our model using a second measure, the within-module PPI ratio, and compared the results. In a series of experiments, we applied our model to the differentially expressed gene set using 3 different p-value cutoff thresholds (0.1, 0.05 and 0.01), yielding 3 collections of modules. We then investigated whether the within-module PPI ratio was anti-correlated with the p-value, based on the assumption that a more coherent gene set (with a smaller p-value) should contain more within-module PPIs. Figure 6 plots the within-module PPI ratios of the gene modules derived using the different p-value cutoff thresholds, as well as the value for the modules produced by mapping genes to the Generic GO slim. The results indicate that the more stringent the p-value cutoff, the higher the within-module PPI ratio; the genes grouped by the GO slim had the lowest within-module PPI ratio. Our metric therefore agrees with another biologically sensible metric of the functional coherence of genes. A similar finding by Dutkowski et al. [32] corroborates our results.
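The excerpt does not define the within-module PPI ratio. As a minimal sketch, the code below assumes one plausible definition: among PPI edges that touch a module, the fraction whose two endpoints both lie inside it. The function name, the definition, and the toy edge list are all assumptions made for illustration.

```python
def within_module_ppi_ratio(module, ppi_edges):
    """Fraction of module-touching PPI edges that lie entirely inside the module.

    Assumed definition for illustration; the study may compute it differently.
    """
    module = set(module)
    within = 0
    touching = 0
    for a, b in ppi_edges:
        endpoints_inside = (a in module) + (b in module)
        if endpoints_inside >= 1:
            touching += 1
        if endpoints_inside == 2:
            within += 1
    return within / touching if touching else 0.0


# Hypothetical PPI network given as an undirected edge list.
PPI_EDGES = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5")]

ratio = within_module_ppi_ratio({"g1", "g2", "g3"}, PPI_EDGES)
print(round(ratio, 3))
```

Averaging this value over all modules obtained at a given p-value cutoff would give one point of the kind plotted in Figure 6, under the stated assumption.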

