Hybrid coexpression link similarity graph clustering for mining biological modules from multiple gene expression datasets.

Salem S, Ozcaglar C - BioData Min (2014)

Bottom Line: We propose a joint mining algorithm that constructs a weighted hybrid similarity graph whose nodes are the coexpression links. The weight of an edge between two coexpression links in this hybrid graph is a linear combination of the topological similarities and co-appearance similarities of the corresponding two coexpression links. Experimental results on Human gene expression datasets show that the reported modules are functionally homogeneous, as evidenced by their enrichment with biological process GO terms and KEGG pathways.


Affiliation: Department of Computer Science, North Dakota State University, Fargo, ND 58102, USA.

ABSTRACT

Background: Advances in genomic technologies have enabled the accumulation of vast amounts of genomic data, including gene expression data for multiple species under various biological and environmental conditions. Integrating these gene expression datasets is a promising strategy to alleviate the challenges of protein functional annotation and biological module discovery based on a single gene expression dataset, which suffers from spurious coexpression.

Results: We propose a joint mining algorithm that constructs a weighted hybrid similarity graph whose nodes are the coexpression links. The weight of an edge between two coexpression links in this hybrid graph is a linear combination of the topological similarities and co-appearance similarities of the corresponding two coexpression links. Clustering the weighted hybrid similarity graph yields recurrent coexpression link clusters (modules). Experimental results on Human gene expression datasets show that the reported modules are functionally homogeneous, as evidenced by their enrichment with biological process GO terms and KEGG pathways.
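The edge-weight construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not define the two similarity measures, so Jaccard overlap is used here as a stand-in for both the topological similarity (over hypothetical link neighborhoods) and the co-appearance similarity (over the sets of datasets in which each link occurs). The convex form α·topological + (1 − α)·co-appearance, the use of β as a cutoff on edge weights, and the names `jaccard`, `hybrid_edges`, `link_datasets`, and `link_neighbors` are likewise assumptions made for this example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_edges(link_datasets, link_neighbors, alpha=0.5, beta=0.5):
    """Build the edges of a hybrid similarity graph over coexpression links.

    link_datasets:  {link: set of dataset ids in which the link appears}
    link_neighbors: {link: set describing the link's neighborhood}
    Both mappings and both Jaccard-based measures are assumptions of this
    sketch; the source only states that the weight is a linear combination.
    Returns {(link_u, link_v): weight} for pairs with weight >= beta.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    edges = {}
    for u, v in combinations(sorted(link_datasets), 2):
        topo = jaccard(link_neighbors[u], link_neighbors[v])
        coapp = jaccard(link_datasets[u], link_datasets[v])
        weight = alpha * topo + (1.0 - alpha) * coapp
        if weight >= beta:
            edges[(u, v)] = weight
    return edges


# Toy input: two links that co-occur in the same datasets, one unrelated link.
link_datasets = {("A", "B"): {1, 2, 3}, ("B", "C"): {1, 2, 3}, ("X", "Y"): {4}}
link_neighbors = {
    ("A", "B"): {"A", "B", "C"},
    ("B", "C"): {"A", "B", "C"},
    ("X", "Y"): {"X", "Y"},
}
toy_edges = hybrid_edges(link_datasets, link_neighbors, alpha=0.5, beta=0.5)
```

Any graph clustering method could then be run on the resulting weighted graph; the choice of clustering algorithm is not specified in the text above and is left open here.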



Figure 5: Enrichment analysis. Enrichment of functional annotations in the edge clusters for varying hybrid similarity thresholds (α = 0.5).

Mentions: To assess the biological significance of the reported edge clusters, we performed functional enrichment analysis of Gene Ontology (GO) biological process terms as well as KEGG pathways. The enrichment analysis was performed using the DAVID tool [21,22]. Only edge clusters with at least five genes were included in the analysis. We say that an edge cluster is enriched if its set of genes is significantly enriched with at least one biological process GO term. We computed the percentage of enriched edge clusters reported from several hybrid graphs for varying hybrid similarity thresholds. The topological analysis of the reported edge clusters showed that higher hybrid similarity thresholds yield edge clusters with higher recurrence. As shown in Figure 4, the percentage of enriched modules is higher for larger hybrid similarity thresholds, and thus for more recurrent modules. The same trend holds for KEGG pathway enrichment: the percentage of edge clusters that are enriched in KEGG pathways is much higher for larger β thresholds. The same trend is also observed for different values of α, which controls the contribution of the edge structural similarity. Figure 5 shows the percentage of enriched edge clusters for α = 0.5. For α = 0.5 and β = 0.5, the algorithm reported 276 edge clusters, 139 of which had at least five genes. The biological process GO terms that were highly enriched in these 139 modules include ‘cell cycle phase,’ ‘cell cycle process,’ ‘cell cycle,’ ‘mitotic cell cycle,’ ‘M phase of mitotic cell cycle,’ and ‘cell proliferation.’ These results corroborate the premise of this research that integrative analysis of gene expression data reveals meaningful biological insights.
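The reported statistic, the percentage of enriched edge clusters among those with at least five genes, can be illustrated with a short Python sketch. It assumes the enrichment calls themselves come from an external tool such as DAVID and are supplied as a set of cluster ids; the function name `percent_enriched`, the cluster representation (a list of gene pairs per cluster), and the toy data are hypothetical and are not taken from the paper.

```python
def percent_enriched(clusters, enriched_ids, min_genes=5):
    """Percentage of sufficiently large edge clusters that are enriched.

    clusters:     {cluster_id: list of (gene, gene) coexpression links}
    enriched_ids: ids of clusters judged enriched by an external analysis
                  (assumed input; no enrichment test is performed here)
    min_genes:    clusters with fewer distinct genes are excluded
    """
    eligible = []
    for cluster_id, links in clusters.items():
        genes = {gene for link in links for gene in link}
        if len(genes) >= min_genes:
            eligible.append(cluster_id)
    if not eligible:
        return 0.0
    hits = sum(1 for cluster_id in eligible if cluster_id in enriched_ids)
    return 100.0 * hits / len(eligible)


# Toy data: c1 and c3 have five genes each; c2 has two and is filtered out.
toy_clusters = {
    "c1": [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")],
    "c2": [("F", "G")],
    "c3": [("H", "I"), ("I", "J"), ("J", "K"), ("K", "L")],
}
toy_percent = percent_enriched(toy_clusters, enriched_ids={"c1", "c2"})
```

Counting distinct genes rather than links mirrors the "at least five genes" criterion in the text; a cluster that is flagged as enriched but falls below the size cutoff (c2 in the toy data) does not contribute to the percentage.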

