Hybrid coexpression link similarity graph clustering for mining biological modules from multiple gene expression datasets.

Salem S, Ozcaglar C - BioData Min (2014)

Bottom Line: We propose a joint mining algorithm that constructs a weighted hybrid similarity graph whose nodes are the coexpression links. The weight of an edge between two coexpression links in this hybrid graph is a linear combination of the topological similarity and the co-appearance similarity of the two links. Experimental results on human gene expression datasets show that the reported modules are functionally homogeneous, as evidenced by their enrichment with biological process GO terms and KEGG pathways.


Affiliation: Department of Computer Science, North Dakota State University, Fargo, ND 58102, USA.

ABSTRACT

Background: Advances in genomic technologies have enabled the accumulation of vast amounts of genomic data, including gene expression data for multiple species under various biological and environmental conditions. Integrating these gene expression datasets is a promising strategy for alleviating the challenges of protein functional annotation and biological module discovery, which suffer from spurious coexpression when based on a single gene expression dataset.

Results: We propose a joint mining algorithm that constructs a weighted hybrid similarity graph whose nodes are the coexpression links. The weight of an edge between two coexpression links in this hybrid graph is a linear combination of the topological similarity and the co-appearance similarity of the two links. Clustering the weighted hybrid similarity graph yields recurrent coexpression link clusters (modules). Experimental results on human gene expression datasets show that the reported modules are functionally homogeneous, as evidenced by their enrichment with biological process GO terms and KEGG pathways.
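The edge weight described above can be illustrated with a minimal Python sketch. The abstract only states that the weight is a linear combination of a topological and a co-appearance similarity; the specific measures used here (Jaccard overlap of endpoints, Jaccard overlap of the datasets in which each link appears), the mixing parameter `alpha`, the `cutoff` value, and all function names are assumptions made for illustration, not the authors' definitions.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_weight(link1, link2, datasets_of, alpha=0.5, cutoff=0.0):
    """Linear combination of two link similarities (illustrative stand-ins).

    link1, link2 : coexpression links given as (gene, gene) tuples.
    datasets_of  : dict mapping each link to the set of dataset indices
                   in which that link appears.
    alpha        : assumed mixing weight in [0, 1].
    cutoff       : assumed threshold; weights below it are set to 0.
    """
    # Placeholder topological similarity: overlap of the links' endpoints.
    topo = jaccard(link1, link2)
    # Placeholder co-appearance similarity: overlap of supporting datasets.
    coapp = jaccard(datasets_of[link1], datasets_of[link2])
    weight = alpha * topo + (1.0 - alpha) * coapp
    return weight if weight >= cutoff else 0.0


# Hypothetical toy data: two links sharing gene "b".
datasets_of = {("a", "b"): {0, 1, 2}, ("b", "c"): {0, 2}}
w = hybrid_weight(("a", "b"), ("b", "c"), datasets_of, alpha=0.5)
w_cut = hybrid_weight(("a", "b"), ("b", "c"), datasets_of, alpha=0.5, cutoff=0.6)
```

With `alpha = 1` the sketch uses only the topological term and with `alpha = 0` only the co-appearance term; the cutoff mirrors the thresholding step mentioned in the figure caption, although the value used in the paper is not given here.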


Figure 1: Overview of the proposed approach. (A) Gene expression datasets are represented as coexpression graphs. (B) Multiple coexpression graphs from (A) are represented as an edge-attributed summary graph. (C) The topological and attribute edge similarity matrices. (D) The hybrid similarity matrix. (E) The final hybrid similarity matrix after applying a cutoff. (F) The weighted hybrid graph, with the edge clusters enclosed by dotted ovals.

Mentions: A multi-layered graph is a set of d graphs such that Gi = (V, Ei) for all 1 ≤ i ≤ d, where Ei ⊆ V × V and V is the set of vertices shared by all the graphs. Figure 1(A) shows an illustrative example of a multi-layered graph with six graph layers defined over a set of seven vertices, i.e., V = {a, b, c, d, e, f, g}. Next, we define edge-attributed graphs.
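The definition above can be made concrete with a short Python sketch that stores a multi-layered graph as a list of edge sets over a shared vertex set and collapses it into a single summary in which every edge carries a 0/1 vector of the layers it appears in. The definition of edge-attributed graphs is not included in this excerpt, so the binary-vector attribute is an assumption; the function names and the three toy layers are hypothetical and are not the edges drawn in Figure 1.

```python
def normalize(u, v):
    """Store an undirected edge with its endpoints in sorted order."""
    return (u, v) if u <= v else (v, u)


def summarize_layers(vertices, layers):
    """Collapse d edge sets over one vertex set into edge -> 0/1 vector.

    vertices : the shared vertex set V.
    layers   : list of d edge sets E_1..E_d, each a set of (u, v) pairs.
    Returns a dict mapping each edge to a list of length d whose i-th
    entry is 1 if the edge appears in layer i and 0 otherwise.
    """
    d = len(layers)
    summary = {}
    for i, edges in enumerate(layers):
        for u, v in edges:
            if u not in vertices or v not in vertices:
                raise ValueError(f"edge ({u}, {v}) uses a vertex outside V")
            summary.setdefault(normalize(u, v), [0] * d)[i] = 1
    return summary


# Hypothetical toy example over V = {a, ..., g} with three layers.
V = set("abcdefg")
layers = [
    {("a", "b"), ("b", "c")},
    {("b", "a"), ("c", "d")},
    {("a", "b"), ("b", "c"), ("e", "f")},
]
summary = summarize_layers(V, layers)
```

Keeping one vector per edge makes it straightforward to see which links recur across datasets: an edge whose vector is mostly ones is supported by most layers, while an edge with a single one appears in only one dataset.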

