Multiscale Embedded Gene Co-expression Network Analysis.

Song WM, Zhang B - PLoS Comput. Biol. (2015)

Bottom Line: Gene co-expression network analysis has been shown effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques to construct co-expression networks require some critical prior information such as predefined number of clusters, numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems such as the scale-free degree distribution of small-worldness. MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches.


Affiliation: Department of Genetics and Genomic Sciences, Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America.

ABSTRACT
Gene co-expression network analysis has been shown to be effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques to construct co-expression networks require some critical prior information such as predefined number of clusters, numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems such as the scale-free degree distribution of small-worldness. Previously, a graph filtering technique called Planar Maximally Filtered Graph (PMFG) has been applied to many real-world data sets such as financial stock prices and gene expression to extract meaningful and relevant interactions. However, PMFG is not suitable for large-scale genomic data due to several drawbacks, such as the high computational complexity O(|V|^3), the presence of false positives due to the maximal planarity constraint, and the inadequacy of the clustering framework. Here, we developed a new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA) by: i) introducing quality control of co-expression similarities, ii) parallelizing embedded network construction, and iii) developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs). We applied MEGENA to a series of simulated data and the gene expression data in breast carcinoma and lung adenocarcinoma from The Cancer Genome Atlas (TCGA). MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches. MEGENA revealed not only meaningful multi-scale organizations of co-expressed gene clusters but also novel targets in breast carcinoma and lung adenocarcinoma.


Fig 9 (pcbi.1004574.g009). Hierarchical organization of functions and signaling pathways corresponding to the multiscale clusters identified by MEGENA. A) Comparison of number of significantly enriched functions and pathway signatures across clusters identified at different scale groups. The scale groups identified from MHA are colored according to the legend, and “all” denotes collection of clusters across the scale groups. B) Multiscale organization of clusters in PFN. Each node is a cluster identified by multiscale clustering in PFN, where the node size is proportional to the cluster size, node color coincides with the cluster group color scheme in A, and node labels indicate most enriched function/signaling pathway for individual clusters. A directed link a→b indicates b is a sub-cluster of a.


Mentions: We performed MHA on the BRCA PFN to identify the groups of scales that had similar interaction patterns and shared highly connected hubs across different scales. Six distinctive scale groups were identified: S1 (0.03 ≤ α ≤ 0.48), S2 (0.5 ≤ α ≤ 0.82), S3 (0.87 ≤ α ≤ 1), S4 (1.01 ≤ α ≤ 1.29), S5 (1.3 ≤ α ≤ 1.82) and S6 (1.83 ≤ α ≤ 6.8). Biological relevance of each scale group was evaluated by the number of significantly enriched MSigDB gene sets. We compared the performance of the clusters at each scale group and that of the clusters across all scale groups. Fig 9A shows that the combination of all the clusters across the different scale groups consistently outperforms the individual scale groups across almost the entire range of significance levels. Interestingly, the clusters at the scale S5 (1.30 ≤ α ≤ 1.82) show the best performance when compared against other scale groups. The clusters identified at the finest scale of S5 (α = 1.3) are shown in Fig 4.
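
The scale-group boundaries quoted above can be read as a lookup from a resolution value α to its group. The following is a minimal Python sketch, assuming the intervals are closed on both ends exactly as printed; the names SCALE_GROUPS and scale_group are hypothetical and are not part of MEGENA. Values of α that fall between the printed intervals (for example 0.49 or 0.85) are reported as unassigned rather than guessed.

```python
from typing import Optional

# Scale-group intervals for the BRCA PFN, copied from the text above.
# Assumption: both endpoints are inclusive, as the "≤" signs suggest.
SCALE_GROUPS = [
    ("S1", 0.03, 0.48),
    ("S2", 0.50, 0.82),
    ("S3", 0.87, 1.00),
    ("S4", 1.01, 1.29),
    ("S5", 1.30, 1.82),
    ("S6", 1.83, 6.80),
]


def scale_group(alpha: float) -> Optional[str]:
    """Return the scale-group label containing alpha, or None if alpha
    lies outside every printed interval (including the gaps between them)."""
    for name, low, high in SCALE_GROUPS:
        if low <= alpha <= high:
            return name
    return None


if __name__ == "__main__":
    # alpha = 1.3 is the finest scale of S5 mentioned in the text.
    for alpha in (0.03, 0.49, 1.0, 1.3, 6.8):
        print(alpha, scale_group(alpha))
```

Returning None for the gaps is a deliberate choice: the text lists only the six intervals and does not say how intermediate values of α were handled.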

