Multiscale Embedded Gene Co-expression Network Analysis.

Song WM, Zhang B - PLoS Comput. Biol. (2015)

Bottom Line: Gene co-expression network analysis has been shown to be effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques for constructing co-expression networks either require critical prior information, such as a predefined number of clusters or numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems, such as a scale-free degree distribution or small-worldness. MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches.


Affiliation: Department of Genetics and Genomic Sciences, Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America.

ABSTRACT
Gene co-expression network analysis has been shown to be effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques for constructing co-expression networks either require critical prior information, such as a predefined number of clusters or numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems, such as a scale-free degree distribution or small-worldness. Previously, a graph filtering technique called Planar Maximally Filtered Graph (PMFG) has been applied to many real-world data sets, such as financial stock prices and gene expression, to extract meaningful and relevant interactions. However, PMFG is not suitable for large-scale genomic data due to several drawbacks: its high computational complexity of O(|V|^3), the presence of false positives due to the maximal planarity constraint, and the inadequacy of its clustering framework. Here, we developed a new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA) by: i) introducing quality control of co-expression similarities, ii) parallelizing embedded network construction, and iii) developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs). We applied MEGENA to a series of simulated data sets and to gene expression data for breast carcinoma and lung adenocarcinoma from The Cancer Genome Atlas (TCGA). MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches. MEGENA revealed not only meaningful multi-scale organizations of co-expressed gene clusters but also novel targets in breast carcinoma and lung adenocarcinoma.
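
To make the cost of the PMFG step concrete, the following is a minimal Python sketch of the generic greedy PMFG construction, not the MEGENA implementation. It assumes the third-party networkx library, uses absolute Pearson correlation as the similarity, and the names pearson and build_pmfg are hypothetical. Every candidate gene pair triggers a planarity test on the growing graph, which is where the roughly cubic cost in the number of genes |V| comes from.

```python
import itertools
import networkx as nx


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def build_pmfg(expr):
    """Greedy PMFG sketch. expr maps gene name -> list of expression values.

    Assumption: similarity is |PCC|; the paper also considers mutual information.
    """
    genes = list(expr)
    # Rank all gene pairs from most to least similar.
    pairs = sorted(
        ((abs(pearson(expr[a], expr[b])), a, b)
         for a, b in itertools.combinations(genes, 2)),
        reverse=True,
    )
    graph = nx.Graph()
    graph.add_nodes_from(genes)
    max_edges = 3 * (len(genes) - 2)  # edge count of a maximal planar graph
    for weight, a, b in pairs:
        graph.add_edge(a, b, weight=weight)
        is_planar, _ = nx.check_planarity(graph)
        if not is_planar:
            graph.remove_edge(a, b)  # this edge would break planarity
        elif graph.number_of_edges() == max_edges:
            break  # no further edge can be embedded
    return graph
```

Because the loop keeps adding edges until the graph is maximally planar, weakly correlated pairs can still be admitted late in the ranking, which is one way to read the false-positive drawback mentioned above.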

pcbi.1004574.g006: Comparison of MEGENA (as a combination of the multiscale clustering analysis and PFN) and various combinations of the established clustering techniques (eigenvector, infomap, walktrap, WGCNA) and the networks (PFN, FDRN, WGCN) using the TCGA BRCA gene expression data. Two different similarity measures (MI and PCC) were used in the analyses to compare robustness with respect to the choice of measure for evaluating interactions. A) Number of significantly enriched functional/pathway signatures from MSigDB (Bonferroni-corrected FET p-values) at various p-value thresholds. B) Number of significantly enriched functional/pathway signatures from MSigDB at various odds ratio thresholds. C) Number of clusters predictive of patient survival (based on FDR-corrected Cox p-values) at various significance levels. D) Number of clusters predictive of patient survival (based on FDR-corrected Cox p-values) and associated with at least one significantly under-represented signature with Bonferroni-corrected FET p-value < 0.05.

Mentions: As shown in Fig 6A and 6B, the multiscale clustering analysis (MCA) performs best: its clusters are enriched for the largest number of annotated gene sets at all significance levels in both MI- and PCC-based networks. More importantly, the MCA-derived clusters show the largest fold enrichment of the BRCA oncogenic signatures. Similar results are observed in PCC-based LUAD networks (see S3A and S3B Fig). MCA consistently outperforms the established co-expression network analysis method, WGCNA, in both the BRCA and LUAD cases.
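
The enrichment counts in Fig 6A rest on Fisher's exact test (FET) p-values with Bonferroni correction. The Python sketch below illustrates that kind of calculation under stated assumptions: it uses the one-sided (over-representation) form of the test, computed as a hypergeometric upper tail, and the function names and argument layout are hypothetical. The source does not state which tail or which software the authors used.

```python
from math import comb


def fet_enrichment_p(overlap, cluster_size, signature_size, background_size):
    """One-sided FET p-value: P(X >= overlap) under the hypergeometric null.

    X is the number of signature genes found in a cluster of cluster_size
    genes drawn from background_size genes, signature_size of which belong
    to the signature. Assumption: over-representation tail only.
    """
    max_overlap = min(cluster_size, signature_size)
    total = comb(background_size, cluster_size)
    tail = sum(
        comb(signature_size, k)
        * comb(background_size - signature_size, cluster_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def bonferroni(p_values):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

A cluster would then count as enriched for a signature when its corrected p-value falls below the chosen threshold, for example 0.05 as in panel D.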

