Multiscale Embedded Gene Co-expression Network Analysis.

Song WM, Zhang B - PLoS Comput. Biol. (2015)

Bottom Line: Gene co-expression network analysis has been shown to be effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques to construct co-expression networks require some critical prior information such as a predefined number of clusters or numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems such as the scale-free degree distribution or small-worldness. MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches.


Affiliation: Department of Genetics and Genomic Sciences, Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America.

ABSTRACT
Gene co-expression network analysis has been shown to be effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques to construct co-expression networks require some critical prior information such as a predefined number of clusters or numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems such as the scale-free degree distribution or small-worldness. Previously, a graph filtering technique called Planar Maximally Filtered Graph (PMFG) has been applied to many real-world data sets, such as financial stock prices and gene expression, to extract meaningful and relevant interactions. However, PMFG is not suitable for large-scale genomic data due to several drawbacks, such as its high computational complexity, O(|V|^3), the presence of false positives due to the maximal planarity constraint, and the inadequacy of the clustering framework. Here, we developed a new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA) by: i) introducing quality control of co-expression similarities, ii) parallelizing embedded network construction, and iii) developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs). We applied MEGENA to a series of simulated data sets and to gene expression data in breast carcinoma and lung adenocarcinoma from The Cancer Genome Atlas (TCGA). MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches. MEGENA revealed not only meaningful multi-scale organizations of co-expressed gene clusters but also novel targets in breast carcinoma and lung adenocarcinoma.


Figure 2 (pcbi.1004574.g002): Comparison of acceptance rates of correlation pairs into PFN links. A,B) Results from PFN construction on TCGA lung squamous cell carcinoma (LUSC) data comprising 20523 genes; 57562 links are embedded out of a maximum possible 61563. The left panel (A) shows the acceptance rates without PCP (denoted “serial”, colored blue) and after performing PCP (denoted “PCP”, colored red) as a function of the number of links already embedded in the PFN, normalized by the maximum possible number of embedded links. The right panel (B) shows the ratio of the acceptance rate after PCP to the acceptance rate without PCP, plotted against the same normalized number of embedded links. C,D) Results from TCGA thyroid carcinoma (THCA) data comprising 16639 genes; 44802 links are embedded out of a maximum possible 49911. The left and right panels show the same plots as described for LUSC.
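
The maximum link counts quoted in the caption agree with the standard bound for planar graphs: a simple planar graph on n >= 3 nodes has at most 3(n - 2) links. The short Python sketch below checks this against the caption's gene and link counts and reports how saturated each PFN is. The function name `max_planar_links` and the `datasets` dictionary are illustrative names chosen here, not part of the MEGENA software.

```python
def max_planar_links(n_genes: int) -> int:
    """Upper bound on links in a simple planar graph with n_genes nodes (n >= 3)."""
    if n_genes < 3:
        raise ValueError("the 3(n - 2) bound applies to graphs with at least 3 nodes")
    return 3 * (n_genes - 2)


# (number of genes, links embedded, maximum links) as quoted in the caption
datasets = {
    "LUSC": (20523, 57562, 61563),
    "THCA": (16639, 44802, 49911),
}

saturation = {}
for name, (n_genes, embedded, quoted_max) in datasets.items():
    bound = max_planar_links(n_genes)
    # The quoted maximum should equal the planar bound.
    assert bound == quoted_max
    saturation[name] = embedded / bound
    print(f"{name}: bound = {bound}, embedded fraction = {saturation[name]:.3f}")
```

Under this bound, the LUSC network holds roughly 93.5% of the links a planar graph on its genes could carry, and the THCA network roughly 89.8%.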

Mentions: As shown in Fig 2, the acceptance rate of the serial PMFG algorithm quickly falls to nearly 0% as the number of links in the PFN approaches the maximal number of links. This finding indicates that PMFG performs an exponentially increasing number of computations to embed each additional edge as the PFN saturates towards the maximal number of links. In contrast, PCP remedies the problem by boosting the acceptance rate to nearly 100% as the number of links in the PFN increases. These results demonstrate the effectiveness of PCP in reducing the overall computation time by leveraging parallel computation, and the scalability of FPFNC to whole-genome co-expression networks.

