Mixed-model coexpression: calculating gene coexpression while accounting for expression heterogeneity.

Furlotte NA, Kang HM, Ye C, Eskin E - Bioinformatics (2011)

Bottom Line: Many general methods have been suggested, which aim to remove the effects of confounding from gene expression data. Confounding effects are expected to be encoded in the matrix representing the correlation between arrays, the inter-sample correlation matrix. By conditioning on the information in the inter-sample correlation matrix, MMC is able to produce gene coexpressions that are not influenced by global confounding effects and thus significantly reduce the number of spurious coexpressions observed.


Affiliation: Department of Computer Science, University of California, Los Angeles, CA 90024, USA. nfurlott@cs.ucla.edu

ABSTRACT

Motivation: The analysis of gene coexpression is at the core of many types of genetic analysis. The coexpression between two genes can be calculated by using a traditional Pearson's correlation coefficient. However, unobserved confounding effects may inflate the Pearson's correlation so that uncorrelated genes appear correlated. Many general methods have been suggested that aim to remove the effects of confounding from gene expression data. However, residual confounding that is not accounted for by these generic correction procedures can still induce correlation between genes. Therefore, a method that specifically aims to calculate gene coexpression between gene expression arrays, while accounting for confounding effects, is desirable.
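
The inflation described above can be illustrated with a small simulation. This is only a sketch on synthetic data, not data from the article: the two genes are generated independently, and a hypothetical unobserved per-array confounder (for example a batch effect) is added to both. The variable names and the effect size of 1.5 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arrays = 2000

# Two genes whose biological signals are independent (true correlation is zero).
gene_a = rng.normal(size=n_arrays)
gene_b = rng.normal(size=n_arrays)

# Hypothetical unobserved confounder acting on every array, e.g. a batch effect.
confounder = rng.normal(size=n_arrays)

# Observed expression = biological signal + confounder contribution.
# The effect size 1.5 is an arbitrary choice for this illustration.
obs_a = gene_a + 1.5 * confounder
obs_b = gene_b + 1.5 * confounder

# Pearson's correlation before and after the confounder is mixed in.
r_true = np.corrcoef(gene_a, gene_b)[0, 1]
r_observed = np.corrcoef(obs_a, obs_b)[0, 1]

print(f"Pearson r without confounder: {r_true:.3f}")
print(f"Pearson r with confounder:    {r_observed:.3f}")
```

In this setup the shared confounder contributes variance 2.25 to each gene and covariance 2.25 between them, so the expected observed correlation is 2.25 / 3.25, roughly 0.69, even though the underlying signals are uncorrelated.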

Results: In this article, we present a statistical model for calculating gene coexpression called mixed model coexpression (MMC), which models coexpression within a mixed model framework. Confounding effects are expected to be encoded in the matrix representing the correlation between arrays, the inter-sample correlation matrix. By conditioning on the information in the inter-sample correlation matrix, MMC is able to produce gene coexpressions that are not influenced by global confounding effects and thus significantly reduce the number of spurious coexpressions observed. We applied MMC to both human and yeast datasets and show that it prioritizes strong coexpressions more effectively than both a traditional Pearson's correlation and a Pearson's correlation applied to data corrected with surrogate variable analysis (SVA).
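
The idea of conditioning on the inter-sample correlation matrix can be sketched as follows. This is not the MMC estimator from the article: it is a simplified illustration that computes the inter-sample correlation matrix K from synthetic data, builds a covariance V = h*K + (1-h)*I with a fixed, assumed weight h (a real mixed model would estimate the variance components from the data), and then computes Pearson's correlation on the whitened expression profiles. The function name, the value of h, and the simulated two-batch confounder are all hypothetical.

```python
import numpy as np

def whitened_correlation(expr, i, j, h=0.5):
    """Correlation of genes i and j after whitening by V = h*K + (1-h)*I.

    expr is a genes x arrays matrix; K is the inter-sample (array-by-array)
    correlation matrix. h is an assumed, fixed weight, not an estimate.
    """
    n_arrays = expr.shape[1]
    # Inter-sample correlation matrix: arrays are the variables, genes the observations.
    K = np.corrcoef(expr, rowvar=False)
    V = h * K + (1.0 - h) * np.eye(n_arrays)
    # Symmetric inverse square root of V via its eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(V)
    W = (eigvecs * eigvals ** -0.5) @ eigvecs.T
    # Whiten both gene profiles, then take their Pearson correlation.
    gi = expr[i] @ W
    gj = expr[j] @ W
    return np.corrcoef(gi, gj)[0, 1]

rng = np.random.default_rng(1)
n_genes, n_arrays = 1500, 300

# Synthetic data: independent signal plus a hypothetical two-batch confounder
# that affects each gene with its own loading.
signal = rng.normal(size=(n_genes, n_arrays))
batch = rng.choice([-1.0, 1.0], size=n_arrays)
loadings = rng.normal(size=n_genes)
loadings[0] = loadings[1] = 1.5  # both test genes respond strongly to the batch
expr = signal + np.outer(loadings, batch)

r_raw = np.corrcoef(expr[0], expr[1])[0, 1]
r_whitened = whitened_correlation(expr, 0, 1)

print(f"Pearson r on raw data:      {r_raw:.3f}")
print(f"Pearson r on whitened data: {r_whitened:.3f}")
```

Because the batch effect dominates the leading eigenvector of K, whitening strongly down-weights that direction, so the correlation between the two genes (whose signals are independent by construction) should fall from a clearly inflated value toward zero.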

Availability: The method is implemented in the R programming language and may be found at http://genetics.cs.ucla.edu/mmc.

Contact: nfurlott@cs.ucla.edu; eeskin@cs.ucla.edu.

Figure 3: Distribution of gene-module P-values for Pearson, SVA and MMC. We used a set of 233 known functional modules consisting of sets of genes of size 2 to 20. For each of these modules, a P-value representing the biological significance is calculated. This figure plots the distributions of these P-values. Since the P-values were calculated for gene sets known to be functionally related, we expect that there should be an inflation of significant P-values. It can be seen that the MMC method produces a larger number of significant P-values when compared to both the traditional Pearson and SVA-corrected coexpressions.

Mentions: Figure 3 shows the distribution of the P-values for all modules. Module P-values obtained when using our method tend to be smaller than the Pearson and SVA module P-values. For example, ~40% of the tested gene modules were significant at a level of 0.05 when using MMC, while ~25% and ~30% were significant when using Pearson and SVA, respectively. This result suggests that MMC produces coexpression values that better predict real biological relationships.

