Large-scale analysis of Arabidopsis transcription reveals a basal co-regulation network.

Atias O, Chor B, Chamovitz DA - BMC Syst Biol (2009)

Bottom Line: We found clusters of globally co-expressed Arabidopsis genes that are enriched for known Gene Ontology annotations. The analysis reveals that part of the Arabidopsis transcriptome is globally co-expressed, and can be further divided into known as well as novel functional gene modules. Our methodology is general enough to apply to any set of microarray experiments, using any scoring function.

Affiliation: Department of Plant Sciences, The George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel. dafniosn@post.tau.ac.il

ABSTRACT

Background: Analysis of gene expression data from microarray experiments has become a central tool for identifying co-regulated, functional gene modules. A crucial aspect of such analysis is the integration of data from different experiments and different laboratories. How the contribution of each experiment is weighted is an important factor influencing the final outcome. We have developed a novel method for this integration, and applied it to genome-wide data from multiple Arabidopsis microarray experiments performed under a variety of experimental conditions. The goal of this study is to identify functional, globally co-regulated gene modules in the Arabidopsis genome.

Results: Following the analysis of 21,000 Arabidopsis genes in 43 datasets and about 2 × 10^8 gene pairs, we identified a globally co-expressed gene network. We found clusters of globally co-expressed Arabidopsis genes that are enriched for known Gene Ontology annotations. Two types of modules were identified in the regulatory network that differed in their sensitivity to the node-scoring parameter; we further showed that these two types correspond to general and specialized modules. Some of these modules were further investigated using the Genevestigator compendium of microarray experiments. Analyses of smaller subsets of data led to the identification of condition-specific modules.

Conclusion: Our method for identification of gene clusters allows the integration of diverse microarray experiments from many sources. The analysis reveals that part of the Arabidopsis transcriptome is globally co-expressed, and can be further divided into known as well as novel functional gene modules. Our methodology is general enough to apply to any set of microarray experiments, using any scoring function.

Figure 9: Percentage of significantly co-expressed gene pairs in the experiments used. For each experiment we calculated the number of significantly co-expressed gene pairs that were included in the analysis. The data are presented as a proportion of all possible gene pairs. Co-expression between a pair of genes is considered significant if the p-value calculated for the Pearson correlation coefficient is below 0.05.
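
The proportion plotted in Figure 9 can be illustrated with a short sketch. The function names and the example count of significant pairs are hypothetical; the only figure taken from the text is the 21,000 genes reported in the abstract, which gives roughly 2.2 × 10^8 unordered gene pairs, consistent with the "about 2 × 10^8" quoted there. The number of genes actually measured in any single experiment may differ.

```python
import math


def total_gene_pairs(num_genes):
    """Number of unordered gene pairs among num_genes genes."""
    return math.comb(num_genes, 2)


def significant_pair_fraction(num_significant_pairs, num_genes):
    """Proportion of all possible gene pairs that are significantly co-expressed."""
    return num_significant_pairs / total_gene_pairs(num_genes)


# 21,000 genes (figure from the abstract) give about 2.2e8 possible pairs.
all_pairs = total_gene_pairs(21_000)

# Hypothetical experiment in which one tenth of all pairs pass the 0.05 cutoff.
example_fraction = significant_pair_fraction(all_pairs // 10, 21_000)
```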

Mentions: We sought to integrate the expression correlation data into a network in which nodes represent genes and edges connect pairs of genes whose expression levels are correlated across a given set of experiments. This raises the question of how to take different experiments into account when deciding whether an edge should appear in the network. Initially, it may seem appropriate to assign each experiment an equal weight when considering the appearance of an edge. However, we chose not to use this naïve approach, for a number of reasons. First, the number of samples contained in each experiment varies, so correlation coefficients from different datasets cannot be compared directly. To avoid this problem, we did not compare the correlation coefficients themselves, but rather the corrected p-values assigned to each coefficient, as their calculation incorporates the number of data points used for calculating the coefficients. We considered as statistically significant any correlation coefficient whose corrected p-value was lower than 0.05. Second, the distribution of correlation coefficients in each dataset is often very different from the normal distribution (selected examples are available in Figure 8). Furthermore, the number of statistically significant correlation coefficients observed in different datasets varies widely (Figure 9). For some datasets, less than 1% of all correlation coefficients calculated for the dataset are statistically significant, while for other datasets, more than 40% are. On average, across all datasets, about 10% of the correlation coefficients are statistically significant. This variation and its possible relation to the underlying biological conditions of each experiment are of interest and are worth studying in their own right.
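
The reasoning above (compare corrected p-values rather than raw coefficients, because p-values account for sample size) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the p-value uses the Fisher z-transform normal approximation instead of an exact test, and the correction is assumed to be Benjamini-Hochberg, since the text does not say which correction was applied. All function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r from n samples (n > 3).

    Assumption: Fisher z-transform with a normal approximation,
    used here only to keep the sketch dependency-free.
    """
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed correction method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_fraction(expression, alpha=0.05):
    """Fraction of gene pairs whose corrected p-value is below alpha.

    expression maps a gene name to its expression values in one dataset.
    """
    genes = sorted(expression)
    pairs = list(combinations(genes, 2))
    n = len(expression[genes[0]])
    p_values = [
        approx_p_value(pearson_r(expression[a], expression[b]), n)
        for a, b in pairs
    ]
    adjusted = benjamini_hochberg(p_values)
    return sum(q < alpha for q in adjusted) / len(pairs)


# Toy dataset: g1 and g2 rise together, g3 is uncorrelated with both.
toy = {
    "g1": [1, 2, 3, 4, 5, 6, 7, 8],
    "g2": [2, 4, 6, 8, 10, 12, 14, 16],
    "g3": [1, -1, 1, -1, -1, 1, -1, 1],
}
toy_fraction = significant_fraction(toy)
```

Under this approximation the same coefficient can fall on either side of the 0.05 cutoff depending on the number of samples: `approx_p_value(0.6, 10)` is above 0.05, whereas `approx_p_value(0.6, 100)` is far below it. That is the reason raw coefficients from datasets of different sizes are not directly comparable.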

