Large-scale analysis of Arabidopsis transcription reveals a basal co-regulation network.

Atias O, Chor B, Chamovitz DA - BMC Syst Biol (2009)

Bottom Line: We found clusters of globally co-expressed Arabidopsis genes that are enriched for known Gene Ontology annotations. The analysis reveals that part of the Arabidopsis transcriptome is globally co-expressed, and can be further divided into known as well as novel functional gene modules. Our methodology is general enough to apply to any set of microarray experiments, using any scoring function.


Affiliation: Department of Plant Sciences, The George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel. dafniosn@post.tau.ac.il

ABSTRACT

Background: Analysis of gene expression data from microarray experiments has become a central tool for identifying co-regulated, functional gene modules. A crucial aspect of such analysis is the integration of data from different experiments and different laboratories. How the contribution of each experiment is weighted is an important factor influencing the final outcome. We have developed a novel method for this integration, and applied it to genome-wide data from multiple Arabidopsis microarray experiments performed under a variety of experimental conditions. The goal of this study is to identify functional, globally co-regulated gene modules in the Arabidopsis genome.

Results: Following the analysis of 21,000 Arabidopsis genes in 43 datasets and about 2 × 10^8 gene pairs, we identified a globally co-expressed gene network. We found clusters of globally co-expressed Arabidopsis genes that are enriched for known Gene Ontology annotations. Two types of modules were identified in the regulatory network that differed in their sensitivity to the node-scoring parameter; we further showed that these two types correspond to general and specialized modules. Some of these modules were further investigated using the Genevestigator compendium of microarray experiments. Analyses of smaller subsets of data led to the identification of condition-specific modules.

Conclusion: Our method for identification of gene clusters allows the integration of diverse microarray experiments from many sources. The analysis reveals that part of the Arabidopsis transcriptome is globally co-expressed, and can be further divided into known as well as novel functional gene modules. Our methodology is general enough to apply to any set of microarray experiments, using any scoring function.


Figure 4: Gene clusters found using MCODE. Clusters are visualized as nodes arranged in four levels of decreasing node score cutoff (0.2-0.05), the MCODE parameter. Node size corresponds to the number of genes in the cluster. Overlapping clusters (those that share genes) are connected by an edge whose thickness corresponds to overlap size; the thickest lines indicate that 100% of the child cluster is present in the parent cluster. Node colour intensity corresponds to GO enrichment. Clusters that have no GO enrichment are brightest, while red clusters have close to 100% of their genes sharing an enriched GO term. For clusters with more than one enriched GO term, colour intensity shows the percentage of genes having the most abundant term. A green asterisk appears above GO-enriched clusters that were used for further analysis. The number beside the asterisk corresponds to the cluster number given in Tables 4 and 5, and in Figure 5. A green plus sign appears above a non-GO-enriched cluster that is assigned a putative cell cycle regulation role (see Results and Figure 6).
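The colour rule in the caption can be read as a simple ratio. The following is a minimal sketch under that reading, not the authors' code: the function name `go_colour_intensity` and the input structures (`gene_terms`, `enriched_terms`) are hypothetical, and the mapping from the returned fraction to an actual colour is left out.

```python
from collections import Counter


def go_colour_intensity(cluster_genes, gene_terms, enriched_terms):
    """Return the fraction of cluster genes that carry the most abundant
    enriched GO term (0.0 if the cluster has no enriched term).

    cluster_genes  -- iterable of gene identifiers in the cluster
    gene_terms     -- dict: gene identifier -> iterable of GO terms (assumed input)
    enriched_terms -- iterable of GO terms found enriched for this cluster
    """
    genes = list(cluster_genes)
    enriched = set(enriched_terms)
    if not genes or not enriched:
        return 0.0

    # Count, for each enriched term, how many cluster genes are annotated with it.
    counts = Counter()
    for gene in genes:
        for term in set(gene_terms.get(gene, ())) & enriched:
            counts[term] += 1

    if not counts:
        return 0.0
    # With several enriched terms, only the most abundant one sets the intensity.
    return max(counts.values()) / len(genes)
```

A cluster with no enriched term returns 0.0 (the "brightest" case in the caption), and a value near 1.0 corresponds to the red clusters in which nearly all genes share one enriched term.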

Mentions: The cluster detection algorithm used, MCODE, relies on a major parameter called the "node score cutoff", which influences the size of the detected clusters and their intra-connectivity. In our initial analysis we used the default "node score cutoff" value of 0.2 for both the 0.3 and 0.4 networks. However, many of the large clusters in the networks are enriched for multiple and/or general GO terms (Tables 4 and 5), indicating that these clusters are not homogeneous. We suspected that this result would change for different "node score cutoff" values, so we repeated the analysis described above, including testing for enriched GO terms, for decreasing "node score cutoff" values (i.e. stricter clustering parameters). Additional data file 6 lists the genes comprising the clusters found with each tested value, in both the 0.3 and 0.4 networks, and Additional data file 7 lists the enriched GO terms for each cluster. The different cutoff values produce different clusters, including some with new GO terms. As expected, there is still significant overlap between clusters found using different cutoff values in the same network. We visualized our results in the hierarchical graph shown in Figure 4, in which nodes represent clusters and each level of the graph shows all clusters found using a particular node score cutoff as a parameter for MCODE. Edges connect overlapping clusters from consecutive levels. When comparing the 0.3 and 0.4 networks, the 0.4 network seems to break into more integral parts, with less overlap between clusters. This is expected, as the 0.4 network is a sub-network of the 0.3 network, containing only those edges representing a higher confidence of co-expression between the genes they connect. Although many of the 0.3 clusters overlap each other, this connectivity still allows for a fairly planar graph, without many intersecting edges. We find that in this near-planar representation, overlapping clusters tend to share similar GO terms (Figure 4A).
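The hierarchical graph described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function name `overlap_edges` and the input layout (one dictionary of gene sets per cutoff level, ordered by decreasing cutoff) are assumptions, as is treating the higher-cutoff level as the "parent" level. The edge weight follows the Figure 4 caption, where the thickest edge means 100% of the child cluster lies in the parent cluster.

```python
def overlap_edges(levels):
    """Build edges between overlapping clusters of consecutive cutoff levels.

    levels -- list of dicts, one per node score cutoff in decreasing order
              (e.g. 0.2 first, 0.05 last); each dict maps a cluster
              identifier to the set of genes in that cluster (assumed layout).

    Returns a list of tuples
        (parent_level_index, parent_id, child_id, fraction_of_child_in_parent)
    with one tuple per pair of clusters that share at least one gene.
    """
    edges = []
    for i in range(len(levels) - 1):
        parents, children = levels[i], levels[i + 1]
        for parent_id, parent_genes in parents.items():
            for child_id, child_genes in children.items():
                shared = parent_genes & child_genes
                if shared:
                    # 1.0 means the whole child cluster is contained in the parent.
                    weight = len(shared) / len(child_genes)
                    edges.append((i, parent_id, child_id, weight))
    return edges
```

Only consecutive levels are compared, matching the statement that edges connect overlapping clusters from consecutive levels; clusters with no shared genes produce no edge.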

