Origin of co-expression patterns in E. coli and S. cerevisiae emerging from reverse engineering algorithms.

Zampieri M, Soranzo N, Bianchini D, Altafini C - PLoS ONE (2008)

Bottom Line: For the first organism the pattern of co-expression is shown to reflect in fine detail both the operonal structure of the DNA and the regulatory effects exerted by the gene products when co-participating in a protein complex. The gene co-expression patterns deduced from compendia of profiling experiments tend to unveil functional categories that are mainly associated with stable bindings rather than transient interactions. The inference power of this systematic analysis is substantially reduced when passing from E. coli to S. cerevisiae.


Affiliation: SISSA-ISAS, International School for Advanced Studies, Trieste, Italy.

ABSTRACT

Background: The concept of reverse engineering a gene network, i.e., of inferring a genome-wide graph of putative gene-gene interactions from compendia of high-throughput microarray data, has been extensively used in the last few years to deduce, integrate, or validate various types of "physical" networks of interactions among genes or gene products.

Results: This paper gives a comprehensive overview of which of these networks emerge significantly when reverse engineering large collections of gene expression data for two model organisms, E. coli and S. cerevisiae, without any prior information. For the first organism the pattern of co-expression is shown to reflect in fine detail both the operonal structure of the DNA and the regulatory effects exerted by the gene products when co-participating in a protein complex. For the second organism we find that direct transcriptional control (e.g., transcription factor-binding site interactions) has little statistical significance in comparison to the other regulatory mechanisms (such as sharing a protein complex, or co-localization on a metabolic pathway or compartment), which are, however, resolved at a lower level of detail than in E. coli.

Conclusion: The gene co-expression patterns deduced from compendia of profiling experiments tend to unveil functional categories that are mainly associated with stable bindings rather than transient interactions. The inference power of this systematic analysis is substantially reduced when passing from E. coli to S. cerevisiae. This extensive analysis provides a way to describe the difference in complexity between the two organisms and discusses the critical limitations affecting this type of methodology.


Fig. 4 (pone-0002981-g004): Pearson correlation and distance on the genome. Co-expression decays more rapidly with distance in S. cerevisiae than in E. coli: the correlation drops to 0.2 at a distance of 6 Kbp in E. coli (a), as opposed to 1 Kbp in S. cerevisiae, for both cDNA and Affymetrix datasets (c). In E. coli the value of 6 Kbp is consistent with the distribution of transcription unit (TU) widths (inset panel in (a)). Genes on the same strand have much higher correlation than genes on opposite strands. For E. coli, even if we restrict to gene pairs not involved in a TU (see the dashed blue line in (a)), the influence of distance on co-expression is still clearly visible. In S. cerevisiae, the short-range high-correlation peak is represented almost completely by overlapping ORFs (the distribution of ORF widths is shown in the inset), for which the cDNA experiments cannot discern any strand-specificity, unlike the Affymetrix experiments. In panel (b), the distribution of intracluster average distances (see Supplementary Notes S1) for E. coli is compared with the corresponding distributions of average distances among protein complex (PC) and TU subunits. The histogram for the clusters is more similar to that of the TU than to that of the PC, although its tail is heavier and more related to the PC. A similar analysis is impossible for S. cerevisiae, as the vast majority of clusters are composed of genes located on different chromosomes.

Mentions: For E. coli, the operonal structure of the genome is certainly a key factor in the formation of the clusters [20], [7]. In Fig. 4(a) and (c), co-expression of genes located adjacent to each other on the genome is quantified, and genes belonging to the same or to different strands are distinguished. However, the operonal structure alone does not exhaust the information that can be extracted from the expression correlation patterns (see Fig. 4 and Fig. 5). We can notice, for instance, that the distribution of intracluster average gene distances (shown in Fig. 4(b)), although largely comparable to that of the TU, has a heavier tail, more related to the PC distribution. Most of the large clusters are examples of functional information not exhausted by any operonal structure. It is interesting to notice that the difference in the overlap between clusters and TU most often concerns the genes located at the boundaries of the operons (see e.g. cl. 3, 5, 6, 10, and many more). In spite of this, as a confirmation that the operonal structure and/or protein complex interactions are much stronger mediators of co-expression than direct DNA binding (i.e. being a TF-BS pair), we notice that co-clustering of these last pairs is sporadic (e.g. cl. 1, 3, 7, 24, 38, 74, 101). The influence of gene distance on co-expression is noticeable to some extent also in S. cerevisiae [36], but it decays more rapidly than in E. coli (see Fig. 4(c)). While the decay/distance ratio is similar on the cDNA and Affymetrix datasets, for contiguous genes the former is unable to distinguish strand-specific genes.
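
The kind of analysis summarized in Fig. 4(a) and (c), i.e. Pearson correlation of gene pairs as a function of their distance on the genome, split by same or opposite strand, can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `correlation_by_distance`, its arguments, and the binning scheme are hypothetical choices made for illustration, and the sketch assumes a single linear chromosome (it ignores the circularity of the E. coli genome and any cross-chromosome pairs in S. cerevisiae).

```python
import numpy as np


def correlation_by_distance(expr, pos, strand, bin_edges):
    """Mean Pearson correlation of gene pairs, binned by genomic distance.

    Hypothetical helper for illustration only.
    expr:      (n_genes, n_experiments) expression matrix
    pos:       (n_genes,) genomic coordinate of each gene, in bp
    strand:    (n_genes,) strand label of each gene, e.g. +1 or -1
    bin_edges: increasing distance bin edges, in bp
    Returns (same_strand_means, opposite_strand_means); empty bins are NaN.
    """
    expr = np.asarray(expr, dtype=float)
    pos = np.asarray(pos, dtype=float)
    strand = np.asarray(strand)

    # np.corrcoef treats each row (gene) as one variable.
    corr = np.corrcoef(expr)

    # Enumerate every unordered gene pair once.
    i, j = np.triu_indices(len(pos), k=1)
    dist = np.abs(pos[i] - pos[j])
    r = corr[i, j]
    same = strand[i] == strand[j]

    # Bin b holds distances in [bin_edges[b], bin_edges[b + 1]).
    # Pairs outside the edges get index -1 or n_bins and are ignored.
    bins = np.digitize(dist, bin_edges) - 1
    n_bins = len(bin_edges) - 1

    def binned_mean(mask):
        out = np.full(n_bins, np.nan)
        for b in range(n_bins):
            sel = mask & (bins == b)
            if sel.any():
                out[b] = r[sel].mean()
        return out

    return binned_mean(same), binned_mean(~same)


# Toy data (made up): three genes measured in four experiments.
toy_expr = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
toy_pos = [0, 1000, 10000]
toy_strand = [1, 1, -1]
same_means, opp_means = correlation_by_distance(
    toy_expr, toy_pos, toy_strand, bin_edges=[0, 5000, 20000]
)
```

Plotting the two returned arrays against the bin centers would give curves comparable in spirit to those described for Fig. 4, although the published figure may rely on different binning or smoothing.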

