Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation

ABSTRACT

Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. An important step toward this understanding is the identification of sets of genes that are always co-expressed. These gene sets, Atomic Regulons (ARs), represent fundamental units of function within a cell and could be used to associate genes of unknown function with cellular processes and to enable rational genetic engineering of cellular systems. Here, we describe an approach for inferring ARs that leverages large-scale expression data sets, gene context, and functional relationships among genes. We computed ARs for Escherichia coli based on 907 gene expression experiments and compared our results with gene clusters produced by two prevalent data-driven methods: hierarchical clustering and k-means clustering. We compared ARs and purely data-driven gene clusters to the curated set of regulatory interactions for E. coli found in RegulonDB, showing that ARs are more consistent with gold standard regulons than are data-driven gene clusters. We further examined the consistency of ARs and data-driven gene clusters in the context of gene interactions predicted by Context Likelihood of Relatedness (CLR) analysis, finding that the ARs show better agreement with CLR-predicted interactions. We determined the impact of increasing amounts of expression data on AR construction and found that while more data improve ARs, it is not necessary to use the full set of gene expression experiments available for E. coli to produce high-quality ARs. To explore the conservation of co-regulated gene sets across different organisms, we computed ARs for Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus, which represent increasing degrees of phylogenetic distance from E. coli. Comparison of the organism-specific ARs showed that the consistency of AR gene membership correlates with phylogenetic distance, but there is clear variability in the regulatory networks of closely related organisms. As large-scale expression data sets become increasingly common for model and non-model organisms, comparative analyses of atomic regulons will provide valuable insights into fundamental regulatory modules used across the bacterial domain.

Figure 5: Comparison of E. coli Atomic Regulons vs. all other ARs for Shewanella oneidensis MR-1, Pseudomonas aeruginosa PAO1, Thermus thermophilus HB8 and Staphylococcus aureus subsp. aureus Mu50. (A) The % of similarity is given by the Jaccard coefficient, which is defined as the size of the intersection divided by the size of the union of the sample sets. (B) Jaccard coefficients computed for each E. coli AR across all combinations of four, three, and two genomes.

Mentions: To compare the ARs inferred for the different organisms, Jaccard coefficients were computed for each AR in E. coli vs. all ARs in each of the four other genomes. The distribution of these coefficients for each genome shows that regulation appears to be more similar in genomes that are closer to E. coli both phylogenetically and in terms of lifestyle (Figure 5A). The only Gram-positive genome included in our study, S. aureus, was, as expected, a distant genome in terms of AR variation. The most distant genome in terms of AR variation was T. thermophilus, which, despite being “Gram negative,” is phylogenetically distant and found in environments that are highly distinct from that of E. coli. Another interesting observation is how much variation exists in AR content between E. coli and the closest genomes analyzed, S. oneidensis and P. aeruginosa. Although these genomes are close to E. coli phylogenetically, only a small fraction of their atomic regulons have high compositional similarity. Contrary to expectations, the composition of co-regulated gene sets appears to be highly variable among closely related organisms. This result could support the notion that regulation is a highly adaptable system in the cell, but experimental studies specifically dedicated to this type of comparative analysis are needed to confirm it.
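The Jaccard comparison described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes each AR is represented as a set of identifiers that are comparable across genomes (for example, hypothetical ortholog family IDs such as "famA"), and it summarizes each E. coli AR by its best-scoring AR in the other genome, which is one plausible way to reduce the all-vs-all coefficients. The function names `jaccard` and `best_matches` are illustrative.

```python
from typing import Dict, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    if not union:
        # Two empty sets: define the similarity as 0 to avoid dividing by zero.
        return 0.0
    return len(a & b) / len(union)


def best_matches(
    reference_ars: Dict[str, Set[str]],
    other_ars: Dict[str, Set[str]],
) -> Dict[str, Tuple[Optional[str], float]]:
    """For each reference AR, find the AR in the other genome with the
    highest Jaccard coefficient (None if nothing overlaps)."""
    result: Dict[str, Tuple[Optional[str], float]] = {}
    for ref_id, ref_genes in reference_ars.items():
        best_id, best_score = None, 0.0
        for other_id, other_genes in other_ars.items():
            score = jaccard(ref_genes, other_genes)
            if score > best_score:
                best_id, best_score = other_id, score
        result[ref_id] = (best_id, best_score)
    return result


# Toy data only; the AR names and family IDs are made up for illustration.
ecoli_ars = {
    "AR1": {"famA", "famB", "famC"},
    "AR2": {"famD", "famE"},
}
other_genome_ars = {
    "AR7": {"famA", "famB"},
    "AR9": {"famE", "famF", "famG"},
}

matches = best_matches(ecoli_ars, other_genome_ars)
for ar_id, (match_id, score) in matches.items():
    print(f"{ar_id}: best match {match_id}, Jaccard = {score:.2f}")
```

Multiplying each coefficient by 100 gives a percent similarity of the kind plotted in Figure 5A. How genes are mapped between genomes strongly affects the scores, so that mapping step would need to follow the method used in the paper.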

