Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation

ABSTRACT

Understanding gene function and regulation is essential for interpreting, predicting, and ultimately designing cellular responses to changes in the environment. An important step toward this goal is the identification of sets of genes that are always co-expressed. These gene sets, Atomic Regulons (ARs), represent fundamental units of function within a cell. They could be used to associate genes of unknown function with cellular processes and to enable rational genetic engineering of cellular systems. Here, we describe an approach for inferring ARs that leverages large-scale expression data sets, gene context, and functional relationships among genes. We computed ARs for Escherichia coli based on 907 gene expression experiments and compared our results with gene clusters produced by two prevalent data-driven methods: hierarchical clustering and k-means clustering. We compared ARs and purely data-driven gene clusters to the curated set of regulatory interactions for E. coli in RegulonDB, showing that ARs are more consistent with gold-standard regulons than are data-driven gene clusters. We further examined the consistency of ARs and data-driven gene clusters with gene interactions predicted by Context Likelihood of Relatedness (CLR) analysis and found that ARs show better agreement with CLR-predicted interactions. We also determined the impact of increasing amounts of expression data on AR construction and found that, while more data improve ARs, it is not necessary to use the full set of gene expression experiments available for E. coli to produce high-quality ARs. To explore the conservation of co-regulated gene sets across organisms, we computed ARs for Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus, which lie at increasing phylogenetic distances from E. coli. Comparison of the organism-specific ARs showed that the consistency of AR gene membership correlates with phylogenetic distance, but there is clear variability in the regulatory networks of closely related organisms. As large-scale expression data sets become increasingly common for model and non-model organisms, comparative analyses of atomic regulons will provide valuable insights into the fundamental regulatory modules used across the bacterial domain.
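To make the notion of "genes that are always co-expressed" concrete, the toy sketch below groups genes whose discretized ON/OFF calls agree in every experiment and sets aside genes that are ON (or OFF) in all experiments. This is not the authors' inference procedure, which also draws on gene context and functional relationships among genes. The input format, the assumption that expression has already been discretized into ON/OFF calls, and the function name `group_by_profile` are all hypothetical.

```python
from collections import defaultdict

def group_by_profile(calls):
    """Toy grouping of genes by identical discretized expression profiles.

    calls: dict mapping a gene id to a tuple of ON/OFF calls (1/0), one per
    experiment (hypothetical input format). Returns (groups, always_on,
    always_off), where groups lists sets of >= 2 genes whose calls agree in
    every experiment.
    """
    by_profile = defaultdict(list)
    always_on, always_off = [], []
    for gene, profile in calls.items():
        if all(profile):
            always_on.append(gene)        # ON in every experiment
        elif not any(profile):
            always_off.append(gene)       # OFF in every experiment
        else:
            by_profile[tuple(profile)].append(gene)
    # Singletons are dropped: a lone gene is not a co-expressed set.
    groups = sorted(sorted(genes) for genes in by_profile.values() if len(genes) > 1)
    return groups, sorted(always_on), sorted(always_off)

# Hypothetical ON/OFF calls for seven genes across four experiments.
calls = {
    "geneA": (1, 0, 1, 1), "geneB": (1, 0, 1, 1),
    "geneC": (0, 1, 0, 0), "geneF": (0, 1, 0, 0),
    "geneD": (1, 1, 1, 1), "geneE": (0, 0, 0, 0),
    "geneG": (1, 1, 0, 0),
}
groups, always_on, always_off = group_by_profile(calls)
print(groups)       # [['geneA', 'geneB'], ['geneC', 'geneF']]
print(always_on)    # ['geneD']
print(always_off)   # ['geneE']
```

Exact agreement of binary profiles is the strictest possible reading of "always co-expressed"; with noisy real data, any practical method has to tolerate some disagreement.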

Figure 4: Sensitivity analysis of Atomic Regulon inference for Escherichia coli K-12. (A) Average number of genes in atomic regulons. (B) Average number of atomic regulons. (C) Average number of genes always ON. (D) Average number of genes always OFF. Error bars show the standard deviation across 100 randomizations of the data set, each generated by randomly sampling experiments.
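The randomization procedure in the caption can be sketched as repeated subsampling: for each fraction of the data, draw 100 random subsets of experiments without replacement, evaluate a metric on each subset, and summarize with the mean and standard deviation. This is a minimal sketch under stated assumptions. The AR inference step is not reproduced, so `metric` is a hypothetical caller-supplied function (for example, one that infers ARs from the sampled experiments and counts them). The 10% step between fractions is assumed, and the toy metric, which counts made-up experimental conditions covered by a subset, is only a stand-in.

```python
import random
import statistics

def subsample_sensitivity(experiments, fractions, n_reps, metric, seed=0):
    """Repeated random subsampling of experiments.

    For each fraction, draw n_reps subsets of experiments (without
    replacement), apply metric to each subset, and return a dict mapping
    fraction -> (mean, sample standard deviation). n_reps must be >= 2.
    `metric` is a hypothetical caller-supplied callable, e.g. one that infers
    ARs from the sampled experiments and returns the number of ARs.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    results = {}
    for frac in fractions:
        k = max(1, round(frac * len(experiments)))  # subset size
        values = [metric(rng.sample(experiments, k)) for _ in range(n_reps)]
        results[frac] = (statistics.mean(values), statistics.stdev(values))
    return results

# Toy stand-in (hypothetical): 907 experiments spread over 50 made-up
# conditions; the "metric" counts how many conditions a subset covers.
experiments = list(range(907))
condition = {e: e % 50 for e in experiments}

def toy_metric(subset):
    return len({condition[e] for e in subset})

fractions = [i / 10 for i in range(1, 11)]  # 10%, 20%, ..., 100% (assumed step)
results = subsample_sensitivity(experiments, fractions, n_reps=100, metric=toy_metric)
for frac, (mean, sd) in results.items():
    print(f"{frac:.0%} of experiments: {mean:.1f} ± {sd:.1f}")
```

At 100% every replicate contains all experiments, so the spread collapses to zero; the error bars in Figure 4 shrink toward the right for the same reason.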

The results of the random sensitivity analysis support our expectations. As the amount of available data increases, both the number of genes in ARs (Figure 4A) and the total number of ARs (Figure 4B) increase, while the numbers of always ON genes (Figure 4C) and always OFF genes (Figure 4D) decrease. In all cases, large improvements in each metric are observed as the amount of data used increases from 10% to ~60% of the available data. All metrics continue to improve up to 100% of the data, but the improvements become markedly smaller beyond ~60%. We also performed a simple 2-fold cross-validation, randomly splitting the 907 E. coli experiments into two non-overlapping halves (Table 1). The average Jaccard coefficients obtained when comparing the ARs computed from the two halves (0.83 ± 0.31 and 0.80 ± 0.35, mean ± standard deviation) were nearly identical to those obtained when comparing the ARs computed from the full data set with either half (0.81 ± 0.35 and 0.80 ± 0.37, respectively). This result shows that ARs computed from only half of the available experimental data are very similar to those computed from all of it.
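The 2-fold split and the Jaccard comparison can be sketched as follows. The text reports average Jaccard coefficients but does not say how ARs from two runs are paired, so this sketch assumes a best-match scheme: each regulon in one set is scored against its most similar regulon in the other set. Because that scheme is asymmetric, it gives one value per direction. The function names and the toy regulons are hypothetical, and AR inference on each half is not shown.

```python
import random
import statistics

def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two gene sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0  # two empty sets: define as 1

def best_match_jaccard(regulons_x, regulons_y):
    """Assumed pairing scheme: score each regulon in regulons_x by its highest
    Jaccard coefficient against any regulon in regulons_y (must be non-empty),
    then return the mean and population standard deviation of the scores."""
    scores = [max(jaccard(r, s) for s in regulons_y) for r in regulons_x]
    return statistics.mean(scores), statistics.pstdev(scores)

def two_fold_split(experiments, seed=0):
    """Randomly split experiments into two non-overlapping halves."""
    shuffled = list(experiments)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Split 907 experiment indices; ARs would then be inferred from each half
# (inference not shown).
half_a, half_b = two_fold_split(range(907))

# Hypothetical ARs inferred from the two halves.
regulons_a = [{"g1", "g2", "g3"}, {"g4", "g5"}]
regulons_b = [{"g1", "g2"}, {"g4", "g5"}, {"g6"}]
print(best_match_jaccard(regulons_a, regulons_b))  # ≈ (0.833, 0.167)
print(best_match_jaccard(regulons_b, regulons_a))  # ≈ (0.556, 0.416): asymmetric
```

Because 907 is odd, the two halves here contain 453 and 454 experiments.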

