A semi-supervised method for predicting transcription factor-gene interactions in Escherichia coli.

Ernst J, Beg QK, Kay KA, Balázsi G, Oltvai ZN, Bar-Joseph Z - PLoS Comput. Biol. (2008)

Bottom Line: To further demonstrate the utility of our inferred interactions, we generated a new microarray gene expression dataset for the aerobic to anaerobic shift response in E. coli. We used our inferred interactions with the verified interactions to reconstruct a dynamic regulatory network for this response. The network reconstructed when using our inferred interactions was better able to correctly identify known regulators and suggested additional activators and repressors as having important roles during the aerobic-anaerobic shift interface.


Affiliation: Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America.

ABSTRACT
While Escherichia coli has one of the most comprehensive datasets of experimentally verified transcriptional regulatory interactions of any organism, it is still far from complete. This presents a problem when trying to combine gene expression and regulatory interactions to model transcriptional regulatory networks. Using the available regulatory interactions to predict new interactions may lead to better coverage and more accurate models. Here, we develop SEREND (SEmi-supervised REgulatory Network Discoverer), a semi-supervised learning method that uses a curated database of verified transcriptional factor-gene interactions, DNA sequence binding motifs, and a compendium of gene expression data in order to make thousands of new predictions about transcription factor-gene interactions, including whether the transcription factor activates or represses the gene. Using genome-wide binding datasets for several transcription factors, we demonstrate that our semi-supervised classification strategy improves the prediction of targets for a given transcription factor. To further demonstrate the utility of our inferred interactions, we generated a new microarray gene expression dataset for the aerobic to anaerobic shift response in E. coli. We used our inferred interactions with the verified interactions to reconstruct a dynamic regulatory network for this response. The network reconstructed when using our inferred interactions was better able to correctly identify known regulators and suggested additional activators and repressors as having important roles during the aerobic-anaerobic shift interface.

pcbi-1000044-g004: Transcription factor target set agreement between predicted and curated targets. The average expression values for TF regulatory modes (a TF paired with an activator or repressor relationship) among curated and new predicted targets at the 55-min time point of the new aerobic–anaerobic shift gene expression data are shown. Only the top 20 TF regulatory modes in terms of the number of new predictions are included. We excluded genes with dual annotations from the curated averages. We included genes in the predicted set averages for which we had a new prediction with regard to the mode of interaction (either because they were dual-annotated or because SEREND predicted the opposite mode; this generally applied to a small number of genes; see Table S1). For each TF regulatory mode, the graph also displays the 95% confidence interval based on 10,000 random draws of sets of the same size as the set of new predicted targets. The graph shows that the average expression for a number of predicted TF target gene sets was significantly induced or repressed. The graph also shows good agreement for most TF target gene sets between the curated and predicted sets, indicating the accuracy of the predictions.
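
The randomization interval mentioned in the caption can be illustrated with a short sketch. This is not the authors' code. The function name `random_set_interval`, the gene-to-value dict layout, the choice to draw random sets from all measured genes, and the use of a plain percentile interval are all assumptions made for illustration.

```python
import random
import statistics


def random_set_interval(expression, set_size, n_draws=10_000, level=0.95, seed=0):
    """Percentile interval for the mean expression of random gene sets.

    expression: dict mapping gene name -> expression value (assumed layout)
    set_size:   size of the predicted target set being evaluated
    n_draws:    number of random gene sets to draw (10,000 in the caption)
    """
    rng = random.Random(seed)
    values = list(expression.values())
    # Mean expression of each random gene set with the same size
    # as the predicted target set (sampling without replacement).
    means = sorted(
        statistics.fmean(rng.sample(values, set_size)) for _ in range(n_draws)
    )
    tail = (1.0 - level) / 2.0
    lower = means[int(tail * n_draws)]
    upper = means[min(n_draws - 1, int((1.0 - tail) * n_draws))]
    return lower, upper


def outside_interval(observed_mean, interval):
    """True when an observed set mean falls outside the random-set interval."""
    lower, upper = interval
    return observed_mean < lower or observed_mean > upper
```

Under these assumptions, a predicted target set whose mean expression lies outside the returned interval would be the kind of set the caption describes as significantly induced or repressed.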

Mentions: To compare the set of interactions in the curated databases with the new targets predicted by SEREND, we first focused on expression values measured at the last sampled time point, 55 min after the shift from aerobic to anaerobic growth. Since these expression values were not used to generate our predictions, they provide an unbiased test set. We compared the average expression of the two sets of targets (curated and new predictions) for each TF activity mode (i.e., a factor and its influence as an activator or a repressor). In Figure 4, we plot the average expression of the two sets for the top 20 TF activity modes in terms of the number of new predictions (see Materials and Methods). We also plot a 95% confidence interval based on 10,000 randomizations for selecting sets of the same size as the new predictions (the confidence intervals for the curated sets were similar). Figure 4 illustrates good agreement between the average expression of the curated targets and the newly predicted targets for this new expression dataset. We observe that the curated and predicted target sets completely agree on which are the top 8 most significantly upregulated gene sets and which are the top 5 most significantly downregulated gene sets. From Figure 4 we also observe that, on average, the predicted activated targets of CRP, FNR, and IHF had an induced expression level, while their predicted repressed targets had a repressed expression level.
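
The top-k agreement check described above can be sketched in a few lines. This is a hedged illustration only: the helper names `top_k_modes` and `top_k_agree` are hypothetical, and ranking by mean expression is a simplifying stand-in for the significance ranking used in the text. TF activity modes are assumed to be keyed as (factor, role) tuples.

```python
def top_k_modes(mean_expression, k, direction="up"):
    """Return the k TF activity modes with the most extreme mean expression.

    mean_expression: dict mapping (factor, role) -> mean expression of its targets
    direction:       "up" ranks highest means first, "down" ranks lowest first
    """
    ranked = sorted(
        mean_expression, key=mean_expression.get, reverse=(direction == "up")
    )
    return set(ranked[:k])


def top_k_agree(curated, predicted, k, direction="up"):
    """True when curated and predicted sets share the same top-k modes."""
    return top_k_modes(curated, k, direction) == top_k_modes(predicted, k, direction)
```

Comparing the two top-k sets as unordered sets mirrors the statement that the curated and predicted sets agree on which modes are in the top 8 or top 5, without requiring the same order within that group.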

