A predictive model of the oxygen and heme regulatory network in yeast.

Kundaje A, Xin X, Lan C, Lianoglou S, Zhou M, Zhang L, Leslie C - PLoS Comput. Biol. (2008)

Bottom Line: We used a novel margin-based score to extract significant condition-specific regulators and assemble a global map of the oxygen sensing and regulatory network. In each case, deletion of the candidate regulator resulted in the predicted effect on promoter activity, confirming that several novel regulators identified by MEDUSA are indeed involved in oxygen regulation. Supplemental data are included.


Affiliation: Department of Computer Science, Columbia University, New York, New York, United States of America.

ABSTRACT
Deciphering gene regulatory mechanisms through the analysis of high-throughput expression data is a challenging computational problem. Previous computational studies have used large expression datasets in order to resolve fine patterns of coexpression, producing clusters or modules of potentially coregulated genes. These methods typically examine promoter sequence information, such as DNA motifs or transcription factor occupancy data, in a separate step after clustering. We needed an alternative and more integrative approach to study the oxygen regulatory network in Saccharomyces cerevisiae using a small dataset of perturbation experiments. Mechanisms of oxygen sensing and regulation underlie many physiological and pathological processes, and only a handful of oxygen regulators have been identified in previous studies. We used a new machine learning algorithm called MEDUSA to uncover detailed information about the oxygen regulatory network using genome-wide expression changes in response to perturbations in the levels of oxygen, heme, Hap1, and Co2+. MEDUSA integrates mRNA expression, promoter sequence, and ChIP-chip occupancy data to learn a model that accurately predicts the differential expression of target genes in held-out data. We used a novel margin-based score to extract significant condition-specific regulators and assemble a global map of the oxygen sensing and regulatory network. This network includes both known oxygen and heme regulators, such as Hap1, Mga2, Hap4, and Upc2, as well as many new candidate regulators. MEDUSA also identified many DNA motifs that are consistent with previous experimentally identified transcription factor binding sites. Because MEDUSA's regulatory program associates regulators to target genes through their promoter sequences, we directly tested the predicted regulators for OLE1, a gene specifically induced under hypoxia, by experimental analysis of the activity of its promoter. In each case, deletion of the candidate regulator resulted in the predicted effect on promoter activity, confirming that several novel regulators identified by MEDUSA are indeed involved in oxygen regulation. MEDUSA can reveal important information from a small dataset and generate testable hypotheses for further experimental analysis. Supplemental data are included.


Figure 3 (pcbi-1000224-g003): Simplified example showing how the regulatory program learned by MEDUSA predicts context-specific up/down gene expression. MEDUSA learns a global regulatory program described by an alternating decision tree. A simple regulatory program is shown in part A of the figure, along with the prediction it makes in two contexts, indicated as context B (top right) and context C (bottom right). The interaction between a regulator and a motif and the effect on targets is described by a decision node, which contains a logical condition to be tested, e.g., “Is regulator i up in the experiment and is motif i present in the promoter?”, and by the contribution that this motif/regulator pair makes to the up/down prediction of target gene expression if the logical condition is true, which is indicated by a colored bar. Contributions to upregulation of targets are shown in red and downregulation of targets in green. Combinatorial regulation is encoded by the tree structure: we obtain a prediction score for the up/down regulation of a target gene in a given experimental condition by starting at the root and recursively working downwards in the tree, seeing which prediction nodes are reachable by answering “yes” to logical conditions and summing all score contributions for the nodes visited.
(Context B) In the first context, both Reg 2, a transcriptional activator, and Reg 1, a repressor, are expressed, and the promoter of gene A contains the motifs associated by the regulatory program to both these regulators. The regulatory program computes the prediction score by summing the larger contribution of the repressor (green bar) with the smaller contribution of the activator (red bar) to obtain a negative prediction score (indicated by the dashed line on the far right), i.e., gene A is predicted to be downregulated.
(Context C) In the second context, both the activator Reg 2 and a co-factor, Reg 3, are expressed and can bind to the promoter of gene B based on the presence of the associated motifs in the regulatory program. The logic of the tree requires that the condition involving Reg 2 must hold before the contribution of the node containing Reg 3, at the next level of the tree, can be counted. Here, both conditions hold, and the regulatory program adds two positive contributions to obtain a confident prediction that gene B will be upregulated.
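The scoring logic described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `Node` class, the `predict` function, the motif names, and all score values are hypothetical, chosen so that the repressor's contribution outweighs the activator's as in context B. A root bias term, which alternating decision trees normally carry, is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One decision node: a regulator/motif condition plus its score."""
    regulator: str   # regulator whose discretized mRNA state is tested
    state: int       # required state: +1 (up) or -1 (down)
    motif: str       # motif that must be present in the target promoter
    score: float     # contribution added when the condition holds
    children: list = field(default_factory=list)  # nodes one level down


def predict(nodes, regulator_states, promoter_motifs):
    """Sum the scores of all nodes reachable by answering "yes"."""
    total = 0.0
    for node in nodes:
        holds = (regulator_states.get(node.regulator, 0) == node.state
                 and node.motif in promoter_motifs)
        if holds:
            total += node.score
            # Children are only counted once the parent condition holds.
            total += predict(node.children, regulator_states, promoter_motifs)
    return total


# Hypothetical program mirroring the figure: Reg 1 represses, Reg 2
# activates, and Reg 3 (a co-factor) only counts beneath Reg 2.
PROGRAM = [
    Node("Reg1", +1, "motif1", -2.0),
    Node("Reg2", +1, "motif2", +1.0,
         children=[Node("Reg3", +1, "motif3", +1.5)]),
]

# Context B: Reg 1 and Reg 2 are up; gene A carries both motifs.
score_gene_a = predict(PROGRAM, {"Reg1": 1, "Reg2": 1}, {"motif1", "motif2"})

# Context C: Reg 2 and Reg 3 are up; gene B carries their motifs.
score_gene_b = predict(PROGRAM, {"Reg2": 1, "Reg3": 1}, {"motif2", "motif3"})

# Reg 3 alone contributes nothing because the Reg 2 condition fails.
score_reg3_only = predict(PROGRAM, {"Reg3": 1}, {"motif3"})
```

With these made-up scores, gene A receives a negative total (predicted down) and gene B a positive total (predicted up), while Reg 3 on its own is never reached. The sign of the sum gives the predicted direction and its magnitude can be read as a confidence.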


Mentions: Figure 2 illustrates the major steps and data used in the MEDUSA learning algorithm. In preprocessing, mRNA expression data are discretized by binning expression values into three states (up, down, and baseline), and genes are partitioned into regulators and targets (Figure 2A). In the first stage of training (Figure 2B and 2C), MEDUSA uses the promoter sequences of target genes and the mRNA levels of regulators as inputs to learn a prediction function for the differential expression of targets. MEDUSA uses boosting to iteratively discover motifs whose presence in the promoters of target genes, together with the mRNA levels of regulators across experimental conditions, helps to predict the differential expression of the targets in those conditions. It builds a global regulatory program based on these motifs and regulators (Figure 2D). To produce a regulatory program that is more consistent under variations of the training data, a second pass of the regulatory program building algorithm is performed using a stabilized variant of boosting (see Methods). This second pass integrates the motifs discovered in the first training stage, promoter occupancy data from ChIP-chip analysis, and expression data of regulators and targets to build a final regulatory program that predicts the up or down regulation of target genes. The regulatory program asks questions such as, “Is the mRNA level of regulator ρ up (or down) in the experiment, and is the motif μ present in the upstream region of the gene (or is a transcriptional regulator bound to the promoter, when ChIP-chip data are available)?” The control logic of the regulatory program is described by an alternating decision tree (Figure 2D and Figure 3), which encodes how the overall up or down prediction score for a target gene in a given experimental condition results from the contribution and interaction of multiple regulators and motifs.
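As a rough illustration of the preprocessing step and of the kind of question the regulatory program asks, the following Python sketch discretizes toy log expression ratios into three states and evaluates one candidate regulator/motif rule against weighted training examples. Everything here is an assumption made for illustration: the threshold of 1.0, the gene and motif names, the toy data, the choice to drop baseline examples, and the `weighted_vote` helper are hypothetical, and the scoring is a simplified stand-in for the boosting loss rather than MEDUSA's actual criterion.

```python
def discretize(log_ratio, threshold=1.0):
    """Bin a log expression ratio into +1 (up), -1 (down), or 0 (baseline).

    The threshold is an assumed value, not one taken from the paper.
    """
    if log_ratio >= threshold:
        return 1
    if log_ratio <= -threshold:
        return -1
    return 0


# Hypothetical toy data: log expression ratios per gene and experiment.
log_ratios = {
    "RegX":  {"exp1": 1.8, "exp2": -0.2},
    "GeneA": {"exp1": 2.1, "exp2": 0.1},
    "GeneB": {"exp1": 0.3, "exp2": -1.6},
}
regulators = {"RegX"}                      # genes treated as regulators
promoter_motifs = {"GeneA": {"motif1"},    # motifs found in each promoter
                   "GeneB": {"motif2"}}

# Preprocessing: discretize, then partition genes into regulators/targets.
states = {gene: {exp: discretize(v) for exp, v in exps.items()}
          for gene, exps in log_ratios.items()}
targets = [gene for gene in states if gene not in regulators]

# Assumption: only (target, experiment) pairs labeled up or down are used.
examples = [(gene, exp, states[gene][exp])
            for gene in targets
            for exp in states[gene]
            if states[gene][exp] != 0]


def rule_fires(regulator, required_state, motif, gene, experiment):
    """Is the regulator in the required state and the motif in the promoter?"""
    return (states[regulator][experiment] == required_state
            and motif in promoter_motifs[gene])


def weighted_vote(regulator, required_state, motif, weights):
    """Sum weight * label over the examples on which the rule fires."""
    return sum(w * label
               for (gene, exp, label), w in zip(examples, weights)
               if rule_fires(regulator, required_state, motif, gene, exp))


uniform = [1.0 / len(examples)] * len(examples)
vote_motif1 = weighted_vote("RegX", 1, "motif1", uniform)
vote_motif2 = weighted_vote("RegX", 1, "motif2", uniform)
```

In this toy setting the rule "RegX is up and motif1 is present" fires on the upregulated GeneA example and receives a positive vote, while the motif2 rule never fires. A full boosting round would compare many such candidate rules, add the best one to the tree, and reweight the examples before the next round; those steps are not shown.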

