A predictive model of the oxygen and heme regulatory network in yeast.

Kundaje A, Xin X, Lan C, Lianoglou S, Zhou M, Zhang L, Leslie C - PLoS Comput. Biol. (2008)

Bottom Line: We used a novel margin-based score to extract significant condition-specific regulators and assemble a global map of the oxygen sensing and regulatory network. In each case, deletion of the candidate regulator resulted in the predicted effect on promoter activity, confirming that several novel regulators identified by MEDUSA are indeed involved in oxygen regulation. Supplemental data are included.


Affiliation: Department of Computer Science, Columbia University, New York, New York, United States of America.

ABSTRACT
Deciphering gene regulatory mechanisms through the analysis of high-throughput expression data is a challenging computational problem. Previous computational studies have used large expression datasets in order to resolve fine patterns of coexpression, producing clusters or modules of potentially coregulated genes. These methods typically examine promoter sequence information, such as DNA motifs or transcription factor occupancy data, in a separate step after clustering. We needed an alternative and more integrative approach to study the oxygen regulatory network in Saccharomyces cerevisiae using a small dataset of perturbation experiments. Mechanisms of oxygen sensing and regulation underlie many physiological and pathological processes, and only a handful of oxygen regulators have been identified in previous studies. We used a new machine learning algorithm called MEDUSA to uncover detailed information about the oxygen regulatory network using genome-wide expression changes in response to perturbations in the levels of oxygen, heme, Hap1, and Co²⁺. MEDUSA integrates mRNA expression, promoter sequence, and ChIP-chip occupancy data to learn a model that accurately predicts the differential expression of target genes in held-out data. We used a novel margin-based score to extract significant condition-specific regulators and assemble a global map of the oxygen sensing and regulatory network. This network includes both known oxygen and heme regulators, such as Hap1, Mga2, Hap4, and Upc2, as well as many new candidate regulators. MEDUSA also identified many DNA motifs that are consistent with previously experimentally identified transcription factor binding sites. Because MEDUSA's regulatory program associates regulators with target genes through their promoter sequences, we directly tested the predicted regulators for OLE1, a gene specifically induced under hypoxia, by experimental analysis of the activity of its promoter. In each case, deletion of the candidate regulator resulted in the predicted effect on promoter activity, confirming that several novel regulators identified by MEDUSA are indeed involved in oxygen regulation. MEDUSA can reveal important information from a small dataset and generate testable hypotheses for further experimental analysis. Supplemental data are included.


Figure 9 (pcbi-1000224-g009): Pseudocode for the MEDUSA learning algorithm. The figure gives detailed pseudocode for the core MEDUSA algorithm, which learns DNA motifs de novo from promoter sequences and assembles motifs and regulators into an alternating decision tree (ADT) for predicting up/down regulation of target genes.

Mentions: Detailed pseudocode for the core MEDUSA algorithm is given in Figure 9. Briefly, each iteration t of boosting adds two nodes to a gene regulation model described by an ADT: a new decision node and a prediction node. The decision node corresponds to a binding site motif μ coupled with a regulator ρ whose state s (either up or down) helps predict up/down regulation of target genes. Each motif is either a k-length sequence (“k-mer”), a dimer, or a PSSM. The weak rule h_t defined by the decision node depends both on the motif-regulator condition that it tests and on the position at which it is placed in the ADT. We define the precondition c_1 to be the conjunction of conditions in decision nodes along the path to the existing prediction node under which the new decision node is placed, and we write c_2 = c_2(μ, ρ, s) for the condition tested in the new decision node. Then the corresponding weak rule is h_t = [c_1 ∧ c_2], and the prediction node value α_t can be computed from the weight of the correct and incorrect training predictions made by h_t (see Figure 9). The motif μ added at iteration t is learned in two stages. First, the algorithm considers all deterministic motifs (k-mers or dimers) μ_d and optimizes boosting loss over choices of preconditions c_1 and new conditions c_2 = c_2(μ, ρ, s), yielding an optimal precondition c_1*, regulator ρ*, and state s*. Second, candidate PSSMs are generated by considering the top-ranked deterministic motifs generated in the first stage and performing hierarchical agglomeration (see Figure 9). Optimizing boosting loss over candidate PSSMs and choices of thresholds for the log-odds score for each of these PSSMs yields an optimal probabilistic motif and threshold θ*. This probabilistic motif is used for the new decision and prediction nodes if its loss is better than that of the best deterministic motif.
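
As a rough illustration of the weak rule and prediction node described above, the following Python sketch evaluates h_t = [c_1 ∧ c_2] on a toy genes-by-experiments matrix and derives a prediction value from the weights of the examples the rule fires on. It is not the pseudocode of Figure 9, which is not reproduced here. The formula α = 0.5 ln(W+ / W−) and the exponential reweighting are the standard confidence-rated boosting forms and are assumed, not taken from the text. All function names, the smoothing constant `eps`, and the toy data are hypothetical.

```python
import math

import numpy as np


def weak_rule(c1, motif_present, regulator_state, s):
    """Return h = [c1 AND c2] as a genes x experiments boolean matrix.

    c2 holds for a (gene, experiment) pair when the gene's promoter contains
    the motif and the regulator is in state s (+1 up, -1 down) in that experiment.
    """
    c2 = motif_present[:, None] & (regulator_state == s)[None, :]
    return c1 & c2


def prediction_value(h, y, w, eps=1e-9):
    """Assumed form: alpha = 0.5 * ln(W+ / W-), using only pairs where h fires."""
    w_plus = w[h & (y == 1)].sum()    # weight of fired pairs labelled up
    w_minus = w[h & (y == -1)].sum()  # weight of fired pairs labelled down
    return 0.5 * math.log((w_plus + eps) / (w_minus + eps))


def reweight(h, y, w, alpha):
    """Assumed exponential reweighting; pairs the rule gets wrong gain weight."""
    new_w = w * np.exp(-alpha * y * h)
    return new_w / new_w.sum()


# Toy data (hypothetical): 4 genes x 3 experiments, labels +1 (up) / -1 (down).
y = np.array([[1, 1, -1],
              [1, -1, -1],
              [-1, 1, 1],
              [-1, -1, 1]])
w = np.full(y.shape, 1.0 / y.size)                     # uniform starting weights
motif_present = np.array([True, True, False, False])   # per gene
regulator_state = np.array([1, 1, -1])                 # per experiment
c1 = np.ones(y.shape, dtype=bool)                      # root precondition: always true

h = weak_rule(c1, motif_present, regulator_state, s=1)
alpha = prediction_value(h, y, w)
new_w = reweight(h, y, w, alpha)
print(f"rule fires on {h.sum()} pairs, alpha = {alpha:.3f}")
```

In this toy case the rule fires on four (gene, experiment) pairs, three labelled up and one labelled down, so α is positive and the single mispredicted pair receives more weight in the next round. A full learner would repeat this evaluation over every candidate precondition, motif, regulator, and state, and keep the combination with the lowest boosting loss.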

