A predictive model of the oxygen and heme regulatory network in yeast.

Kundaje A, Xin X, Lan C, Lianoglou S, Zhou M, Zhang L, Leslie C - PLoS Comput. Biol. (2008)

Bottom Line: We used a novel margin-based score to extract significant condition-specific regulators and assemble a global map of the oxygen sensing and regulatory network. In each case, deletion of the candidate regulator resulted in the predicted effect on promoter activity, confirming that several novel regulators identified by MEDUSA are indeed involved in oxygen regulation. Supplemental data are included.


Affiliation: Department of Computer Science, Columbia University, New York, New York, United States of America.

ABSTRACT
Deciphering gene regulatory mechanisms through the analysis of high-throughput expression data is a challenging computational problem. Previous computational studies have used large expression datasets in order to resolve fine patterns of coexpression, producing clusters or modules of potentially coregulated genes. These methods typically examine promoter sequence information, such as DNA motifs or transcription factor occupancy data, in a separate step after clustering. We needed an alternative and more integrative approach to study the oxygen regulatory network in Saccharomyces cerevisiae using a small dataset of perturbation experiments. Mechanisms of oxygen sensing and regulation underlie many physiological and pathological processes, and only a handful of oxygen regulators have been identified in previous studies. We used a new machine learning algorithm called MEDUSA to uncover detailed information about the oxygen regulatory network using genome-wide expression changes in response to perturbations in the levels of oxygen, heme, Hap1, and Co2+. MEDUSA integrates mRNA expression, promoter sequence, and ChIP-chip occupancy data to learn a model that accurately predicts the differential expression of target genes in held-out data. We used a novel margin-based score to extract significant condition-specific regulators and assemble a global map of the oxygen sensing and regulatory network. This network includes both known oxygen and heme regulators, such as Hap1, Mga2, Hap4, and Upc2, as well as many new candidate regulators. MEDUSA also identified many DNA motifs that are consistent with previous experimentally identified transcription factor binding sites. Because MEDUSA's regulatory program associates regulators to target genes through their promoter sequences, we directly tested the predicted regulators for OLE1, a gene specifically induced under hypoxia, by experimental analysis of the activity of its promoter. 
In each case, deletion of the candidate regulator resulted in the predicted effect on promoter activity, confirming that several novel regulators identified by MEDUSA are indeed involved in oxygen regulation. MEDUSA can reveal important information from a small dataset and generate testable hypotheses for further experimental analysis. Supplemental data are included.


Figure 8 (pcbi-1000224-g008): Comparison of significance and abundance of motifs learned by MEDUSA and AlignACE for the 16 expression signatures identified in the dataset. Each row in the table represents a motif found by MEDUSA only (top section), by both MEDUSA and AlignACE (middle section), or by AlignACE only (bottom section). The first column describes the motif by the name of the corresponding transcription factor followed by the consensus motif sequence. Some transcription factor names are followed by ‘ChIP’, indicating that these are significant ChIP-chip occupancy features identified by MEDUSA. Motif descriptions highlighted in red indicate transcription factors that are specifically known to have an important function in hypoxia. The remainder of the table shows MEDUSA (left section) and AlignACE (right section) results for each signature (S1 to S16), represented by a pair of columns scoring motifs by statistical significance (left column in each pair) and abundance within the set of genes making up the signature (right column in each pair). For statistical significance scores, columns labeled ‘S’ represent the margin scores (in shades of blue) assigned by MEDUSA, and columns labeled ‘M’ represent the maximum a posteriori (MAP) scores (in shades of green) assigned by AlignACE. In both cases, dark shades indicate higher statistical significance. The columns labeled ‘A’ show the percentage abundance scores of the motifs in each of the signatures. For AlignACE, the abundance score of a motif simply reflects the ratio of the number of genes in each cluster whose promoters contain the motif, to the cluster size. For MEDUSA, it refers to the ratio of the number of genes in each cluster for which the motif contributes positively to the margin score, to the size of the cluster. A motif could be present in the promoter of a gene but not identified as significant by MEDUSA. In such cases, the motif does not contribute to the MEDUSA abundance score. Dark shades of pink indicate strong abundance scores.
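The two abundance definitions in the caption differ only in which genes are counted, and a small sketch may make the distinction concrete. The code below is illustrative and is not the authors' implementation: the function names, the input data structures, and the gene identifiers other than OLE1 are hypothetical, and the per-gene margin contributions are assumed to be available as plain numbers.

```python
def alignace_abundance(cluster_genes, genes_with_motif):
    """Fraction of cluster genes whose promoters contain the motif."""
    cluster = set(cluster_genes)
    if not cluster:
        return 0.0
    return len(cluster & set(genes_with_motif)) / len(cluster)


def medusa_abundance(cluster_genes, margin_contribution):
    """Fraction of cluster genes for which the motif contributes
    positively to the margin score.

    margin_contribution is a hypothetical mapping gene -> contribution
    of this motif to that gene's margin score; genes missing from the
    mapping are treated as having no contribution.
    """
    cluster = set(cluster_genes)
    if not cluster:
        return 0.0
    positive = sum(1 for g in cluster if margin_contribution.get(g, 0.0) > 0)
    return positive / len(cluster)


# Hypothetical four-gene signature; only OLE1 is a gene named in the article.
signature = ["OLE1", "GENE_A", "GENE_B", "GENE_C"]
has_motif = {"OLE1", "GENE_A", "GENE_B"}
contribution = {"OLE1": 0.8, "GENE_A": 0.0, "GENE_B": -0.1}

print(100 * alignace_abundance(signature, has_motif))    # percentage, as in column 'A'
print(100 * medusa_abundance(signature, contribution))
```

In this toy signature the motif is present in three promoters but contributes positively for only one gene, so the MEDUSA-style abundance is lower than the AlignACE-style abundance, which is the situation the last two sentences of the caption describe.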

Mentions: Figure 8 shows a comprehensive comparison of MEDUSA to AlignACE motif discovery results across all 16 signatures. We used AlignACE with default settings on 1000 base pair promoter sequences of genes belonging to each signature and used AlignACE's maximum a posteriori (MAP) scores to rank their statistical significance. We also defined the abundance score for each motif as the fraction of promoters that were found to have the motif. Similarly, we used margin scoring to identify significant MEDUSA motifs for each of the signatures, reporting only those motifs with a positive margin score. For MEDUSA, we defined the abundance score for a motif as the fraction of promoters in the signature set that were found to have the motif based on the tree structure of the learned model. In order to compare the two methods, we report in Figure 8 only those motifs that matched known transcription factor binding site PSSMs in TRANSFAC, SCPD or YPD (using Kullback-Leibler divergence to compare motifs [8]) or matched consensus sequences found by MacIsaac et al. [64]. (A separate comparison of MEDUSA motifs to MacIsaac et al. motifs alone appears in Figure S9.) If multiple motifs were found to be strong matches to the same known binding site, we reported the one with the highest statistical score. In total, we matched 111 motifs found by either or both methods to known binding sites, and we sorted these motifs into 3 categories based on the difference between the cumulative margin score and cumulative MAP score across all the signatures. The first set consists of 67 motifs identified by MEDUSA but not by AlignACE; the second set consists of 22 motifs that are identified by both MEDUSA and AlignACE; and the third set consists of 22 motifs that are identified by AlignACE but not MEDUSA. In Figure 8, the motifs highlighted in red are binding sites of transcription factors known to play a key role in hypoxia-related conditions. 
MEDUSA is able to identify a number of important hypoxia-related transcription factor binding sites, including Hap1 (CGGnnTAnCGG), Hap2/3/4 (CCAAT), Mga2 (ACTCAACAA), Upc2/Ecm22 (TCGTATA), Ace2 (TGCTGGT), Mot3 (TTGCCT), Mac1 (TGCGCAAA), Aft2 (RVACCCTD), Msn2/4 (AAGGGGc), Rox1 (AAAGACAAAAAA) and Abf1 (RTCRnnnnnACG). Among these, AlignACE is able to identify Rox1, Msn4 and Abf1, and it finds the Hap1 and Hap2/3/4 binding sites only for a single signature (signature 16). Moreover, none of the motifs exclusively identified by AlignACE are known to have any role in hypoxia-related conditions. In particular, the top-scoring AlignACE motif is a low-complexity motif (AAAAAAAA) that matches the Azf1 binding site. These results show that MEDUSA outperforms AlignACE in finding relevant sequence motifs for our dataset.
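The text states that discovered motifs were matched to known PSSMs using Kullback-Leibler divergence but does not give the exact procedure. The following is a minimal sketch of one common way to do such a comparison, under stated assumptions: both PSSMs have the same width and are already aligned, each column is a list of counts or frequencies in A, C, G, T order, a small pseudocount avoids zero probabilities, and the distance is the column-averaged symmetrized divergence. The function names, the pseudocount value, and the second example motif are hypothetical; the offset search, reverse-complement handling, and match threshold used in the study are not specified here.

```python
import math


def normalize(column, pseudocount=0.01):
    """Turn a column of counts or frequencies into probabilities,
    adding a pseudocount so that no entry is zero."""
    total = sum(column) + len(column) * pseudocount
    return [(x + pseudocount) / total for x in column]


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(pssm_a, pssm_b, pseudocount=0.01):
    """Column-averaged symmetrized KL divergence between two aligned,
    equal-width PSSMs (lists of columns in A, C, G, T order)."""
    if not pssm_a or len(pssm_a) != len(pssm_b):
        raise ValueError("PSSMs must be non-empty and of equal width")
    total = 0.0
    for col_a, col_b in zip(pssm_a, pssm_b):
        p = normalize(col_a, pseudocount)
        q = normalize(col_b, pseudocount)
        total += 0.5 * (kl(p, q) + kl(q, p))
    return total / len(pssm_a)


# One-hot PSSM for the Hap2/3/4 consensus CCAAT quoted in the text.
ccaat = [[0, 1, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
# Hypothetical variant differing in the last position (CCAAC).
ccaac = [[0, 1, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0]]

print(symmetric_kl(ccaat, ccaat))  # identical motifs give zero divergence
print(symmetric_kl(ccaat, ccaac))  # a mismatched column gives a positive value
```

Symmetrizing the divergence is a design choice made here so that the result does not depend on which motif is treated as the reference; a smaller value indicates a closer match.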


