Transcription factor target prediction using multiple short expression time series from Arabidopsis thaliana.

Redestig H, Weicht D, Selbig J, Hannah MA - BMC Bioinformatics (2007)

Bottom Line: Even in the best case, experimental identification of TF target genes is error prone, and has been shown to be improved by considering additional forms of evidence such as expression data. We applied our method to published TF-target gene relationships determined using expression profiling on TF mutants and show that in most cases we obtain significant target gene enrichment and in half of the cases this is sufficient to deliver a usable list of high-confidence target genes. In the future, we believe incorporation of the method with other forms of evidence may improve integrative genome-wide predictions of transcriptional networks.


Affiliation: Max Planck Institute for Molecular Plant Physiology, Am Mühlenberg 1, D-14476 Potsdam-Golm, Germany. redestig@mpimp-golm.mpg.de

ABSTRACT

Background: The central role of transcription factors (TFs) in higher eukaryotes has led to much interest in deciphering transcriptional regulatory interactions. Even in the best case, experimental identification of TF target genes is error prone, and has been shown to be improved by considering additional forms of evidence such as expression data. Previous expression based methods have not explicitly tried to associate TFs with their targets and therefore largely ignored the treatment specific and time dependent nature of transcription regulation.

Results: In this study we introduce CERMT, Covariance based Extraction of Regulatory targets using Multiple Time series. Using simulated and real data we show that a good strategy appears to be to use multiple expression time series, select treatments in which the TF responds, allow time shifts between TFs and their targets, and use covariance to identify highly responding genes. We applied our method to published TF-target gene relationships determined using expression profiling on TF mutants and show that in most cases we obtain significant target gene enrichment and in half of the cases this is sufficient to deliver a usable list of high-confidence target genes.

Conclusion: CERMT could be immediately useful in refining possible target genes of candidate TFs using publicly available data, particularly for organisms lacking comprehensive TF binding data. In the future, we believe its incorporation with other forms of evidence may improve integrative genome-wide predictions of transcriptional networks.


Figure 5: The PAP regulon. Comparison of the expression of the CERMT predicted regulon (upper panel) and the over-expression defined regulon [47] (lower panel) versus the expression of the PAP1 transcription factor in the shoot in response to salt and osmotic stress. No time lag was used for the prediction, so there is no overlap between predicted and true regulons. The difference in terms of coherency and variance is pronounced so it is not hard to see why the algorithm is seeded with no time lag instead of the more appropriate lag of two time points. This illustrates an unavoidable problem of TF target prediction based only on gene expression data – there are no unique solutions and the most obvious solution is not necessarily the correct one.

One of the key benefits of the increasing public availability of expression data is the ability to quickly generate hypotheses on gene function. Standard co-expression analyses have yielded several insights that were experimentally validated [19,20]. We therefore investigated the functional insight provided by the CERMT-predicted CBF regulon. Remarkably, the top seven genes include four COR/LEA genes and one galactinol synthase. The cold-regulated (COR) genes are the defining members of the CBF regulon, as the CBF TFs were first identified through their binding to the C-repeat element present in the promoters of these genes [45]. Galactinol synthase catalyzes the first committed step of raffinose synthesis, which is an important component of cold acclimation known to be under the control of the CBF TFs [45]. Overall, the predicted regulon reveals many more known CBF targets, including further cold-responsive COR genes, enzymes and TFs. These data offer biological insight into the central function of the CBF TFs in controlling transcriptional and metabolic changes during cold acclimation. In addition to the predicted target lists, the treatments used, shown in Table 2, can also provide useful biological insight into the function of the TF and of the predicted regulon. Several of the studied examples confirm known biology, such as the importance of CBF and ZAT12 for the response to cold, HY5 for UV, HSFA2 for heat and MBF1c for heat and osmotic stress [12,39,41,46]. Table 2 also shows which time shifts were used for each TF, along with the time shift that maximizes the median covariance between the TF and all the genes in its regulon in the treatments used. This can be seen as a supervised 'answer' to what the algorithm is trying to predict. It is clear that there often exists a transcriptional time shift for the studied regulons, which justifies one of our primary assumptions.
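The supervised 'answer' described above, the time shift that maximizes the median covariance between a TF and the genes of its known regulon, can be illustrated with a minimal sketch. This is not the CERMT implementation: the function names `lagged_covariance` and `best_shift` are hypothetical, the sketch handles a single time series rather than pooling several treatments, and it assumes equally long profiles with the target lagging the TF by a whole number of time points.

```python
from statistics import median


def lagged_covariance(tf, gene, lag):
    """Sample covariance between tf[t] and gene[t + lag].

    Assumption for this sketch: both profiles have the same length and the
    target gene responds `lag` time points after the TF.
    """
    if len(tf) != len(gene):
        raise ValueError("profiles must have the same length")
    if lag < 0 or lag > len(tf) - 2:
        raise ValueError("lag must leave at least two overlapping time points")
    x = tf[: len(tf) - lag]   # TF expression at times 0 .. T-lag-1
    y = gene[lag:]            # gene expression at times lag .. T-1
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def best_shift(tf, regulon, max_lag):
    """Return the lag with the highest median TF-gene covariance, plus all scores."""
    scores = {
        lag: median(lagged_covariance(tf, gene, lag) for gene in regulon)
        for lag in range(max_lag + 1)
    }
    return max(scores, key=scores.get), scores


# Toy profiles (invented for illustration, not data from the paper):
# the regulon genes follow the TF with a delay of two time points.
tf_profile = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
toy_regulon = [
    [0.0, 0.0, 0.0, 1.0, 3.0, 1.0],
    [0.0, 0.0, 0.0, 2.0, 6.0, 2.0],
    [0.1, 0.0, 0.0, 1.0, 2.9, 1.0],
]
shift, shift_scores = best_shift(tf_profile, toy_regulon, max_lag=3)
```

Using the median rather than the mean keeps a single strongly responding gene from dictating the chosen shift, which matches the way the shift is summarized over the whole regulon in the text.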
However, the algorithm frequently misses the correct time lag. The reason becomes apparent when one compares the plots of the over-expression defined regulon for PAP1 with those of the first 50 genes in the predicted regulon for PAP1 (Figure 5). The difference is glaring, so it is not surprising that the true regulon is overlooked. To increase performance it would be necessary to use additional resources rather than gene expression data alone, for example by using the information that the deep purple phenotype of the PAP1 over-expresser is due to anthocyanin accumulation and therefore considering only genes involved in flavonoid metabolism [47]. When this information is combined with the a priori assumption that there exists a time shift, the algorithm picks out nine of the true targets in its top 100 (Fisher's exact test: P = 10^-5). Including such additional data therefore adds one more TF to those whose hit ratio is sufficient to deliver a usable number of high-confidence target genes. This illustrates an unavoidable problem with gene expression data for TF-target prediction: there are no unique solutions. Given these data sets, however, we conclude that the true regulons can often, but far from always, be discovered with simple statistical functions, which conceptually strengthens the approach of Beyer et al. [10], which integrates many different techniques to boost target predictions.
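The enrichment of true targets among the top-ranked predictions is assessed above with Fisher's exact test. For a one-sided test of over-representation this is equivalent to the upper tail of a hypergeometric distribution, which the following sketch computes. The function name `enrichment_p_value` and all numbers in the example are assumptions made for illustration; the excerpt does not state the size of the gene universe or of the PAP1 regulon, so the sketch does not attempt to reproduce the reported P value.

```python
from math import comb


def enrichment_p_value(n_genes, n_true, n_top, n_hits):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= n_hits), where X is the number of true targets found
    when n_top genes are drawn at random, without replacement, from
    n_genes candidates of which n_true are true targets.
    """
    if not (0 <= n_true <= n_genes and 0 <= n_top <= n_genes):
        raise ValueError("n_true and n_top must lie between 0 and n_genes")
    largest_possible = min(n_true, n_top)
    # Count the ways to pick exactly k true targets and n_top - k other genes.
    favourable = sum(
        comb(n_true, k) * comb(n_genes - n_true, n_top - k)
        for k in range(n_hits, largest_possible + 1)
    )
    return favourable / comb(n_genes, n_top)


# Hypothetical toy numbers: 20 candidate genes, 5 true targets,
# a top-5 prediction list containing 3 of them.
p_toy = enrichment_p_value(n_genes=20, n_true=5, n_top=5, n_hits=3)
```

Restricting the candidate set, as done above with the flavonoid metabolism genes, changes `n_genes` and `n_true`, so the same number of hits in the top list can correspond to a very different P value.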

