Transcription factor target prediction using multiple short expression time series from Arabidopsis thaliana.

Redestig H, Weicht D, Selbig J, Hannah MA - BMC Bioinformatics (2007)



Affiliation: Max Planck Institute for Molecular Plant Physiology, Am Mühlenberg 1, D-14476 Potsdam-Golm, Germany. redestig@mpimp-golm.mpg.de

ABSTRACT

Background: The central role of transcription factors (TFs) in higher eukaryotes has led to much interest in deciphering transcriptional regulatory interactions. Even in the best case, experimental identification of TF target genes is error prone, and has been shown to be improved by considering additional forms of evidence such as expression data. Previous expression-based methods have not explicitly tried to associate TFs with their targets and have therefore largely ignored the treatment-specific and time-dependent nature of transcriptional regulation.

Results: In this study we introduce CERMT, Covariance based Extraction of Regulatory targets using Multiple Time series. Using simulated and real data, we show that a good strategy appears to be to use multiple expression time series, select treatments in which the TF responds, allow time shifts between TFs and their targets, and use covariance to identify highly responding genes. We applied our method to published TF-target gene relationships determined using expression profiling on TF mutants and show that in most cases we obtain significant target gene enrichment, and in half of the cases this is sufficient to deliver a usable list of high-confidence target genes.

Conclusion: CERMT could be immediately useful in refining possible target genes of candidate TFs using publicly available data, particularly for organisms lacking comprehensive TF binding data. In the future, we believe its incorporation with other forms of evidence may improve integrative genome-wide predictions of transcriptional networks.
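
The strategy named in the Results (several time series, treatments restricted to those in which the TF responds, a time shift between TF and target, covariance as the score) can be illustrated with a minimal Python sketch. This is not the authors' CERMT implementation. The function names, the `min_tf_range` response threshold, the `max_lag` limit and the rule of choosing one lag per gene are all assumptions made for illustration.

```python
from statistics import mean


def lagged_covariance(tf, gene, lag):
    """Sample covariance between tf[t] and gene[t + lag] in one time series."""
    n = len(tf) - lag
    if n < 2:
        raise ValueError("time series too short for this lag")
    x = tf[:n]
    y = gene[lag:lag + n]
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def score_gene(tf_series, gene_series, max_lag=1, min_tf_range=1.0):
    """Return (score, lag) for one candidate target gene.

    tf_series and gene_series hold one expression time series per treatment.
    Treatments in which the TF changes by less than min_tf_range are skipped
    (hypothetical criterion for "the TF responds"). For each allowed lag the
    covariances of the remaining treatments are summed, and the lag with the
    largest absolute sum is reported.
    """
    selected = [
        (tf, gene)
        for tf, gene in zip(tf_series, gene_series)
        if max(tf) - min(tf) >= min_tf_range
    ]
    best_score, best_lag = 0.0, 0
    for lag in range(max_lag + 1):
        total = sum(lagged_covariance(tf, gene, lag) for tf, gene in selected)
        if abs(total) > abs(best_score):
            best_score, best_lag = total, lag
    return best_score, best_lag


# Toy data: the TF responds in the first treatment only, and the gene
# follows it one time point later.
tf_series = [[0, 1, 2, 3, 2, 1, 0], [1.0, 1.1, 1.0, 0.9, 1.0, 1.0, 1.1]]
gene_series = [[0, 0, 1, 2, 3, 2, 1], [5, 1, 4, 2, 6, 0, 3]]
score, lag = score_gene(tf_series, gene_series)
```

Covariance, unlike correlation, grows with the amplitude of the response, so a gene that follows the TF closely but barely changes scores lower than one that follows it with a strong response.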

Figure 4: The Gap curves for four of the examined transcription factors (TFs). Shown is the distance between the observed R² (goodness) of the predicted regulon and the 95th percentile of the null distribution (which is not shown here). A positive R² means that the regulon is significant at the 5% significance level, and the maximum of the Gap curve indicates the best number of genes to include in the regulon. The Gap curves for CBF, NAC072 and AREB are plotted along with the curves obtained for two hundred shuffled TFs (thin lines). The shuffled TFs get mostly negative Gap statistics as they lie close to the expectation value of the null distribution. CBF, NAC072 and AREB show very significant Gap curves; the HY5 regulon, on the other hand, does not.

Mentions: Figure 4 shows an example plot of the statistical quality of the predicted regulons for four of the examined TFs. The CBF, AREB and NAC072 regulons show convex Gap curves where the maximum indicates the best number of genes to include in the regulon. The observed Gap statistics are greater than zero, which indicates that obtaining an equally good or better regulon would be highly unlikely if the expression of the TF were independent of the rest of the genes. The HY5 regulon, on the other hand, exhibits no stronger connection to its regulon than can be expected from a randomized TF, despite the fact that it contains a significant number of true targets. This could be an effect of the low resolution of the AtGenExpress dataset and the sinusoidal expression pattern of HY5 in response to UV-B stress. Such complex patterns depend on more parameters and are consequently harder to approximate with only seven time points.
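
The Gap curve described above can be sketched in Python under stated assumptions. The R² values for the real TF and for the shuffled TFs are taken as given inputs, since their computation is not described in this excerpt. The linear-interpolation percentile and all function names are illustrative choices, not the authors' code.

```python
def percentile(values, q):
    """q-th percentile (0 to 100), interpolating linearly between order statistics."""
    if not values:
        raise ValueError("need at least one value")
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def gap_curve(observed_r2, shuffled_r2, q=95.0):
    """Gap per regulon size: observed R2 minus the q-th percentile of shuffled R2.

    observed_r2[k] is the R2 of the real TF at the k-th regulon size;
    shuffled_r2[k] is the list of R2 values from shuffled TFs at that size.
    """
    return [obs - percentile(null, q) for obs, null in zip(observed_r2, shuffled_r2)]


def best_regulon_size(sizes, gaps):
    """Return (size, gap, significant) at the maximum of the Gap curve."""
    i = max(range(len(gaps)), key=gaps.__getitem__)
    return sizes[i], gaps[i], gaps[i] > 0


# Toy numbers, invented for illustration only.
sizes = [10, 20, 30]
observed = [0.5, 0.8, 0.6]
shuffled = [[0.1, 0.2, 0.3, 0.4, 0.5]] * 3
gaps = gap_curve(observed, shuffled)
size, gap, significant = best_regulon_size(sizes, gaps)
```

A TF whose Gap curve stays at or below zero for every size, as reported for HY5, would come back with `significant` set to `False` regardless of which size maximizes the curve.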

