Transcription factor target prediction using multiple short expression time series from Arabidopsis thaliana.

Redestig H, Weicht D, Selbig J, Hannah MA - BMC Bioinformatics (2007)

Bottom Line: The central role of transcription factors (TFs) in higher eukaryotes has led to much interest in deciphering transcriptional regulatory interactions. We applied our method to published TF-target gene relationships determined using expression profiling on TF mutants and show that in most cases we obtain significant target gene enrichment and in half of the cases this is sufficient to deliver a usable list of high-confidence target genes. In the future, we believe its incorporation with other forms of evidence may improve integrative genome-wide predictions of transcriptional networks.


Affiliation: Max Planck Institute for Molecular Plant Physiology, Am Mühlenberg 1, D-14476 Potsdam-Golm, Germany. redestig@mpimp-golm.mpg.de

ABSTRACT

Background: The central role of transcription factors (TFs) in higher eukaryotes has led to much interest in deciphering transcriptional regulatory interactions. Even in the best case, experimental identification of TF target genes is error-prone, and has been shown to be improved by considering additional forms of evidence such as expression data. Previous expression-based methods have not explicitly tried to associate TFs with their targets and have therefore largely ignored the treatment-specific and time-dependent nature of transcriptional regulation.

Results: In this study we introduce CERMT, Covariance based Extraction of Regulatory targets using Multiple Time series. Using simulated and real data we show that using multiple expression time series, selecting treatments in which the TF responds, allowing time shifts between TFs and their targets, and using covariance to identify highly responding genes together appear to be a good strategy. We applied our method to published TF-target gene relationships determined using expression profiling on TF mutants and show that in most cases we obtain significant target gene enrichment, and that in half of the cases this is sufficient to deliver a usable list of high-confidence target genes.
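The abstract does not spell out the algorithm, so the following is only a minimal sketch of one ingredient it names: covariance between a TF profile and a candidate target profile, allowing the target to lag behind the TF. The function names, the `max_lag` parameter and the choice of the largest absolute covariance are assumptions made for illustration; this is not the published CERMT implementation and it omits treatment selection entirely.

```python
from statistics import mean


def lagged_covariance(tf, gene, lag):
    """Sample covariance between tf[t] and gene[t + lag] within one time series.

    Hypothetical helper: the target is assumed to respond `lag` time points
    after the TF, so the last `lag` TF values and the first `lag` gene values
    are dropped before computing the covariance.
    """
    if lag < 0 or len(tf) != len(gene) or len(tf) - lag < 2:
        raise ValueError("need equal-length series with at least two overlapping points")
    x = tf[: len(tf) - lag]
    y = gene[lag:]
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def best_lagged_covariance(tf, gene, max_lag=2):
    """Return (covariance, lag) for the lag with the largest absolute covariance.

    Trying lags 0..max_lag is an assumption chosen to mirror the 0, 1 or 2
    time-point delays used in the simulation described in the text.
    """
    candidates = [(lagged_covariance(tf, gene, k), k) for k in range(max_lag + 1)]
    return max(candidates, key=lambda pair: abs(pair[0]))


if __name__ == "__main__":
    tf_profile = [0, 1, 2, 1, 0, 0, 0]
    target_profile = [0, 0, 1, 2, 1, 0, 0]  # same pulse, one time point later
    print(best_lagged_covariance(tf_profile, target_profile))
```

Unlike correlation, covariance is not normalised by the variance of each profile, so genes that respond strongly score higher than genes that merely follow the same shape at low amplitude, which matches the stated aim of identifying highly responding genes.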

Conclusion: CERMT could be immediately useful in refining possible target genes of candidate TFs using publicly available data, particularly for organisms lacking comprehensive TF binding data. In the future, we believe its incorporation with other forms of evidence may improve integrative genome-wide predictions of transcriptional networks.


Figure 3: Boxplot of method performance using simulated data. Performance was measured as the percentage of the recovered genes (100 × True positives/n) in the top 2n predicted genes where n equals the size of the planted regulon. The simplistic method CERMT-0 does not consider any time lag, makes no selection of treatments and is therefore robust against the size of the planted regulon but for the same reason also fails if there exists a time lag between the TF and its targets. CERMT on the other hand performs poorly if the planted regulon is small, but it is robust against the presence of time lags.
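The performance measure in the caption, 100 × True positives/n evaluated over the top 2n predictions, can be sketched as follows. The function name and the representation of predictions as a ranked list of gene identifiers are assumptions made for illustration.

```python
def recovery_percentage(ranked_genes, true_targets):
    """Percentage of planted targets recovered among the top 2n predictions.

    ranked_genes: gene identifiers ordered from best to worst prediction.
    true_targets: identifiers of the planted regulon (n = its size).
    """
    targets = set(true_targets)
    n = len(targets)
    if n == 0:
        raise ValueError("the planted regulon must contain at least one gene")
    top = set(ranked_genes[: 2 * n])
    true_positives = len(top & targets)
    return 100.0 * true_positives / n


if __name__ == "__main__":
    ranking = ["g3", "g7", "g1", "g9", "g2"]
    planted = {"g1", "g2"}
    # n = 2, so the top 4 predictions are inspected; only g1 is among them.
    print(recovery_percentage(ranking, planted))  # 50.0
```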

Mentions: We compared CERMT with its reduced version, CERMT-0, on 100 simulated data sets, each containing 10,000 genes, six treatments and seven time points. In three of the treatments, a regulon of varying size was added that followed the pattern of the TF either directly or lagged by 1 or 2 time points. Figure 3 shows boxplots of the percentage of true positives found in the top 2n genes, where n is the size of the planted regulon. The performance of CERMT-0 is high if there is no time lag but, not surprisingly, very poor if we plant a delay. The full CERMT approach performs poorly if the sought regulon is too small, as the 'right' treatment pair becomes increasingly difficult to find with decreasing regulon size. On the other hand, for sufficiently large regulons, CERMT performs well regardless of whether the response is delayed. The size of the smallest detectable regulon decreases with the number of time points in the experiment (data not shown). Note that although the minimum detectable regulon size in the simulation seems to be around 50 genes, this estimate is strongly data-specific and is not transferable to performance on real data.
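One simulated data set of the kind described above could be generated roughly as follows. The dimensions (10,000 genes, six treatments, seven time points, a regulon planted in three treatments with a lag of 0, 1 or 2) come from the text; the Gaussian background noise, the random TF response pattern, the signal amplitude and all names are assumptions, since the excerpt does not describe how the authors generated their data.

```python
import random


def simulate(n_genes=10000, n_treatments=6, n_times=7,
             regulon_size=50, lag=1, n_active=3, seed=0):
    """Return (data, regulon, active) for one simulated data set.

    data[g][tr][t] is the expression of gene g in treatment tr at time t.
    Gene 0 plays the role of the TF; genes 1..regulon_size are its targets.
    """
    if not 0 <= lag < n_times:
        raise ValueError("lag must be between 0 and n_times - 1")
    if regulon_size >= n_genes:
        raise ValueError("regulon must be smaller than the number of genes")
    rng = random.Random(seed)

    # Background: independent standard-normal noise (an assumption).
    data = [[[rng.gauss(0.0, 1.0) for _ in range(n_times)]
             for _ in range(n_treatments)]
            for _ in range(n_genes)]

    tf = 0
    regulon = list(range(1, regulon_size + 1))
    active = rng.sample(range(n_treatments), n_active)

    for tr in active:
        # Assumed TF response: a random pattern with larger amplitude than noise.
        pattern = [rng.gauss(0.0, 3.0) for _ in range(n_times)]
        for t in range(n_times):
            data[tf][tr][t] += pattern[t]
        # Targets follow the TF pattern, shifted by `lag` time points.
        for g in regulon:
            for t in range(lag, n_times):
                data[g][tr][t] += pattern[t - lag]

    return data, regulon, active


if __name__ == "__main__":
    data, regulon, active = simulate(n_genes=500, regulon_size=20, lag=2)
    print(len(data), len(data[0]), len(data[0][0]), sorted(active))
```

Repeating this with different seeds, regulon sizes and lags would give a collection of data sets on which a ranking method can be scored against the planted regulon.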

