Predicting promoter activities of primary human DNA sequences.

Irie T, Park SJ, Yamashita R, Seki M, Yada T, Sugano S, Nakai K, Suzuki Y - Nucleic Acids Res. (2011)

Bottom Line: We found that it is still difficult to predict transcript levels in a strictly quantitative manner in vivo; however, it was possible to select active promoters in a given cell from the other silent promoters. We demonstrate that many human genomic regions have potential promoter activity, and the expression of some previously uncharacterized putatively non-protein-coding transcripts can be explained by our prediction model. Furthermore, we found that nucleosomes occasionally formed open chromatin structures with RNA polymerase II recruitment where the program predicted significant promoter activities, although no transcripts were observed.


Affiliation: Department of Medical Genome Sciences, Graduate School of Frontier Sciences, the University of Tokyo, 5-1-5 Kashiwanoha, Kashiwashi, Chiba 277-8562, Japan.

ABSTRACT
We developed a computer program that can predict the intrinsic promoter activities of primary human DNA sequences. We observed promoter activity using a quantitative luciferase assay and generated a prediction model using multiple linear regression. Our program achieved a prediction accuracy correlation coefficient of 0.87 between the predicted and observed promoter activities. We evaluated the prediction accuracy of the program using massive sequencing analysis of transcriptional start sites in vivo. We found that it is still difficult to predict transcript levels in a strictly quantitative manner in vivo; however, it was possible to select active promoters in a given cell from the other silent promoters. Using this program, we analyzed the transcriptional landscape of the entire human genome. We demonstrate that many human genomic regions have potential promoter activity, and the expression of some previously uncharacterized putatively non-protein-coding transcripts can be explained by our prediction model. Furthermore, we found that nucleosomes occasionally formed open chromatin structures with RNA polymerase II recruitment where the program predicted significant promoter activities, although no transcripts were observed.
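The modeling step named in the abstract (multiple linear regression, scored by the correlation between predicted and observed activity) can be sketched as follows. This is a minimal illustration only: the feature columns, the weights, and the synthetic data are all assumptions made up for the example, not the features or measurements used in the paper, and the correlation is computed in-sample.

```python
import random

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.
    Returns [intercept, w1, w2, ...]."""
    rows = [[1.0] + list(r) for r in X]          # prepend intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(A[k][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for k in range(n):
            if k != c:
                f = A[k][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][n] for i in range(n)]

def predict(w, X):
    """Apply the fitted weights to each feature row."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Synthetic stand-in data (hypothetical): 3 sequence-derived features per
# promoter and a noisy "observed activity" generated from made-up weights.
random.seed(0)
true_w = [0.5, 1.0, -0.7, 0.4]                   # assumption, not from the paper
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [true_w[0] + sum(w * x for w, x in zip(true_w[1:], row)) + random.gauss(0, 0.5)
     for row in X]

weights = fit_linear(X, y)
r = pearson(predict(weights, X), y)
print("fitted weights:", [round(w, 2) for w in weights])
print("correlation between predicted and observed:", round(r, 2))
```

The printed correlation describes only this synthetic data set and says nothing about the 0.87 reported above, which was obtained from luciferase measurements.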



Figure 6: Predicted potential promoter activity landscape of the human genome. Example of an intergenic region with the indicated TSS tag count (red bars), pol II binding signal (green bars) and prediction score (blue bars). The description of this graph is as in Figure 4B and C.

Mentions: We applied the prediction model to the entire human genome to illustrate the landscape of potential promoter activities in the human genomic sequence. We tentatively defined a prediction score of >1 as the threshold, as used for the RefSeq genes shown above. In total, 185 018 genomic regions outside the RefSeq regions showed prediction scores >1. We examined the overlap between intergenic regions with prediction scores >1 and intergenic regions with >5 ppm TSS-Seq tags, and found 147 overlapping regions. As exemplified in Figure 6, previously identified lncRNA cDNAs were sometimes located in those regions. In these 147 cases, we found that the surrounding genomic regions had an open chromatin structure. Clear binding signals for pol II were observed in 97 cases (66%). These results suggest that biologically controlled transcription is actually occurring from these regions.
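The filtering and overlap step described in this paragraph can be sketched roughly as below. The region tuples, coordinates, and helper names are hypothetical toy values for illustration; only the two thresholds (prediction score >1, TSS-Seq tags >5 ppm) and the counts 97 and 147 come from the text. The paper does not state here how overlap was defined, so simple interval intersection on the same chromosome is an assumption.

```python
# Each region is (chromosome, start, end, value); intervals are half-open.
# Toy data only (hypothetical coordinates and values).
predicted = [
    ("chr1", 100, 200, 1.5),    # value = prediction score
    ("chr1", 500, 600, 0.8),
    ("chr2", 100, 200, 2.1),
    ("chr2", 900, 1000, 1.2),
]
tss_tags = [
    ("chr1", 150, 160, 7.0),    # value = TSS-Seq tag count in ppm
    ("chr1", 520, 530, 9.0),
    ("chr2", 120, 130, 3.0),
    ("chr2", 950, 960, 6.0),
]

def overlaps(a, b):
    """True if two regions share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def count_overlapping(pred, tss, score_min=1.0, ppm_min=5.0):
    """Count predicted regions above score_min that overlap at least one
    TSS region above ppm_min."""
    active_pred = [r for r in pred if r[3] > score_min]
    active_tss = [r for r in tss if r[3] > ppm_min]
    return sum(any(overlaps(p, t) for t in active_tss) for p in active_pred)

n_overlap = count_overlapping(predicted, tss_tags)
print("overlapping regions in toy data:", n_overlap)

# Arithmetic check of the reported proportion: 97 of 147 regions.
pol2_percent = round(100 * 97 / 147)
print("pol II-positive share:", pol2_percent, "%")
```

The nested scan is quadratic in the number of regions, which is fine for a toy example; at the scale of the 185 018 regions mentioned above, a sorted sweep or an interval index would be the more practical choice.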

