Unravelling the hidden DNA structural/physical code provides novel insights on promoter location.

Durán E, Djebali S, González S, Flores O, Mercader JM, Guigó R, Torrents D, Soler-López M, Orozco M - Nucleic Acids Res. (2013)

Bottom Line: The vast majority of the experimentally tested ProStar-predicted regions proved to be transcriptionally active despite lacking specific sequence motifs, indicating that physical signaling can detect promoter activity beyond conventional TSS prediction methods. Furthermore, highly active regions displayed chromatin features typically associated with promoters of housekeeping genes. The results make it possible to redefine promoter signatures and to analyze the diversity, evolutionary conservation and dynamic regulation of human core promoters at large scale.


Affiliation: Institute for Research in Biomedicine (IRB Barcelona), Barcelona 08028, Spain, Joint IRB-BSC Research Program on Computational Biology, Barcelona 08028, Spain, Bioinformatics and Genomics Group, Center for Genomic Regulation and Universitat Pompeu Fabra, Barcelona 08003, Spain, Barcelona Supercomputing Center, Barcelona 08034, Spain and Department of Biochemistry and Molecular Biology, University of Barcelona, Barcelona 08028, Spain.

ABSTRACT
Although protein recognition of DNA motifs in promoter regions has traditionally been considered a critical regulatory element in transcription, locating promoters, and in particular transcription start sites (TSSs), remains a challenge. Here we perform a comprehensive analysis of putative core promoter sequences relative to non-annotated predicted TSSs along the human genome, which were defined by distinct DNA physical properties implemented in our ProStar computational algorithm. A representative sample of predicted regions was subjected to extensive experimental validation and analysis. Interestingly, the vast majority proved to be transcriptionally active despite the lack of specific sequence motifs, indicating that physical signaling is indeed able to detect promoter activity beyond conventional TSS prediction methods. Furthermore, highly active regions displayed chromatin features typically associated with promoters of housekeeping genes. Our results make it possible to redefine promoter signatures and to analyze the diversity, evolutionary conservation and dynamic regulation of human core promoters at large scale. Moreover, the present study strongly supports the hypothesis of an ancient regulatory mechanism encoded by the intrinsic physical properties of DNA that may contribute to the complexity of transcription regulation in the human genome.


Figure 4 (gkt511-F4): Evaluation of PS+L− sequences after centering the TSS 500 bp upstream of the prediction. (a) Subset 2–shifted regions were reconstructed by first relocating the TSS 500 bp upstream of the original prediction in the human genome and then selecting the flanking 1000 bp upstream and 1000 bp downstream. (b) Distribution of CAGE tags in H1-hESC cells for the 2000 bp regions centered on the relocated TSSs. (c) RNA-seq profiles of the same regions. X-axes show percent-distance bins, each spanning 20 bp; Y-axes display the number of detected tags. Both analyses show a major peak around the 50th bin (1000 bp), indicating that it may correspond to a transcription start region. (d) Luciferase assays on four representative PS+L− sequences (three on the sense strand and one on the antisense strand) confirmed the transcriptional capacity of these regions, showing significantly higher activity (green bars) than the original predictions (red).
© Copyright Policy - creative-commons


Mentions: We further interrogated this potential TSS displacement in the prediction by analyzing new genomic fragments, now centered on the observed CAGE peaks. To this end, we selected regions from subset 2 and placed the TSS 500 bp upstream of the original ProStar TSS prediction, as indicated by the CAGE/RNA-seq profiles (Figure 4a). As expected, CAGE profiles exhibited a major peak around 800–900 bp, resembling subset 1 sequences (Figure 4b, around the 45th bin). Similarly, RNA-seq profiles presented a single peak at the expected position (Figure 4c, around the 50th bin, 1000 bp). We then re-amplified four of these genomic regions by PCR, each spanning 2000 bp but centered on the newly located TSS, as done for the previous subsets (Figure 4a; see Supplementary Figure S1 for method details). Interestingly, luciferase assays measured on average a 4-fold higher activity than for the original sequences (Figure 4d), providing further evidence that subset 2 segments (PS+L−) do contain true TSSs.
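The coordinate arithmetic behind this re-centering can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, and the use of 0-based, half-open coordinates and strand-aware binning are assumptions made for illustration. Only the 500 bp shift, the 1000 bp flanks and the 20 bp bin size come from the text and the Figure 4 caption.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

SHIFT = 500         # bp: upstream relocation of the predicted TSS (from the text)
WINDOW_HALF = 1000  # bp: flank on each side of the relocated TSS (from the caption)
BIN_SIZE = 20       # bp per bin, so a 2000 bp window holds 100 bins


def relocate_tss(predicted_tss: int, strand: str) -> int:
    """Move a predicted TSS 500 bp upstream.

    Assumption: upstream means lower coordinates on '+' and higher on '-'.
    """
    if strand == "+":
        return predicted_tss - SHIFT
    if strand == "-":
        return predicted_tss + SHIFT
    raise ValueError("strand must be '+' or '-'")


def window(tss: int) -> Tuple[int, int]:
    """Half-open [start, end) window of 2000 bp centered on the TSS."""
    return tss - WINDOW_HALF, tss + WINDOW_HALF


def bin_index(tag_pos: int, tss: int, strand: str) -> Optional[int]:
    """0-based bin of a tag, counted from the upstream edge of the window.

    Returns None when the tag falls outside the window.
    """
    start, end = window(tss)
    if not start <= tag_pos < end:
        return None
    # Measure the offset in transcript orientation.
    offset = tag_pos - start if strand == "+" else (end - 1) - tag_pos
    return offset // BIN_SIZE


def bin_tags(tag_positions: Iterable[int], tss: int, strand: str) -> Counter:
    """Count tags per bin, ignoring tags outside the window."""
    counts: Counter = Counter()
    for pos in tag_positions:
        idx = bin_index(pos, tss, strand)
        if idx is not None:
            counts[idx] += 1
    return counts


if __name__ == "__main__":
    new_tss = relocate_tss(10_000, "+")
    print(new_tss, window(new_tss))
    print(bin_tags([9_500, 9_501, 9_400, 20_000], new_tss, "+"))
```

With these conventions, a tag 900 bp into the window falls in bin index 45 and a tag at the relocated TSS in bin index 50, which matches the bin positions quoted for the CAGE and RNA-seq peaks.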

