Sequence determinants in human polyadenylation site selection.

Legendre M, Gautheret D - BMC Genomics (2003)

Bottom Line: While USEs are equally associated with strong and weak poly(A) sites, DSEs are more conspicuous near strong poly(A) sites. However, the downstream U-rich sequences may also play an enhancing role. Based on this information, poly(A) site prediction accuracy was moderately but consistently improved compared to the best previously available algorithm.

Affiliation: INSERM ERM-206, Luminy Case 906, 13288 Marseille Cedex 09, France. legendre@tagc.univ-mrs.fr

ABSTRACT

Background: Differential polyadenylation is a widespread mechanism in higher eukaryotes that produces mRNAs with different 3' ends in different contexts. It involves several alternative polyadenylation sites in the 3' UTR, each with its own strength. Here, we analyze the vicinity of human polyadenylation signals in search of patterns that would help discriminate strong from weak polyadenylation sites, or true sites from randomly occurring signals.

Results: We used human genomic sequences to retrieve the region downstream of polyadenylation signals, which is usually absent from cDNA or mRNA databases. Analyzing 4956 EST-validated polyadenylation sites and their -300/+300 nt flanking regions, we clearly visualized the upstream (USE) and downstream (DSE) sequence elements, both characterized by U-rich (not GU-rich) segments. The presence of a USE and a DSE is the main feature distinguishing true polyadenylation sites from randomly occurring A(A/U)UAAA hexamers. While USEs are equally associated with strong and weak poly(A) sites, DSEs are more conspicuous near strong poly(A) sites. We then used the region encompassing the hexamer and DSE as a training set for poly(A) site identification by the ERPIN program and achieved a prediction specificity of 69 to 85% for a sensitivity of 56%.

Conclusion: The availability of complete genomes and large EST sequence databases now permits large-scale observation of polyadenylation sites. The U-rich sequences flanking both sides of poly(A) signals contribute to the definition of "true" sites. However, the downstream U-rich sequences may also play an enhancing role. Based on this information, poly(A) site prediction accuracy was moderately but consistently improved compared to the best previously available algorithm.
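
The Results paragraph above describes locating A(A/U)UAAA hexamers in genomic sequence and examining -300/+300 nt flanking regions. The Python sketch below illustrates only that scanning step, under stated assumptions: the input is a plain DNA string already on the transcribed strand, T is converted to U, and the flank is centered on whatever position the caller supplies (the study centered its flanks on EST-validated poly(A) sites, which this sketch does not locate). `find_hexamers` and `flanking_window` are hypothetical names, not part of ERPIN or any published tool.

```python
import re

# A(A/U)UAAA covers the canonical AAUAAA signal and its common AUUAAA variant.
# The zero-width lookahead lets finditer report overlapping occurrences.
HEXAMER = re.compile(r"(?=(A[AU]UAAA))")


def to_rna(seq: str) -> str:
    """Upper-case a sequence and convert genomic T to U."""
    return seq.upper().replace("T", "U")


def find_hexamers(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, hexamer) for every A(A/U)UAAA occurrence."""
    rna = to_rna(seq)
    return [(m.start(), m.group(1)) for m in HEXAMER.finditer(rna)]


def flanking_window(seq: str, pos: int, upstream: int = 300,
                    downstream: int = 300) -> str:
    """Return the region [pos - upstream, pos + downstream), clipped at sequence ends."""
    rna = to_rna(seq)
    start = max(0, pos - upstream)
    return rna[start:pos + downstream]


if __name__ == "__main__":
    demo = "GCTAATAAAGCATTAAATTTT"
    for pos, hexamer in find_hexamers(demo):
        print(pos, hexamer, flanking_window(demo, pos, upstream=3, downstream=9))
    # prints:
    # 3 AAUAAA GCUAAUAAAGCA
    # 11 AUUAAA AGCAUUAAAUUU
```

The lookahead pattern matters because a plain `re.finditer(r"A[AU]UAAA", ...)` would skip a second signal that shares bases with the first (for example the two AAUAAA copies in `AAUAAAUAAA`).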

Figure 5: Uracil frequencies in an 11-nt window in the vicinity of alternative poly(A) sites, distinguishing proximal from distal sites. (a) "Strong" poly(A) sites (129 proximal, 499 distal); (b) "weak" poly(A) sites (655 proximal, 210 distal).
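
Figure 5 plots uracil frequency in an 11-nt window around poly(A) sites. A minimal sketch of how such a profile could be computed follows, assuming the input sequences are already aligned on the poly(A) site and have equal length, and that windows are simply truncated at the sequence ends (the source does not say how ends were handled). `u_profile` and `mean_u_profile` are hypothetical helper names.

```python
def u_profile(seq: str, window: int = 11) -> list[float]:
    """Percent U in a centered sliding window at every position of seq.

    Windows are truncated at the sequence ends (an assumption of this sketch).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    rna = seq.upper().replace("T", "U")
    # Prefix sums of U counts give each window's count in O(1).
    prefix = [0]
    for base in rna:
        prefix.append(prefix[-1] + (base == "U"))
    profile = []
    for i in range(len(rna)):
        lo = max(0, i - half)
        hi = min(len(rna), i + half + 1)
        profile.append(100.0 * (prefix[hi] - prefix[lo]) / (hi - lo))
    return profile


def mean_u_profile(seqs: list[str], window: int = 11) -> list[float]:
    """Average the per-sequence U% profiles of equal-length, aligned sequences."""
    if not seqs:
        raise ValueError("need at least one sequence")
    profiles = [u_profile(s, window) for s in seqs]
    if len({len(p) for p in profiles}) != 1:
        raise ValueError("sequences must be aligned and of equal length")
    return [sum(column) / len(profiles) for column in zip(*profiles)]
```

Applied separately to, for instance, the strong-proximal and strong-distal sets, this yields one averaged curve per category, analogous in form to the panels of Figure 5.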

Mentions: We showed previously that, in UTRs with multiple sites, the strongest poly(A) sites were often the most distal ones [1]. We therefore asked whether the apparent "strong site" characteristics in Figure 3 could instead be associated with distal sites, independently of their strength. Figure 5 shows U% variations at distal vs. proximal polyadenylation sites, for strong (a) and weak (b) sites. Because some categories contain few sites (especially proximal/strong and distal/weak), the corresponding average curves are somewhat jagged, but strong proximal sites do not differ significantly from strong distal sites in the DSE region (Figure 5a: t-test P = 0.09). A higher U% peak in the DSE region is, however, clearly characteristic of strong sites, independently of their position in the UTR. Although both strong and weak sites display a significant uracil rise in the DSE (t-test P < 10⁻¹⁵ for either strong or weak sites vs. control sites, Figures 5a and 5b), the difference between strong and weak sites is also highly significant (t-test P = 8.0 × 10⁻¹⁰ for proximal strong vs. proximal weak sites). As for the upstream region, although the U% level in the USE is consistently above background at strong poly(A) sites, it is also elevated at weak sites, suggesting again that the upstream (5') U bias occurs in both classes of sites.
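
The comparisons above rely on t-tests of uracil content in the DSE region. The excerpt does not state which t-test variant was used, so the sketch below assumes Welch's unequal-variance test applied to per-site U% values, computed in pure Python. The DSE coordinates passed to `region_u_percent` are left to the caller, and both helper names are hypothetical. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic together with a two-sided P value.

```python
import math
from statistics import mean, variance


def region_u_percent(seq: str, start: int, end: int) -> float:
    """Percent U in seq[start:end]; the DSE coordinates are chosen by the caller."""
    region = seq[start:end].upper().replace("T", "U")
    if not region:
        raise ValueError("empty region")
    return 100.0 * region.count("U") / len(region)


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two values")
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    if va + vb == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

For example, `welch_t([1, 2, 3, 4], [2, 4, 6, 8])` gives t = -√3 ≈ -1.732 with about 4.41 degrees of freedom; the P value then comes from the t distribution with those degrees of freedom.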

