A functional human Poly(A) site requires only a potent DSE and an A-rich upstream sequence.

Nunes NM, Li W, Tian B, Furger A - EMBO J. (2010)

Bottom Line: Mutation of the AUUAAA hexamer had little effect on MC4R 3′ end processing but small changes in the short DSE severely reduced cleavage efficiency. This is supported by a genome-wide analysis of over 10 000 poly(A) sites where we show that many human noncanonical poly(A) signals contain A-rich upstream sequences and tend to have a higher frequency of U and GU nucleotides in their DSE compared with canonical poly(A) signals. The importance of A-rich elements for noncanonical poly(A) site recognition was confirmed by mutational analysis of the human JUNB gene, which contains an A-rich noncanonical poly(A) signal.


Affiliation: Department of Biochemistry, University of Oxford, Oxford, UK.

ABSTRACT
We have analysed the sequences required for cleavage and polyadenylation in the intronless melanocortin 4 receptor (MC4R) pre-mRNA. Unlike other intronless genes, 3′ end processing of the MC4R primary transcript is independent of any auxiliary sequence elements and only requires the core poly(A) sequences. Mutation of the AUUAAA hexamer had little effect on MC4R 3′ end processing but small changes in the short DSE severely reduced cleavage efficiency. The MC4R poly(A) site requires only the DSE and an A-rich upstream sequence to direct efficient cleavage and polyadenylation. Our observation may be highly relevant for the understanding of how human noncanonical poly(A) sites are recognised. This is supported by a genome-wide analysis of over 10 000 poly(A) sites where we show that many human noncanonical poly(A) signals contain A-rich upstream sequences and tend to have a higher frequency of U and GU nucleotides in their DSE compared with canonical poly(A) signals. The importance of A-rich elements for noncanonical poly(A) site recognition was confirmed by mutational analysis of the human JUNB gene, which contains an A-rich noncanonical poly(A) signal.

Figure 5: Systematic analysis of poly(A) sites with A(A/U)UAAA and A-rich elements. A-rich elements are hexamers containing at least five As excluding AAUAAA and not overlapping with A(A/U)UAAA. (A) Difference in nucleotide frequency surrounding poly(A) sites with A-rich elements vs A(A/U)UAAA poly(A) sites. (B) Significance of bias of 4-mers in the +5 to +40 nt region of A-rich and A(A/U)UAAA poly(A) sites. A significance score was calculated for each 4-mer based on its bias of occurrence in A-rich or A(A/U)UAAA poly(A) sites using Fisher's exact test (see Materials and methods for detail). The significance score is −log(P-value) if the 4-mer is biased to A(A/U)UAAA poly(A) sites, or log(P-value) if biased to A-rich poly(A) sites. The distribution of significance scores is shown in a histogram. The top 10 4-mers significantly biased to A-rich poly(A) sites and to A(A/U)UAAA poly(A) sites are listed, together with their P-values. The most significant 4-mer, UUUU, is indicated in the histogram. (C) Schematics of single poly(A) sites (S); first (F), middle (M) and last (L) poly(A) sites in genes with alternative poly(A) sites located in the 3′-most exon. Poly(A) sites are indicated by arrows. CDS, coding sequence. (D) Percent of poly(A) sites with A(A/U)UAAA and/or A-rich elements in the −40 to −10 nt region for the four poly(A) site types shown in (C). (E) Percent of poly(A) sites with co-occurrence of A(A/U)UAAA or A-rich elements and U-rich (left) or GU-rich elements (right) for the four poly(A) site types. The U-rich or GU-rich sequence elements are described in ‘Materials and methods’. The error bars are standard deviations. The differences in occurrence of U-rich or GU-rich sequence elements were evaluated by Fisher's exact test. Significant ones are indicated by one asterisk (P<0.05) or two asterisks (P<0.01). (F) Percent of poly(A) sites conserved in mouse with co-occurrence of A-rich elements only or A(A/U)UAAA only and downstream GU-rich and/or U-rich elements for the four poly(A) site types. The error bars are standard deviations.
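
The signed significance score described for panel (B) can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the log is assumed to be base 10 (the caption only says "log"), the test is assumed to be two-sided, and the counts in the usage lines are toy numbers rather than data from the paper. Fisher's exact test is implemented directly from the hypergeometric distribution so that the block needs only the standard library.

```python
from math import comb, log10


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when the row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def signed_score(with_canon, total_canon, with_arich, total_arich):
    """Return (P-value, signed score) for one 4-mer.

    with_canon / total_canon: A(A/U)UAAA sites containing the 4-mer / all such sites.
    with_arich / total_arich: A-rich sites containing the 4-mer / all such sites.
    The score is -log10(P) if the 4-mer is biased to A(A/U)UAAA sites and
    log10(P) if it is biased to A-rich sites (base 10 is an assumption).
    """
    p = fisher_exact_two_sided(with_canon, total_canon - with_canon,
                               with_arich, total_arich - with_arich)
    biased_to_canonical = with_canon / total_canon >= with_arich / total_arich
    return p, (-log10(p) if biased_to_canonical else log10(p))


# Toy counts only: a 4-mer found in 8 of 10 canonical sites and 1 of 6 A-rich sites.
p_value, score = signed_score(8, 10, 1, 6)
print(p_value, score)
```

With this sign convention, 4-mers enriched downstream of A-rich sites (such as UUUU in the figure) fall on the negative side of the histogram and 4-mers enriched downstream of A(A/U)UAAA sites fall on the positive side.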


Mentions: The data described above imply that, in the context of a strong DSE, human poly(A) sites may be less dependent on the presence of a canonical A(A/U)UAAA hexamer for their function. Hence, strong DSEs may be critical for the recognition of many noncanonical poly(A) sites. If this assumption were true, a significant number of noncanonical poly(A) sites would be expected to contain A-rich upstream sequences, and they should generally have stronger DSEs (defined as increased U or GU richness) than canonical poly(A) sites. To test this hypothesis, we conducted a genome-wide bioinformatics analysis using over 10 000 human poly(A) sites obtained from the PolyA_DB database (Lee et al, 2007). As A-rich sequences in a transcript can lead to internal priming for reverse transcription, resulting in false identification of poly(A) sites (Lee et al, 2008b), we required that supporting cDNA/EST/Trace sequences for a poly(A) site contained at least 30 nt As/Ts corresponding to the poly(A) tail. As can be seen in Figure 5A, DSEs in poly(A) sites that contain an A-rich upstream sequence (defined as a hexamer with ⩾5 adenosines but excluding AAUAAA and not overlapping with A(A/U)UAAA) have a significantly higher frequency of uridines in the +1 to +40 region than A(A/U)UAAA poly(A) sites. A more detailed analysis comparing the frequency of 4-mers in the DSEs shows a very strong bias (P-value of 1.2E−17) of UUUU and a significant bias of UGUU, a sequence element present in the MC4R DSE, towards A-rich sequences (Figure 5B). We found no correlation between the occurrence of A-rich noncanonical poly(A) sites and intronless genes (data not shown).
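
The A-rich element definition used above (a hexamer with at least five adenosines, excluding AAUAAA and not overlapping with A(A/U)UAAA) can be made concrete with a short sketch. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the input is assumed to be the already-extracted upstream region of one poly(A) site, and DNA-style input is simply converted to RNA letters.

```python
CANONICAL_HEXAMERS = ("AAUAAA", "AUUAAA")  # the two A(A/U)UAAA variants


def hexamers(seq):
    """Yield (start, hexamer) for every 6-nt window of seq."""
    for start in range(len(seq) - 5):
        yield start, seq[start:start + 6]


def find_canonical(seq):
    """Start positions of A(A/U)UAAA hexamers."""
    return [i for i, h in hexamers(seq) if h in CANONICAL_HEXAMERS]


def find_a_rich(seq):
    """Start positions of A-rich hexamers: at least five As, not AAUAAA,
    and not overlapping any A(A/U)UAAA hexamer."""
    canonical = find_canonical(seq)
    hits = []
    for i, h in hexamers(seq):
        if h.count("A") < 5 or h == "AAUAAA":
            continue
        # Two 6-nt windows overlap when their starts are fewer than 6 nt apart.
        if any(abs(i - c) < 6 for c in canonical):
            continue
        hits.append(i)
    return hits


def classify_upstream(upstream):
    """Label an upstream region by the elements it contains (hypothetical labels)."""
    seq = upstream.upper().replace("T", "U")  # accept DNA or RNA letters
    has_canonical = bool(find_canonical(seq))
    has_a_rich = bool(find_a_rich(seq))
    if has_canonical and has_a_rich:
        return "both"
    if has_canonical:
        return "A(A/U)UAAA"
    if has_a_rich:
        return "A-rich"
    return "neither"


# Toy sequences, not taken from the paper.
print(classify_upstream("GCGCAAUAAAGCGCGCGC"))   # A(A/U)UAAA
print(classify_upstream("GCGCGCAAAAAGAGCGCGC"))  # A-rich
print(classify_upstream("AAAUAAAGCGC"))          # A(A/U)UAAA
```

In the last toy sequence the window AAAUAA has five As but overlaps AAUAAA, so it is not counted as an A-rich element; this is the case the "not overlapping" clause of the definition is meant to exclude.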

