Phylogenetic analysis of mRNA polyadenylation sites reveals a role of transposable elements in evolution of the 3'-end of genes.

Lee JY, Ji Z, Tian B - Nucleic Acids Res. (2008)

Bottom Line: We found that the 3'-most poly(A) sites tend to be more conserved than upstream ones, whereas poly(A) sites located upstream of the 3'-most exon, also termed intronic poly(A) sites, tend to be much less conserved. We also found that nonconserved poly(A) sites are associated with transposable elements (TEs) to a much greater extent than conserved ones, albeit less frequently utilized. Our results establish a conservation pattern for alternative poly(A) sites in several vertebrate species, and indicate that the 3'-end of genes can be dynamically modified by TEs through evolution.

Affiliation: Graduate School of Biomedical Sciences and Department of Biochemistry and Molecular Biology, New Jersey Medical School, University of Medicine and Dentistry of New Jersey, Newark, NJ 07103, USA.

ABSTRACT
mRNA polyadenylation is an essential step for the maturation of almost all eukaryotic mRNAs, and is tightly coupled with termination of transcription in defining the 3'-end of genes. Large numbers of human and mouse genes harbor alternative polyadenylation sites [poly(A) sites] that lead to mRNA variants containing different 3'-untranslated regions (UTRs) and/or encoding distinct protein sequences. Here, we examined the conservation and divergence of different types of alternative poly(A) sites across human, mouse, rat and chicken. We found that the 3'-most poly(A) sites tend to be more conserved than upstream ones, whereas poly(A) sites located upstream of the 3'-most exon, also termed intronic poly(A) sites, tend to be much less conserved. Genes with longer evolutionary history are more likely to have alternative polyadenylation, suggesting gain of poly(A) sites through evolution. We also found that nonconserved poly(A) sites are associated with transposable elements (TEs) to a much greater extent than conserved ones, albeit less frequently utilized. Different classes of TEs have different characteristics in their association with poly(A) sites via exaptation of TE sequences into polyadenylation elements. Our results establish a conservation pattern for alternative poly(A) sites in several vertebrate species, and indicate that the 3'-end of genes can be dynamically modified by TEs through evolution.

Figure 4: Poly(A) sites and L1. (A) Number of poly(A) sites associated with the plus and minus strands of three L1 regions, i.e. the 5′-end, ORF2 and the 3′-end. (B) Distribution of poly(A) sites in ORF2 of the L1M5 subfamily. Poly(A) sites are indicated by vertical bars and are also shown as a profile, which is essentially a smoothed histogram of poly(A) site occurrence. The profile is smoothed with an 11-nt window, i.e. the value at each position is the average over the 11 nt surrounding it. The three association types (illustrated in Figure 3C) are represented by different colors, as indicated in the graph. For type 1, the plotted position is the actual poly(A) site location; for types 2 and 3, it is the location of the TE nucleotide closest to the associated poly(A) site. An additional 40 nt is shown at both the 5′- and 3′-ends to illustrate poly(A) sites located upstream or downstream of the TE. Vertical dotted lines mark the start and end of the TE. (C) Distribution of poly(A) sites in the 3′-end of the L1ME4a subfamily.
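The profile construction described in the caption can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names (`plotted_position`, `smoothed_profile`), the coordinate convention (positions counted from the first TE nucleotide), and the edge handling (averaging over the truncated window near the ends of the plot) are assumptions. The example sites and coordinates are hypothetical.

```python
from typing import Iterable, List

FLANK = 40   # nt shown on each side of the TE, as stated in the caption
WINDOW = 11  # smoothing window width in nt, as stated in the caption


def plotted_position(assoc_type: int, site: int, te_start: int, te_end: int) -> int:
    """Coordinate at which a poly(A) site is drawn, relative to te_start (0-based).

    Type 1: the actual poly(A) site location.
    Types 2 and 3: the TE nucleotide closest to the site, i.e. the site
    clamped to the inclusive interval [te_start, te_end].
    """
    if assoc_type == 1:
        pos = site
    else:
        pos = min(max(site, te_start), te_end)
    return pos - te_start


def smoothed_profile(positions: Iterable[int], te_length: int,
                     flank: int = FLANK, window: int = WINDOW) -> List[float]:
    """Histogram of site positions over the flanked TE, smoothed by a centered window."""
    size = te_length + 2 * flank
    counts = [0] * size
    for p in positions:
        idx = p + flank          # shift so the 5' flank starts at index 0
        if 0 <= idx < size:      # sites beyond the flanks are not drawn
            counts[idx] += 1
    half = window // 2
    profile = []
    for i in range(size):
        # Assumption: near the plot edges, average over the positions that exist.
        lo, hi = max(0, i - half), min(size, i + half + 1)
        profile.append(sum(counts[lo:hi]) / (hi - lo))
    return profile


# Hypothetical usage: (association type, genomic coordinate) pairs for one TE copy.
te_start, te_end = 20, 119
sites = [(1, 130), (1, 130), (2, 5), (3, 200)]
rel = [plotted_position(t, s, te_start, te_end) for t, s in sites]
profile = smoothed_profile(rel, te_length=te_end - te_start + 1)
```

Clamping to the TE interval is simply the geometric definition of "closest TE nucleotide" for a contiguous element; a site already inside the TE maps to itself.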

Mentions: The L1 family of LINEs accounts for ∼17% of the human genome, the highest proportion among all TE families, and has been active for the last ∼170 million years (MYR) (50). Not surprisingly, L1 is associated with poly(A) sites more frequently than any other TE family. Many internal poly(A) sites of L1 have been reported, and they have been implicated in the modulation of its retrotransposition activity (42). A full-length L1 is composed of a 5′-UTR, ORF1, ORF2 and a 3′-UTR. However, L1 sequences in the human genome are often truncated at the 5′-end due to inefficient reverse transcription during retrotransposition (51). Consistently, the number of poly(A) sites associated with these regions follows the order 3′-end (3′-UTR) > ORF2 > 5′-end (5′-UTR + ORF1) (Figure 4A). As shown for two of the top L1 subfamilies, ORF2 of L1M5 and the 3′-end region of L1ME4a, poly(A) sites in ORF2 and the 3′-end region are diffusely distributed (Figure 4B and C), except for several ‘hot spots’ on the minus strand of the 3′-end region. Interestingly, although ORF2 and the 3′-end region contain many more AATAAA/ATTAAA and other PAS hexamers on the plus strand than on the minus strand (Supplementary Figure 3E and F), presumably owing to their A-rich content, more poly(A) sites are associated with the minus strand than with the plus strand, at a ratio of 2 : 1 (Figure 4A). This bias agrees well with previous reports of preferential placement of L1 sequences in the antisense orientation of host genes, at a ratio of ∼2 (52). We further analyzed ORF2 and 3′-end sequences with PolyA_SVM, which uses 15 cis-elements surrounding the poly(A) site for prediction (48). We found that more poly(A) sites can actually be predicted on the minus strand than on the plus strand for ORF2 (7 versus 3), and the same number of sites on both strands for the 3′-end region (Supplementary Figure 3E and F).
Thus, other cis-elements on the minus strand may lead to a higher occurrence of poly(A) sites than on the plus strand, despite fewer PAS hexamers. Further experimental analysis is needed to test this hypothesis. In addition, several regions of L1 contain neither PAS hexamers nor predicted poly(A) sites but are nonetheless associated with poly(A) sites at high frequency, suggesting that they may contain favorable sequences that can give rise to cis-elements for polyadenylation through mutation.
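The plus- versus minus-strand comparison of PAS hexamers amounts to scanning a sequence and its reverse complement. The sketch below is illustrative only and rests on stated assumptions: it counts just the two canonical hexamers named in the text (the study's fuller list of PAS hexamers is not reproduced), it counts overlapping occurrences, and the function names are hypothetical. It does not approximate PolyA_SVM, which weighs 15 cis-elements rather than hexamers alone.

```python
from typing import Dict, Iterable

# Assumption: only the two canonical hexamers named in the text are used.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")

# Complement table for DNA; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Minus-strand sequence (5'->3') of a plus-strand sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_hexamers(seq: str, hexamers: Iterable[str] = PAS_HEXAMERS) -> int:
    """Count overlapping occurrences of any of the hexamers in seq (5'->3')."""
    seq = seq.upper()
    targets = set(hexamers)
    return sum(1 for i in range(len(seq) - 5) if seq[i:i + 6] in targets)


def strand_counts(plus_seq: str) -> Dict[str, int]:
    """PAS hexamer counts on the plus strand and on its reverse complement."""
    return {
        "plus": count_hexamers(plus_seq),
        "minus": count_hexamers(reverse_complement(plus_seq)),
    }
```

For an A-rich stretch such as `AATAAATAAA`, the sketch finds two plus-strand hexamers and none on the minus strand, which illustrates why A-rich L1 segments carry more PAS hexamers on the plus strand.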