Selection against tandem splice sites affecting structured protein regions.

Hiller M, Szafranski K, Huse K, Backofen R, Platzer M - BMC Evol. Biol. (2008)

Bottom Line: We found multiple lines of evidence that the human protein coding sequences are under selection against such in-frame tandem splice events, indicating that these events are often deleterious. Investigating structures of functional protein domains, we found that tandem acceptors are preferentially located at the domain surface and outside structural elements such as helices and sheets. We estimate that ~2,400 introns are under selection against possessing a tandem site.

Affiliation: Bioinformatics Group, Albert-Ludwigs-University Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany. hiller@informatik.uni-freiburg.de

ABSTRACT

Background: Alternative selection of splice sites in tandem donors and acceptors is a major mode of alternative splicing. Here, we analyzed whether in-frame tandem sites leading to subtle mRNA insertions/deletions of 3, 6, or 9 nucleotides are under natural selection.

Results: We found multiple lines of evidence that the human protein coding sequences are under selection against such in-frame tandem splice events, indicating that these events are often deleterious. The strength of selection is not homogeneous within the coding sequence as protein regions that fold into a fixed 3D structure (intrinsically ordered) are under stronger selection, especially against sites with a strong minor splice site. Investigating structures of functional protein domains, we found that tandem acceptors are preferentially located at the domain surface and outside structural elements such as helices and sheets. Using three-species comparisons, we estimate that more than half of all mutations that create NAGNAG acceptors in the coding region have been eliminated by selection.

Conclusion: We estimate that ~2,400 introns are under selection against possessing a tandem site.

Figure 2: Avoidance of tandem acceptors in structured regions of Pfam domains. The distribution of exon/exon junctions derived from control introns, introns with tandem donors and acceptors (A) in alpha-helices, beta-sheets, and non-regular elements, (B) 'inside' or 'outside' structural elements (see text), (C) with respect to the average surface accessibility, and (D) with respect to the average inverse hydropathy scores. Kyte-Doolittle values were used to compute hydropathy scores for the ± 5 amino acid contexts. The values were inverted so that positive values indicate polar residues. To avoid potential biases, we excluded the insertion sequence of tandem donors and acceptors from the context. Different context lengths of ± 3, ± 10, or ± 15 residues give consistent results in D (Additional File 11). P-values using a χ2 test in A and B and a Wilcoxon rank sum test in C and D are indicated as *: P < 0.05, **: P < 0.001, ***: P < 0.0001.
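The inverse hydropathy score described in the caption can be illustrated with a minimal sketch. It assumes the score is the negated mean Kyte-Doolittle value over the residues flanking the exon/exon junction; the function name `inverse_hydropathy`, the 0-based `junction` index, and the truncation at sequence ends are hypothetical choices for illustration, not details taken from the paper. Any inserted residues are assumed to have been removed from the sequence beforehand.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def inverse_hydropathy(protein, junction, flank=5):
    """Negated mean Kyte-Doolittle score of the residues around a junction.

    `junction` is the 0-based index of the first residue downstream of the
    exon/exon junction (an assumed convention). The context covers up to
    `flank` residues on each side and is truncated at the sequence ends.
    Positive return values indicate a polar context.
    """
    upstream = protein[max(0, junction - flank):junction]
    downstream = protein[junction:junction + flank]
    context = upstream + downstream
    if not context:
        raise ValueError("empty context: check sequence and junction index")
    mean_kd = sum(KYTE_DOOLITTLE[aa] for aa in context) / len(context)
    return -mean_kd  # invert so that polar contexts score positive


if __name__ == "__main__":
    # Toy sequence, not data from the study.
    print(inverse_hydropathy("MKTAYIAKQRQISFVKSHFSRQ", 10))
```

Changing `flank` to 3, 10, or 15 gives the alternative context lengths mentioned in the caption.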

Mentions: To further test the underrepresentation of tandem splice sites in ordered regions, we focused on Pfam domains since these domains usually fold into a well-defined 3D structure. We obtained the protein secondary structure as well as the surface accessibility of residues from known 3D structures of Pfam domains. We mapped the position of 21 and 49 introns with tandem donors and acceptors, respectively, as previously described [28]. For comparison, we mapped the position of 4,015 introns without a Δ3/Δ6/Δ9 tandem donor or acceptor motif (called control introns) since small in-frame splice site variations cannot occur in these introns. Comparing the location of introns with respect to alpha-helices, beta-sheets, and non-regular elements, we found no difference between control introns and introns with tandem donors. However, introns with tandem acceptors are significantly biased against a location in helices and sheets (Figure 2A). This tendency is even more pronounced for NAGNAG acceptors (Additional File 7). As the exact boundaries of structural elements are sometimes difficult to determine, we further analyzed a broader context of ± 1 residue around the intron location. We considered an intron to be 'inside a structural element' if this broader context is completely inside a helix or inside a sheet. If the complete context is inside a non-regular element or in two different structural elements, the context is considered to be 'outside a structural element'. In this comparison, both tandem donors and acceptors show a noticeable avoidance of structural elements (Figure 2B). The average surface accessibility scores are indistinguishable between control intron and tandem donor regions, while regions with tandem acceptors have a significantly higher surface accessibility (Figure 2C). 
Finally, we found polar residues to be slightly enriched in tandem donor and strongly enriched in tandem acceptor protein contexts (Figure 2D), which is further evidence that protein variations caused by tandem acceptors are preferentially located on the surface of folded domains.
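The 'inside' versus 'outside' rule for the ± 1 residue context can be sketched as follows. This is only an illustration under stated assumptions: the secondary structure is given as a per-residue string with 'H' for helix, 'E' for sheet, and 'C' for non-regular elements, the intron is already mapped to a single 0-based residue index, and contexts at the sequence ends are truncated. The function name `classify_junction` and these conventions are hypothetical and not taken from the paper.

```python
def classify_junction(secondary_structure, position, flank=1):
    """Classify an intron position as 'inside' or 'outside' a structural element.

    The context is the residue at `position` plus `flank` residues on each
    side. It counts as 'inside' only if every residue in the context is
    helix ('H') or every residue is sheet ('E'). A context that lies in a
    non-regular element ('C') or spans two different elements is 'outside'.
    """
    start = max(0, position - flank)
    context = secondary_structure[start:position + flank + 1]
    states = set(context)
    if states == {"H"} or states == {"E"}:
        return "inside"
    return "outside"


if __name__ == "__main__":
    # Toy annotation string, not data from the study.
    ss = "CCHHHHHHCCEEEECC"
    for pos in (1, 2, 4, 11):
        print(pos, classify_junction(ss, pos))
```

Requiring the whole context to lie in one element makes the classification less sensitive to a one-residue uncertainty in where a helix or sheet begins or ends, which is the motivation given in the text for using the broader context.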

