Genome-wide association between branch point properties and alternative splicing.

Corvelo A, Hallegger M, Smith CW, Eyras E - PLoS Comput. Biol. (2010)

Bottom Line: Finally, the comparison between exons of different evolutionary ages and pseudo exons suggests a key role of the BP in the pathway of exon creation in human. Our computational and experimental analyses suggest that BP recognition is more flexible than previously assumed, and it appears highly dependent on the presence of downstream polypyrimidine tracts. The reported association between BP features and the splicing outcome suggests that this, so far disregarded but yet crucial, element buries information that can complement current acceptor site models.


Affiliation: Computational Genomics, Universitat Pompeu Fabra, Barcelona, Spain.

ABSTRACT
The branch point (BP) is one of the three obligatory signals required for pre-mRNA splicing. In mammals, the degeneracy of the motif combined with the lack of a large set of experimentally verified BPs complicates the task of modeling it in silico, and therefore of predicting the location of natural BPs. Consequently, BPs have been disregarded in a considerable fraction of the genome-wide studies on the regulation of splicing in mammals. We present a new computational approach for mammalian BP prediction. Using sequence conservation and positional bias we obtained a set of motifs with good agreement with U2 snRNA binding stability. Using a Support Vector Machine algorithm, we created a model complemented with polypyrimidine tract features, which considerably improves the prediction accuracy over previously published methods. Applying our algorithm to human introns, we show that BP position is highly dependent on the presence of AG dinucleotides in the 3' end of introns, with distance to the 3' splice site and BP strength strongly correlating with alternative splicing. Furthermore, experimental BP mapping for five exons preceded by long AG-dinucleotide exclusion zones revealed that, for a given intron, more than one BP can be chosen throughout the course of splicing. Finally, the comparison between exons of different evolutionary ages and pseudo exons suggests a key role of the BP in the pathway of exon creation in human. Our computational and experimental analyses suggest that BP recognition is more flexible than previously assumed, and it appears highly dependent on the presence of downstream polypyrimidine tracts. The reported association between BP features and the splicing outcome suggests that this, so far disregarded but yet crucial, element buries information that can complement current acceptor site models.

Figure 6 (pcbi-1001016-g006): Predicted human branch points. A – Histogram representing the distribution of BS positions relative to the AGEZ-defining AG dinucleotide (a3 in Figure 1). Grey region represents positions that are biased by the presence of the AG dinucleotide. The dashed red line represents the leftmost point where the distribution is different from an expected uniform distribution. The exact position of the AG dinucleotide is shown on the x-axis. For this plot, top-scoring candidates over the last 500 nt were considered in order to obtain the left background tail. For visualization purposes, only positions from −30 to +30 nt relative to the AG are shown. B – Pie chart showing the number of introns in the initial dataset (N = 183187) for which no predictions were obtained (None), no predictions falling inside the 1st AGEZ were obtained (None in AGEZ), the top prediction inside the 1st AGEZ has a negative SVM score (Negative scoring) and the top prediction inside the 1st AGEZ scores positively (Positive scoring). C – Histogram showing the distribution of predicted BS distances relative to the 3SS. Only top-scoring candidates inside the AGEZ were considered.

Mentions: Using the SVM classifier, all introns in our human dataset (N = 183187) were scanned for BPs. To study the relation between the AGEZ and BP position in more detail, all BP candidates falling in the last 500 nt of every intron were scored, regardless of whether they lay in the AGEZ. For introns shorter than 500 nt, the entire intron was scanned. In Figure 6A, we plot the distribution of the BP A position of the best hit per intron relative to the AGEZ-defining AG dinucleotide (a3 in Figure 1). We observe that the most frequent location of the BP is inside and towards the 5′ end of the AGEZ. The left-most tail of the distribution reflects the background probability of finding a high-scoring BP candidate anywhere in the intron. Interestingly, from 5′ to 3′, the frequency of occurrences increases, starting at a distance of 7–8 nucleotides upstream of the AGEZ-defining AG dinucleotide. This distance is shorter than the 12 nt considered when defining the AGEZ (see region r3 in Figure 1). These results suggest that the BP is most frequently found within the AGEZ and that there is no need to search beyond it. Indeed, in only approximately 5% of the introns was no candidate found in the AGEZ. For the remaining 95% we were able to retrieve candidates within the AGEZ, of which approximately 89% score positively (Figure 6B). This percentage drops drastically when considering the next AGEZ upstream (Figure 12A in Text S1), where there is a positive hit in less than 25% of the cases. Considering the top-scoring candidates in the AGEZ (our set of predicted human BPs from this point on), we observe a distribution bias, with approximately 96% of the cases falling between −15 (downstream limit) and −55 nt relative to the 3SS and a peak at position −24. However, the distribution extends up to almost the maximum of 500 nt, with ever-diminishing frequencies (Figure 6C). Defining dBPs as predicted BPs that lie beyond 100 bp from the 3SS, i.e. 4 times the average 3SS-BP distance, these account for a very small percentage (0.4%, n = 688) of the total predicted BPs (n = 173284). Comparing this set with BPs predicted in the standard range (−55, −15) (Figure 13 in Text S1), we found that dBPs have stronger motif sequences (Mann-Whitney, p = 1.34×10⁻²⁹). Interestingly, the pyrimidine content between the dBP and the 3SS is similar to that of closely located BPs (Mann-Whitney, p = 0.24), which is surprising considering the large distance. Consequently, PPTs near the dBPs are longer and thus have higher scores (Mann-Whitney, p≈0). In summary, this leads to higher SVM scores for dBPs (Mann-Whitney, p≈0).
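
The scanning window and distance classes described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`scan_region`, `classify_bp`, `share_percent`) and the class labels are hypothetical, and the inclusive treatment of the −55 and −15 boundaries is an assumption. Only the numeric thresholds and counts are taken from the text. The SVM scoring itself is not reproduced here.

```python
# Thresholds quoted in the text; everything else in this sketch is illustrative.
SCAN_WINDOW = 500        # nt scanned upstream of the 3SS
STANDARD_RANGE = (15, 55)  # standard 3SS-BP distance in nt (boundaries assumed inclusive)
DBP_CUTOFF = 100         # predicted BPs beyond this distance are called dBPs


def scan_region(intron: str) -> str:
    """Return the last 500 nt of the intron, or the whole intron if it is shorter."""
    return intron[-SCAN_WINDOW:]


def classify_bp(position: int) -> str:
    """Classify a predicted BP by its position relative to the 3SS (e.g. -24)."""
    distance = -position  # positions upstream of the 3SS are negative
    if distance > DBP_CUTOFF:
        return "dBP"
    if STANDARD_RANGE[0] <= distance <= STANDARD_RANGE[1]:
        return "standard"
    return "other"


def share_percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    # Counts reported in the text.
    total_introns = 183187
    predicted_bps = 173284
    dbps = 688

    print(classify_bp(-24))                             # peak position of the distribution
    print(share_percent(predicted_bps, total_introns))  # introns with a prediction in the AGEZ
    print(share_percent(dbps, predicted_bps))           # dBPs among predicted BPs
```

Run on the reported counts, the two shares come out at 94.6% and 0.4%, which appears consistent with the "approximately 95%" and "0.4%" figures quoted in the text.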

