Non-sequential and multi-step splicing of the dystrophin transcript

View Article: PubMed Central - PubMed

ABSTRACT

The DMD gene, which encodes the dystrophin protein, is the longest human gene. The 2.2 Mb human dystrophin transcript takes 16 hours to be transcribed and is co-transcriptionally spliced. It contains long introns (24 over 10 kb, 5 over 100 kb), and this heterogeneity in intron size makes it an ideal transcript for studying different aspects of the human splicing process. Splicing is a complex process, and much is unknown regarding the splicing of long introns in human genes.

Here, we used ultra-deep transcript sequencing to characterize splicing of the dystrophin transcripts in 3 different human skeletal muscle cell lines, and explored the order of intron removal and multi-step splicing. Coverage and read pair analyses showed that around 40% of the introns were not always removed sequentially. Additionally, for the first time, we report that non-consecutive intron removal resulted in 3 or more joined exons which are flanked by unspliced introns and we defined these joined exons as an exon block. Lastly, computational and experimental data revealed that, for the majority of dystrophin introns, multistep splicing events are used to splice out a single intron.

Overall, our data show, for the first time in a human transcript, that multi-step intron removal is a general feature of mRNA splicing.



Figure 3. Scatter plot of the average intron coverage (y-axis) vs. the splice-ratio (x-axis) of each intron. An inverse correlation between the 2 methods is observed (r = −0.32, p-value = 0.0043): lower coverage (relatively fast splicing) is associated with a higher splice-ratio, indicative of sequential splicing.
© Copyright Policy - open-access

We reasoned that intronic coverage would correlate with the relative speed of intron removal, i.e., introns that are spliced out quickly are expected to show low coverage, while introns that are spliced out slowly are expected to show higher coverage. Since intron length varies widely in the dystrophin transcript, we first addressed whether coverage was proportional to intron length. We defined intronic length as the number of nucleotides covered by the probes, subtracted sequences containing annotated promoters, UTRs and micro-RNAs for each intron, and assessed the read density of the remaining intronic sequences. We found no significant correlation between intron length and coverage (Fig. 2C), indicating that short introns are not spliced before long introns. Rather, these results suggested that the introns are non-sequentially spliced: some introns may be removed only after downstream introns have been removed, and splicing does not follow a strict 5′-3′ order. Nevertheless, since transcription of the complete dystrophin transcript takes ∼16 hours, it is likely that a very slowly spliced upstream intron is spliced out before a very quickly spliced intron further downstream, simply because the downstream intron is produced hours later than the upstream intron. Therefore, we analyzed the relative order of intron removal in groups of 5 introns, using a sliding window of 3. Within every group of 5 introns, each intron was classified as fast, intermediate or slow. A low depth of coverage (normalized coverage <90) may represent quickly spliced introns, while a higher depth (normalized coverage >130) may reflect slow splicing. A small group of introns with coverage between 90 and 130 was defined as ‘intermediate’.
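The coverage-based classification described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the thresholds (90 and 130) come from the text, but the function names are hypothetical, the normalized coverage values are assumed to be computed already, and "a sliding window of 3" is interpreted here as a step of 3 introns between consecutive groups, which is an assumption.

```python
from typing import List, Tuple

FAST_MAX = 90.0    # normalized coverage below this: 'fast' (from the text)
SLOW_MIN = 130.0   # normalized coverage above this: 'slow' (from the text)


def classify_intron(normalized_coverage: float) -> str:
    """Label one intron as fast, intermediate or slow by its coverage."""
    if normalized_coverage < FAST_MAX:
        return "fast"
    if normalized_coverage > SLOW_MIN:
        return "slow"
    return "intermediate"


def sliding_groups(n_introns: int, group_size: int = 5,
                   step: int = 3) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) ranges of intron groups.

    The step of 3 is an assumed reading of 'sliding window of 3'.
    The last group is truncated if fewer than group_size introns remain.
    """
    groups = []
    start = 0
    while start < n_introns:
        end = min(start + group_size, n_introns)
        groups.append((start, end))
        if end == n_introns:
            break
        start += step
    return groups


def classify_by_group(coverages: List[float], group_size: int = 5,
                      step: int = 3) -> List[List[Tuple[int, str]]]:
    """For each group, list (1-based intron number, class label) pairs."""
    return [
        [(i + 1, classify_intron(coverages[i])) for i in range(start, end)]
        for start, end in sliding_groups(len(coverages), group_size, step)
    ]
```

Because consecutive groups overlap under this reading, an intron can appear in two groups; comparing its label with those of its neighbours within each group is what allows upstream and downstream introns to be ordered locally rather than across the whole 16-hour transcript.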
The classification of introns was very similar for each of the 3 cell lines, providing strong indications that several downstream introns were removed before upstream introns; as a consequence, blocks of exons flanked by slowly spliced introns were identified. Considering that intronic depth could also include reads spanning putative lariats and non-annotated pseudoexons, we used a second analysis method to confirm our findings: a paired-end analysis in which one read spans an exon-exon junction and the second read falls in an intron, thus excluding reads spanning lariat forms. Sequential and non-sequential splicing events were thus corroborated by the analysis of paired-end reads from the intermediate-splicing category. To determine the nature of splicing of each intron, we considered intron (n) as a starting point. If intron (n) is spliced sequentially (S), it would be spliced before intron (n+1), leading to read pairs where one read covers the exon-exon junction (ex(n)-ex(n+1)) and the other read aligns to the flanking downstream intron (n+1) (Fig. S3A). Alternatively, non-sequential (NS) splicing would result in the splicing of intron (n+1) before intron (n). This would be reflected by read pairs with one read in intron (n) and the other in the exon-exon junction of the 2 exons immediately downstream of intron (n), (ex(n+1)-ex(n+2)), implying the presence of an unspliced intron. We defined the splice-ratio for any given intron as the number of reads suggestive of sequential splicing, divided by the sum of the reads suggestive of sequential splicing and the reads suggestive of non-sequential splicing. Introns were classified as sequentially spliced when the splice-ratio was between 0.5 and 1, while introns with a splice-ratio below 0.5 were classified as non-sequential. For five introns out of 78, splice-ratios were slightly above or below 0.5, and these were classified as intermediate.
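The splice-ratio defined above is S / (S + NS), where S and NS are the counts of read pairs suggestive of sequential and non-sequential splicing. A minimal sketch of the calculation and the classification follows. The function names are hypothetical, and the `margin` parameter is an assumption: the text does not state how close to 0.5 a ratio had to be to count as intermediate, so the default of 0.0 disables that band and applies only the stated 0.5 cut-off.

```python
from typing import Optional


def splice_ratio(sequential_reads: int,
                 non_sequential_reads: int) -> Optional[float]:
    """Return S / (S + NS), or None when no informative read pairs exist."""
    total = sequential_reads + non_sequential_reads
    if total == 0:
        return None
    return sequential_reads / total


def classify_splicing(ratio: Optional[float], margin: float = 0.0) -> str:
    """Classify an intron from its splice-ratio.

    margin is a hypothetical half-width around 0.5 for the 'intermediate'
    class; the source only says 'slightly above or below 0.5'.
    """
    if ratio is None:
        return "undetermined"
    if margin > 0 and abs(ratio - 0.5) < margin:
        return "intermediate"
    return "sequential" if ratio >= 0.5 else "non-sequential"
```

For example, an intron with 30 read pairs supporting sequential splicing and 10 supporting non-sequential splicing has a splice-ratio of 0.75 and is classified as sequential.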
We found a high pairwise correlation between the splice-ratios in pre-mRNA samples of the 3 cell lines, with Pearson correlation coefficients of 0.82 between cell lines KM155 and 8220, 0.85 between cell lines KM155 and 7304, and 0.82 between cell lines 8220 and 7304. We also observed a correlation between the intron coverage and the splice-ratio values (Fig. 3), where introns classified as non-sequential based on the splice-ratio showed higher coverage (indicative of slower splicing) than introns classified as sequential (r = −0.32, p-value = 0.0043). The intron coverage analysis may have included excised lariats, whereas the paired-end analysis does not, which may explain why the correlation is not stronger.
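The Pearson correlation coefficient used for these comparisons can be computed as in the sketch below. It is a plain standard-library implementation for illustration only; the function name is hypothetical, no data from the study are reproduced, and the p-value calculation is omitted.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

Applied to the per-intron splice-ratios of two cell lines, this gives the pairwise coefficient; applied to per-intron coverage and splice-ratio, a negative value corresponds to the inverse relationship shown in Fig. 3.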

