Global and unbiased detection of splice junctions from RNA-seq data.

Ameur A, Wetterbom A, Feuk L, Gyllensten U - Genome Biol. (2010)

Bottom Line: We have developed a new strategy for de novo prediction of splice junctions in short-read RNA-seq data, suitable for detection of novel splicing events and chimeric transcripts. When tested on mouse RNA-seq data, >31,000 splice events were predicted, of which 88% bridged between two regions separated by ≤100 kb, and 74% connected two exons of the same RefSeq gene. Our method also reports genomic rearrangements such as insertions and deletions.


Affiliation: Department of Genetics and Pathology, Rudbeck laboratory, Uppsala University, Uppsala, Sweden. adam.ameur@genpat.uu.se

ABSTRACT
We have developed a new strategy for de novo prediction of splice junctions in short-read RNA-seq data, suitable for detection of novel splicing events and chimeric transcripts. When tested on mouse RNA-seq data, >31,000 splice events were predicted, of which 88% bridged between two regions separated by ≤100 kb, and 74% connected two exons of the same RefSeq gene. Our method also reports genomic rearrangements such as insertions and deletions.

© Copyright Policy - open-access

Figure 2: Comparison of predictions from RNA-MATE and SplitSeek. (a) Venn diagram showing the number of predicted junctions by the two methods. (b) Predicted number of junction reads for all 11,395 exon boundaries reported by both RNA-MATE (x-axis) and SplitSeek (y-axis).

Mentions: The SplitSeek predictions show high specificity, but we were also interested in evaluating the sensitivity. Therefore, we compared the SplitSeek results with RNA-MATE [5], a method that recursively maps reads to a junction library of known exons. By applying the RNA-MATE program to the oocyte1 dataset (see Methods for details), we found 20,562 exon boundaries supported by at least two reads, slightly more than the 17,397 junctions predicted by SplitSeek (see Table 2). As shown in Figure 2a, 11,395 splice junctions were detected in common, meaning that SplitSeek confirms 55% of the RNA-MATE predictions. There are several possible reasons why the remaining 45% are not detected by SplitSeek, and we believe it is due to a combination of (a) junctions at which no read is centered over the boundary, which are therefore undetectable by SplitSeek; (b) junctions that are uniquely mappable when using an exon-junction library but not with the anchor-extend alignment; and (c) junctions falsely detected by RNA-MATE. Of the SplitSeek boundaries, 6,420 were not found by RNA-MATE, and 1,007 (16%) of these were long-range splicing events of ≥100 kb, a number that could be indicative of the false-positive rate among the junctions predicted only by SplitSeek. Interestingly, as many as 4,069 (63%) of the 6,420 SplitSeek-only predictions coincide with RefSeq exon boundaries. This can be explained partly by the fact that the RNA-MATE library was not completely up to date (see Methods), but as many as 2,519 of these junctions were present in the library file, which demonstrates that a substantial number of splice events are detectable only by SplitSeek. However, a large number of exon boundaries were reported by both methods, and for these, we could see a clear correlation in the number of reads predicted to cover the junctions (see Figure 2b).
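The overlap figures quoted above can be reproduced with simple set arithmetic. The following Python sketch is illustrative only: the `Junction` tuple layout, the `compare` and `percent` helpers, and the toy coordinates are assumptions made for this example and are not part of SplitSeek or RNA-MATE. Only the counts in the second half are taken from the text.

```python
from typing import Set, Tuple

# Hypothetical representation: (chromosome, donor position, acceptor position).
Junction = Tuple[str, int, int]


def compare(a: Set[Junction], b: Set[Junction]):
    """Return (shared, only_in_a, only_in_b) for two junction sets."""
    return a & b, a - b, b - a


def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Toy coordinates, invented purely to show the set logic.
toy_rna_mate = {("chr1", 1000, 5000), ("chr1", 7000, 9000)}
toy_splitseek = {("chr1", 1000, 5000), ("chr2", 300, 250000)}
toy_shared, toy_rna_mate_only, toy_splitseek_only = compare(toy_rna_mate, toy_splitseek)

# Counts quoted in the text above.
rna_mate_total = 20_562   # exon boundaries reported by RNA-MATE (at least two reads)
shared = 11_395           # junctions reported by both methods (Figure 2a)
splitseek_only = 6_420    # SplitSeek junctions not found by RNA-MATE
long_range = 1_007        # SplitSeek-only junctions spanning >= 100 kb
refseq_match = 4_069      # SplitSeek-only junctions at RefSeq exon boundaries

confirmed = percent(shared, rna_mate_total)        # RNA-MATE calls confirmed by SplitSeek
missed = 100 - confirmed                           # RNA-MATE calls not seen by SplitSeek
long_range_pct = percent(long_range, splitseek_only)
refseq_pct = percent(refseq_match, splitseek_only)

print(confirmed, missed, long_range_pct, refseq_pct)  # prints: 55 45 16 63
```

In practice the junction sets would be read from each tool's output files, and an exact-coordinate match such as this one is only one possible definition of "detected in common".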
The scatterplot shows a systematic bias toward more reads per junction for SplitSeek, probably because SplitSeek can use reads in which only five nucleotides are sequenced from the other exon, whereas this overhang must be longer for library-based methods. A peculiar observation is a group of points in the upper left corner, with many reads for SplitSeek and few for RNA-MATE. We think that these largely represent cases in which RNA-MATE predicts two or more highly similar splice events located only a few bases apart, whereas SplitSeek groups them into one single junction. In such cases, the RNA-MATE junctions, each with a varying number of reads, will be compared with one single SplitSeek prediction based on all junction reads, and consequently, some of the points might end up in the top-left corner of Figure 2b. However, it remains unclear whether these highly similar junctions reflect real splicing events or are artifacts from the library construction and mapping procedures. In conclusion, this comparison suggests that junction library-based methods and SplitSeek can complement each other to detect more splice variants in known genes.
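The grouping effect described above can be illustrated with a small Python sketch. It is not SplitSeek's actual grouping procedure, which the text does not specify: the `merge_nearby` function, the `max_shift` tolerance of 3 bases, and the example coordinates and read counts are all hypothetical. The sketch only shows how several near-identical junctions with few reads each collapse into one junction with many reads.

```python
from typing import List, Tuple

# Hypothetical layout: (chromosome, donor position, acceptor position, read count).
Junction = Tuple[str, int, int, int]


def merge_nearby(junctions: List[Junction], max_shift: int = 3) -> List[Junction]:
    """Merge junctions whose donor and acceptor positions both lie within
    max_shift bases of an existing group's first member, summing read counts.
    The max_shift default is an assumption for illustration."""
    groups: List[Junction] = []
    for chrom, donor, acceptor, reads in sorted(junctions):
        for i, (g_chrom, g_donor, g_acceptor, g_reads) in enumerate(groups):
            if (chrom == g_chrom
                    and abs(donor - g_donor) <= max_shift
                    and abs(acceptor - g_acceptor) <= max_shift):
                # Keep the group's first coordinates and add the reads.
                groups[i] = (g_chrom, g_donor, g_acceptor, g_reads + reads)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append((chrom, donor, acceptor, reads))
    return groups


# Invented example: three junctions a base or two apart, plus one unrelated junction.
separate_calls = [
    ("chr1", 1000, 5000, 4),
    ("chr1", 1002, 5002, 3),
    ("chr1", 1001, 5001, 6),
    ("chr2", 200, 900, 5),
]
merged = merge_nearby(separate_calls)
print(merged)  # [('chr1', 1000, 5000, 13), ('chr2', 200, 900, 5)]
```

Here each of the three separate chr1 calls has between 3 and 6 reads, while the merged junction has 13, so any one of the separate calls plotted against the merged one would sit toward the top-left corner of a plot like Figure 2b.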

