Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs.

LeGault LH, Dewey CN - Bioinformatics (2013)

Bottom Line: Alternative splicing and other processes that allow for different transcripts to be derived from the same gene are significant forces in the eukaryotic cell. RNA-Seq is a promising technology for analyzing alternative transcripts, as it does not require prior knowledge of transcript structures or genome sequences. We present RNA-Seq models and associated inference algorithms based on the concept of probabilistic splice graphs, which alleviate the efficiency, identifiability and representation issues that arise when analyzing genes with large numbers of alternative transcripts.

Affiliation: Department of Computer Sciences, University of Wisconsin, Madison, WI 53706, USA.

ABSTRACT

Motivation: Alternative splicing and other processes that allow for different transcripts to be derived from the same gene are significant forces in the eukaryotic cell. RNA-Seq is a promising technology for analyzing alternative transcripts, as it does not require prior knowledge of transcript structures or genome sequences. However, analysis of RNA-Seq data in the presence of genes with large numbers of alternative transcripts is currently challenging due to efficiency, identifiability and representation issues.

Results: We present RNA-Seq models and associated inference algorithms based on the concept of probabilistic splice graphs, which alleviate these issues. We prove that our models are often identifiable and demonstrate that our inference methods for quantification and differential processing detection are efficient and accurate.

Availability: Software implementing our methods is available at http://deweylab.biostat.wisc.edu/psginfer.

Contact: cdewey@biostat.wisc.edu

Supplementary information: Supplementary data are available at Bioinformatics online.


btt396-F3: Distributions of the differences between the parameter estimates of EM and JR from single and paired-end data

Mentions: The line PSG parameters for each of these genes were estimated using JR and EM. For each vertex with outdegree greater than 1, we computed the distance between the probabilities of its out-edges by taking the maximum of the absolute difference between the estimates on each edge (infinity norm). Figure 3 gives the distributions of these distances between EM and JR estimates for both single and paired-end reads (Supplementary Fig. S4 gives the plots for comparisons between estimates from the same method on single and paired-end reads). We also examined how often the estimates at each vertex agreed in terms of which AP event following that vertex was most likely. EM and JR agreed with respect to this measure on 84% and 81% of the vertices for single and paired-end estimates, respectively. The single-end and paired-end estimates agreed with each other on 95% and 93% of vertices for EM and JR, respectively. These results indicate that the EM estimates are highly accurate on average, assuming that the JR estimates are close to the truth. This suggests that the model assumptions used by EM are reasonable, at least on this dataset. The differences observed between the estimates of the same method on single and paired-end data show that many of the discrepancies between the methods arise simply because they draw information from different subsets of the data. The remaining discrepancies may be the result of highly biased read distributions or incorrectly annotated gene structures.
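The per-vertex comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The data layout (a dictionary mapping each vertex to the estimated probabilities of its out-edges), the function names, and the toy numbers are all hypothetical, and restricting the comparison to vertices with at least two out-edges is an assumption made here, since a vertex with a single out-edge always carries probability 1.

```python
from typing import Dict, Hashable, Tuple

# Hypothetical layout: vertex -> {out-edge label -> estimated probability}
Estimates = Dict[Hashable, Dict[Hashable, float]]


def linf_distance(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """Maximum absolute difference between two sets of out-edge probabilities."""
    edges = set(p) | set(q)
    return max(abs(p.get(e, 0.0) - q.get(e, 0.0)) for e in edges)


def most_likely_edge(p: Dict[Hashable, float]) -> Hashable:
    """Out-edge with the highest estimated probability."""
    return max(p, key=p.get)


def compare_estimates(a: Estimates, b: Estimates) -> Tuple[Dict[Hashable, float], float]:
    """Return per-vertex infinity-norm distances and the fraction of vertices
    at which both estimates pick the same most likely out-edge."""
    distances: Dict[Hashable, float] = {}
    agreements = 0
    for vertex, probs_a in a.items():
        probs_b = b[vertex]
        # Assumption: only vertices with two or more out-edges are compared.
        if len(probs_a) < 2:
            continue
        distances[vertex] = linf_distance(probs_a, probs_b)
        if most_likely_edge(probs_a) == most_likely_edge(probs_b):
            agreements += 1
    fraction = agreements / len(distances) if distances else float("nan")
    return distances, fraction


# Toy values for illustration only; they are not taken from the paper.
em_estimates: Estimates = {
    "v1": {"a": 0.7, "b": 0.3},
    "v2": {"c": 0.4, "d": 0.6},
    "v3": {"e": 1.0},
}
jr_estimates: Estimates = {
    "v1": {"a": 0.6, "b": 0.4},
    "v2": {"c": 0.55, "d": 0.45},
    "v3": {"e": 1.0},
}
distances, agreement = compare_estimates(em_estimates, jr_estimates)
```

Here `distances` holds one value per compared vertex (the quantity whose distribution Figure 3 reports), and `agreement` corresponds to the kind of percentage quoted for the most likely AP event.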

