A hierarchical Bayesian model for comparing transcriptomes at the individual transcript isoform level.

Zheng S, Chen L - Nucleic Acids Res. (2009)

Bottom Line: Model parameters were inferred based on an ergodic Markov chain generated by our Gibbs sampler. We applied BASIS to a human tiling-array data set and a mouse RNA-seq data set. Some of the predictions were validated by quantitative real-time RT-PCR experiments.


Affiliation: Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA.

ABSTRACT
The complexity of mammalian transcriptomes is compounded by alternative splicing, which allows one gene to produce multiple transcript isoforms. However, transcriptome comparison has been limited to differential analysis at the gene level rather than at the level of individual transcript isoforms. High-throughput sequencing technologies and high-resolution tiling arrays provide an unprecedented opportunity to compare transcriptomes at the level of individual splice variants. However, the sequence read coverage or probe intensity at each position may represent a family of splice variants rather than a single isoform. Here we propose a hierarchical Bayesian model, BASIS (Bayesian Analysis of Splicing IsoformS), to infer the differential expression level of each transcript isoform between two conditions. A latent variable was introduced to perform direct statistical selection of differentially expressed isoforms. Model parameters were inferred from an ergodic Markov chain generated by our Gibbs sampler. BASIS can borrow information across different probes (or positions) within the same gene and across different genes. BASIS can also handle the heteroskedasticity of probe intensity or sequence read coverage. We applied BASIS to a human tiling-array data set and a mouse RNA-seq data set. Some of the predictions were validated by quantitative real-time RT-PCR experiments.
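The idea of selecting differentially expressed isoforms with a latent variable inside a Gibbs sampler can be illustrated on a single gene with a much simpler model. The sketch below is not BASIS. It assumes that the per-position difference between the two conditions, d, is a linear combination d = E·beta + noise of isoform-level differences beta, where E is a 0/1 probe arrangement matrix (positions by isoforms) and the noise is homoskedastic Gaussian. Each beta_j gets a point-mass spike-and-slab prior with a latent indicator gamma_j. The function name `gibbs_spike_slab` and the hyperparameters (`p_incl`, `tau2`, `a0`, `b0`) are hypothetical choices for illustration, and the sketch neither borrows information across genes nor models heteroskedasticity as BASIS does.

```python
import numpy as np


def gibbs_spike_slab(E, d, n_iter=3000, burn_in=500, p_incl=0.5, tau2=4.0,
                     a0=0.1, b0=0.1, seed=0):
    """Point-mass spike-and-slab Gibbs sampler for d = E @ beta + noise.

    Hypothetical illustration, not BASIS. Returns the posterior inclusion
    probability and the posterior mean of beta for each isoform (column of E).
    """
    E = np.asarray(E, dtype=float)
    d = np.asarray(d, dtype=float)
    rng = np.random.default_rng(seed)
    n, k = E.shape
    beta = np.zeros(k)
    gamma = np.zeros(k, dtype=bool)   # latent indicator: isoform j is differential
    sigma2 = 1.0                      # noise variance, redrawn every sweep
    incl_sum = np.zeros(k)
    beta_sum = np.zeros(k)
    log_prior_odds = np.log(p_incl / (1.0 - p_incl))

    for it in range(n_iter):
        for j in range(k):
            e = E[:, j]
            r = d - E @ beta + e * beta[j]          # residual without isoform j
            s, m = e @ e, e @ r
            v = 1.0 / (s / sigma2 + 1.0 / tau2)     # variance of beta_j if included
            mu = v * m / sigma2                     # mean of beta_j if included
            # Log Bayes factor (included vs excluded) with beta_j integrated out.
            log_bf = 0.5 * np.log(v / tau2) + 0.5 * mu * mu / v
            log_odds = np.clip(log_prior_odds + log_bf, -50.0, 50.0)
            gamma[j] = rng.random() < 1.0 / (1.0 + np.exp(-log_odds))
            beta[j] = rng.normal(mu, np.sqrt(v)) if gamma[j] else 0.0
        # Conjugate inverse-gamma draw for the noise variance.
        rss = float(np.sum((d - E @ beta) ** 2))
        sigma2 = 1.0 / rng.gamma(a0 + 0.5 * n, 1.0 / (b0 + 0.5 * rss))
        if it >= burn_in:
            incl_sum += gamma
            beta_sum += beta
    kept = n_iter - burn_in
    return incl_sum / kept, beta_sum / kept


# Toy gene: 12 positions, 3 isoforms; only isoform 0 differs between conditions.
E_toy = np.array([[1, 0, 1]] * 4 + [[1, 1, 0]] * 4 + [[0, 1, 1]] * 4, dtype=float)
noise = np.random.default_rng(1).normal(0.0, 0.3, size=12)
d_toy = E_toy @ np.array([2.0, 0.0, 0.0]) + noise
incl, beta_hat = gibbs_spike_slab(E_toy, d_toy)
```

Because the spike is a point mass, gamma_j cannot usefully be sampled given beta_j (any nonzero beta_j forces gamma_j = 1), so the sketch draws gamma_j with beta_j integrated out and then draws beta_j given gamma_j. The average of gamma_j over the kept iterations is the posterior probability that isoform j is differentially expressed.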


Figure 4: Power of BASIS and the least squares fit for 100 different probe arrangement matrices E. For each E, power was calculated from 1000 simulations of 100 genes. The total false-positive rate was controlled at 0.005.
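One way to compute power at a fixed false-positive rate from simulation output is sketched below. It assumes, as a hypothetical definition, that every isoform in every simulation receives a score (for example, a posterior inclusion probability), that the false-positive rate is the fraction of truly non-differential items called differential, and that the calling threshold is the lowest cut that keeps this fraction at or below 0.005. The function `power_at_fpr` is illustrative and is not the paper's procedure.

```python
import numpy as np


def power_at_fpr(scores, is_de, fpr=0.005):
    """Return (power, false-positive rate) when calls are made at a threshold
    chosen so that at most a fraction `fpr` of null items are called."""
    scores = np.asarray(scores, dtype=float)
    is_de = np.asarray(is_de, dtype=bool)
    null_desc = np.sort(scores[~is_de])[::-1]   # null scores, largest first
    # Maximum number of null items allowed to be called (epsilon guards rounding).
    max_fp = int(np.floor(fpr * null_desc.size + 1e-9))
    # Call an item if its score strictly exceeds the (max_fp + 1)-th largest null score.
    threshold = null_desc[max_fp] if max_fp < null_desc.size else -np.inf
    calls = scores > threshold
    return calls[is_de].mean(), calls[~is_de].mean()


# Toy example: 1000 null scores spread evenly over [0, 1), plus 4 true positives.
null_scores = np.arange(1000) / 1000
de_scores = np.array([0.9999, 0.5, 0.998, 0.999])
scores = np.concatenate([null_scores, de_scores])
is_de = np.concatenate([np.zeros(1000, dtype=bool), np.ones(4, dtype=bool)])
power, fp_rate = power_at_fpr(scores, is_de)
```

Using a strict inequality means tied null scores at the threshold are never called, so the realized false-positive rate can fall below the target but never exceed it.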

Mentions: Besides the purely simulated probe arrangement matrix E [shown in (*)] for genes with differentially expressed isoforms, we also tested another 100 probe arrangement matrices E drawn at random from the real data (genes in the human data set with five isoforms). For each E, we used the same simulation settings as described in the ‘Materials and Methods’ section: nine genes with differentially expressed isoforms were simulated, together with another 91 genes that were not differentially expressed. The overall power of BASIS and of the least squares fit over the 100 genes was calculated from 1000 simulations for each E. As shown in Figure 4, BASIS consistently performs better than the least squares fit, with roughly a 2-fold increase in power most of the time. The results also indicate that the gene annotation structure (E) affects the power of BASIS. Specifically, the more probes (or positions) a gene has, i.e. the larger the number of rows n of E, the greater the power of BASIS. Across the 100 matrices E, the Pearson correlation between n and power is 0.34, which is significant (P = 0.0005). In addition, the larger the difference among isoforms, the greater the power of BASIS. Here the difference among isoforms was measured as the average Manhattan distance among isoforms (i.e. among the columns of E) divided by n. The Pearson correlation between this difference measure and the power of BASIS is 0.38 (P = 0.0001). Finally, the power of BASIS does not depend on the percentage of isoform-specific positions in a gene. For each E, we calculated the percentage of positions that appear in only one isoform; this percentage was not significantly related to the power of BASIS (P = 0.29). This is because BASIS models the joint behavior of all probes targeting the same gene.
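The three structural summaries of E discussed above can be computed as in the following sketch. It assumes E is a 0/1 NumPy array with one row per probe (or position) and one column per isoform, and it interprets "average Manhattan distance among isoforms" as the mean over all pairs of columns; the function names `structure_features` and `pearson_r` are hypothetical. Only the correlation coefficient is computed here; a P-value like those reported in the text could be obtained with `scipy.stats.pearsonr`.

```python
import itertools

import numpy as np


def structure_features(E):
    """Return (n, isoform difference, fraction of isoform-specific positions)."""
    E = np.asarray(E, dtype=int)
    n, k = E.shape
    # Manhattan (L1) distance between every pair of isoform columns.
    pair_dists = [np.abs(E[:, a] - E[:, b]).sum()
                  for a, b in itertools.combinations(range(k), 2)]
    # Average pairwise distance, scaled by the number of positions n.
    isoform_difference = float(np.mean(pair_dists)) / n
    # A position is isoform-specific if exactly one isoform covers it.
    frac_specific = float(np.mean(E.sum(axis=1) == 1))
    return n, isoform_difference, frac_specific


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


# Toy gene: 5 positions (rows) and 3 isoforms (columns).
E_toy = np.array([[1, 1, 1],
                  [1, 0, 0],
                  [1, 1, 0],
                  [0, 1, 0],
                  [0, 0, 1]])
n, diff, frac = structure_features(E_toy)
```

Given a list of matrices and the simulated power for each, the correlations in the text correspond to calling `pearson_r` on one feature per matrix against the list of power values.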
