A Comparison of Methods for RNA-Seq Differential Expression Analysis and a New Empirical Bayes Approach.

Wesolowski S, Birtwistle MR, Rempala GA - Biosensors (Basel) (2013)

Bottom Line: Our analysis indicates that DESeq identifies the first half of the differentially expressed transcripts well, but then is outperformed by Cuffdiff and R-EBSeq. Cuffdiff and R-EBSeq are the two top performers. Thus, R-EBSeq offers good performance, while allowing flexible and rigorous comparison of multiple biological conditions.


Affiliation: Department of Mathematics, Florida State University, Tallahassee, FL 32306, USA; E-Mail: wesserg@gmail.com.

ABSTRACT
Transcriptome-based biosensors are expected to have a large impact on the future of biotechnology. A central aspect of transcriptomics is differential expression analysis, where deep RNA sequencing (RNA-seq) currently has the potential to replace the microarray as the standard assay for RNA quantification. Our contributions here to RNA-seq differential expression analysis are two-fold. First, given the high cost of an RNA-seq run, biological replicates are rare, and therefore, information sharing across genes to obtain variance estimates is crucial. To handle such information sharing in a rigorous manner, we propose a hierarchical, empirical Bayes approach (R-EBSeq) that combines the Cufflinks model for generating relative transcript abundance measurements, known as FPKM (fragments per kilobase of transcript length per million mapped reads), with the EBArrays framework, which was previously developed for empirical Bayes analysis of microarray data. A desirable feature of R-EBSeq is easy-to-implement analysis of more than pairwise comparisons, as we illustrate with experimental data. Second, we develop a standard RNA-seq test data set, at the level of reads, in which 79 transcripts are artificially differentially expressed and, therefore, explicitly known. This test data set allows us to compare the performance, in terms of the true discovery rate, of R-EBSeq to three other widely used RNA-seq data analysis packages: Cuffdiff, DESeq and BaySeq. Our analysis indicates that DESeq identifies the first half of the differentially expressed transcripts well, but then is outperformed by Cuffdiff and R-EBSeq. Cuffdiff and R-EBSeq are the two top performers. Thus, R-EBSeq offers good performance, while allowing flexible and rigorous comparison of multiple biological conditions.
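
As a rough illustration of the FPKM unit named in the abstract, the definitional ratio can be sketched as below. This is only the arithmetic behind the acronym, not the Cufflinks model itself, which estimates abundances statistically; the function name `fpkm` and its parameter names are hypothetical.

```python
def fpkm(fragments: float, length_bp: float, total_mapped: float) -> float:
    """Fragments per kilobase of transcript length per million mapped reads.

    fragments    -- fragments assigned to the transcript (assumed given)
    length_bp    -- transcript length in base pairs
    total_mapped -- total number of mapped fragments in the sample
    """
    if length_bp <= 0 or total_mapped <= 0:
        raise ValueError("length_bp and total_mapped must be positive")
    # Divide by length in kilobases (length_bp / 1e3) and by mapped
    # fragments in millions (total_mapped / 1e6): an overall factor of 1e9.
    return fragments * 1e9 / (length_bp * total_mapped)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the call.
    print(fpkm(fragments=500, length_bp=2000, total_mapped=10_000_000))
```

Normalizing by both transcript length and sequencing depth is what makes the values comparable across transcripts and across samples.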




Figure 2: Relative operating characteristic curves (ROC) for the R-EBSeq Method. The false positive rate (FPR) is plotted vs. the true positive rate (TPR). Test data sets were generated as described in Methods. Effects of (A) the difference of means between two conditions; (B) transcript variance; or (C) the number of replicates, M, on the performance of R-EBSeq.

Mentions: We investigated the influence of the size of the expression difference in 10% of transcripts (the “treatment” group). Different levels of differential expression were generated as follows: 5% of transcripts had a mean difference of x, and the remaining 5% had a mean difference of 2x, relative to the “control” group. The value of x was varied over 0, 5, 10, …, 50, and these values yield the ten ROCs shown in Figure 2(A).
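
The simulation design described here can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' procedure: the expression values are drawn from a normal distribution, the baseline mean, standard deviation, transcript count and replicate count are arbitrary, and a simple Welch-style t-like statistic stands in for the R-EBSeq score. All function names are hypothetical.

```python
import numpy as np


def simulate(n=2000, m=3, x=10.0, base=100.0, sd=10.0, seed=0):
    """Simulate n transcripts with m replicates per condition.

    5% of transcripts get a mean shift of x and another 5% a shift of 2x
    in the "treatment" condition; the rest are unchanged.
    (Normal noise, base and sd are assumptions for illustration.)
    """
    rng = np.random.default_rng(seed)
    k = n // 20                       # 5% of transcripts
    shift = np.zeros(n)
    shift[:k] = x
    shift[k:2 * k] = 2 * x
    control = rng.normal(base, sd, size=(n, m))
    treatment = rng.normal(base + shift[:, None], sd, size=(n, m))
    is_de = np.arange(n) < 2 * k      # labels are fixed even when x == 0
    return control, treatment, is_de


def t_like_score(control, treatment):
    """Stand-in score: absolute mean difference over its standard error."""
    m_c, m_t = control.shape[1], treatment.shape[1]
    diff = treatment.mean(axis=1) - control.mean(axis=1)
    se = np.sqrt(control.var(axis=1, ddof=1) / m_c
                 + treatment.var(axis=1, ddof=1) / m_t)
    return np.abs(diff) / se


def roc_curve(scores, labels):
    """Return (FPR, TPR) arrays as the score threshold is lowered."""
    ranked = labels[np.argsort(-scores)]
    tpr = np.concatenate([[0.0], np.cumsum(ranked) / ranked.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(~ranked) / (~ranked).sum()])
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


if __name__ == "__main__":
    for x in range(0, 55, 5):
        control, treatment, is_de = simulate(x=float(x))
        fpr, tpr = roc_curve(t_like_score(control, treatment), is_de)
        print(f"x = {x:2d}  AUC = {auc(fpr, tpr):.3f}")
```

At x = 0 the labelled transcripts are indistinguishable from the rest, so the curve should lie near the diagonal; larger x should move it toward the top-left corner, which is the trend the panel is meant to display.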

