Robustly detecting differential expression in RNA sequencing data using observation weights.
Bottom Line: Here, we study the robustness of existing approaches for count-based differential expression analysis and propose a new strategy based on observation weights that can be used within existing frameworks. The results suggest that outliers can have a global effect on differential analyses. In addition, we explore the origin of such outliers, in some cases highlighting additional biological or technical factors within the experiment.
Affiliation: Institute of Molecular Life Sciences, University of Zurich, CH-8057 Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, University of Zurich, CH-8057 Zurich, Switzerland.
Mentions: Figure 4 shows the set of standard metrics: panels (a)–(c) and (d)–(f) show false discovery plots, ROC curves and power numbers, respectively, for the original and original-with-outliers datasets under the simulation parameters discussed above. Overall, the introduction of outliers results in more false positives (Figure 4a versus d) and/or fewer true positives at the same false positive rate (Figure 4b versus e). In the absence of outliers, all methods exhibit similar patterns of false discovery rates, with the Bayesian methods, ShrinkBayes and EBSeq, having a slightly higher rate. Similarly, in terms of separating the truly DE from non-DE features using a P-value (or a P-value-like score in the case of Bayesian methods), all methods are very close in performance. Furthermore, in the absence of outliers, edgeR, edgeR-robust and DESeq2 appear to have a slight edge in power at each method's 5% FDR, although the advantage is small (Figure 4c). When outliers are introduced, edgeR-robust shows some advantages over edgeR. All methods lose statistical power with the introduction of outliers (Figure 4c versus f), and DESeq exhibits a spectacular drop. Notably, DESeq still maintains a good ranking of P-values (Figure 4f), but becomes very conservative due to its maximum-of-trend-and-individual dispersion policy; in this respect, the presence of outliers affects the whole dataset (see Supplementary Figure S7).
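As a rough illustration of the power and false discovery numbers discussed above, the following minimal sketch tallies both quantities for a simulated dataset in which the truly DE features are known. It is not the authors' code: the function name `fdr_and_power`, the 0.05 cutoff argument and the toy values are all hypothetical, and the adjusted P-values are assumed to have been produced beforehand by whichever method is being assessed.

```python
def fdr_and_power(adj_pvalues, is_de, threshold=0.05):
    """Return (empirical FDR, power) for features called at adj. p < threshold.

    adj_pvalues: adjusted P-values (or P-value-like scores), one per feature.
    is_de: booleans marking which features were simulated as truly DE.
    """
    called = [p < threshold for p in adj_pvalues]
    true_pos = sum(c and t for c, t in zip(called, is_de))
    false_pos = sum(c and not t for c, t in zip(called, is_de))
    n_called = true_pos + false_pos
    n_de = sum(is_de)
    # Guard against division by zero when nothing is called or nothing is DE.
    fdr = false_pos / n_called if n_called else 0.0
    power = true_pos / n_de if n_de else 0.0
    return fdr, power


# Toy data (hypothetical): five features, the first three simulated as DE.
toy_adj_p = [0.01, 0.02, 0.20, 0.03, 0.60]
toy_is_de = [True, True, True, False, False]
toy_fdr, toy_power = fdr_and_power(toy_adj_p, toy_is_de)
```

In this toy case three features are called, one of them falsely, and two of the three truly DE features are recovered. A method that becomes very conservative would call fewer features overall, which lowers power even when its ranking of features is still good.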