Robustly detecting differential expression in RNA sequencing data using observation weights.
Bottom Line: Within such count-based methods, many flexible and advanced statistical approaches now exist and offer the ability to adjust for covariates (e.g. batch effects). The results suggest that outliers can have a global effect on differential analyses. We demonstrate the effectiveness of our new approach with real data and simulated data that reflect properties of real datasets (e.g. dispersion-mean trend) and develop an extensible framework for comprehensive testing of current and future methods.
Affiliation: Institute of Molecular Life Sciences, University of Zurich, CH-8057 Zurich, Switzerland SIB Swiss Institute of Bioinformatics, University of Zurich, CH-8057 Zurich, Switzerland.
Mentions: Figure 4 shows the set of standard metrics: panels (a)–(c) show false discovery plots, ROC curves and power, respectively, for the original dataset, and panels (d)–(f) show the same for the original-with-outliers dataset, under the simulation parameters discussed above. Overall, the introduction of outliers results in more false positives (Figure 4a versus d) and/or fewer true positives at the same false positive rate (Figure 4b versus e). In the absence of outliers, all methods exhibit similar patterns of false discovery rates, with the Bayesian methods, ShrinkBayes and EBSeq, having a slightly higher rate. Similarly, in terms of separating the truly DE from non-DE features using a P-value (or a P-value-like score in the case of Bayesian methods), all methods are very close in performance. Furthermore, in the absence of outliers, edgeR, edgeR-robust and DESeq2 appear to have a slight edge in power at each method's 5% FDR, although the advantage is small (Figure 4c). When outliers are introduced, edgeR-robust shows some advantages over edgeR. All methods lose power with the introduction of outliers (Figure 4c versus f), and DESeq shows a particularly sharp drop. Notably, DESeq still maintains a good ranking of P-values (Figure 4f), but becomes very conservative due to its maximum-of-trend-and-individual dispersion policy; in this respect, the presence of outliers affects the whole dataset (see Supplementary Figure S7).
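To make the "power at the method's 5% FDR" metric concrete, the following is a minimal sketch of how such a number could be computed for a simulated dataset where the truly DE features are known. It assumes that calls are made by thresholding Benjamini-Hochberg adjusted P-values at 0.05; the function names `bh_adjust` and `power_and_fdr` and the toy inputs are hypothetical and are not taken from the paper's testing framework.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def power_and_fdr(pvalues, is_de, alpha=0.05):
    """Return (power, achieved FDR) when calling features with adjusted P <= alpha.

    is_de holds the simulation ground truth: True for a truly DE feature.
    """
    adjusted = bh_adjust(pvalues)
    called = [q <= alpha for q in adjusted]
    true_pos = sum(1 for c, t in zip(called, is_de) if c and t)
    false_pos = sum(1 for c, t in zip(called, is_de) if c and not t)
    n_de = sum(1 for t in is_de if t)
    n_called = true_pos + false_pos
    power = true_pos / n_de if n_de else 0.0
    fdr = false_pos / n_called if n_called else 0.0
    return power, fdr


# Toy example (made-up values, for illustration only).
toy_pvalues = [0.001, 0.004, 0.03, 0.2, 0.5, 0.8]
toy_truth = [True, True, True, False, False, False]
toy_power, toy_fdr = power_and_fdr(toy_pvalues, toy_truth)
```

In this reading, a method that becomes conservative can keep a good ranking of P-values yet call fewer features at the nominal threshold, which shows up as lower power without a rise in achieved FDR.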