deGPS is a powerful tool for detecting differential expression in RNA-sequencing studies.

Chu C, Fang Z, Hua X, Yang Y, Chen E, Cowley AW, Liang M, Liu P, Lu Y - BMC Genomics (2015)

Bottom Line: The advent of NGS technologies has permitted profiling of whole-genome transcriptomes (i.e., RNA-Seq) at unprecedented speed and very low cost. To address the inflated type I error and poor false discovery rate control seen in many existing methods, we propose a powerful and robust tool, termed deGPS, for detecting differential expression in RNA-Seq data. We systematically evaluated our new tool in simulated datasets from several large-scale TCGA RNA-Seq projects, unbiased benchmark data from the compcodeR package, and real RNA-Seq data from the developmental transcriptome of Drosophila. deGPS can precisely control type I error and false discovery rate for the detection of differential expression and is robust in the presence of abnormally high sequence read counts in RNA-Seq experiments.


Affiliation: Department of Statistics and Finance, University of Science and Technology of China, Hefei, Anhui, 230026, China. chenchu@mcw.edu.

ABSTRACT

Background: The advent of next-generation sequencing (NGS) technologies has permitted profiling of whole-genome transcriptomes (i.e., RNA-Seq) at unprecedented speed and very low cost. RNA-Seq provides a far more precise measurement of transcript levels and their isoforms than other methods such as microarrays. A fundamental goal of RNA-Seq is to better identify expression changes between different biological or disease conditions. However, existing methods for detecting differential expression from RNA-Seq count data have not been comprehensively evaluated in large-scale RNA-Seq datasets. Many of them suffer from inflated type I error and fail to control the false discovery rate, especially in the presence of abnormally high sequence read counts in RNA-Seq experiments.

Results: To address these challenges, we propose a powerful and robust tool, termed deGPS, for detecting differential expression in RNA-Seq data. This framework contains new normalization methods based on generalized Poisson (GP) modeling of sequence count data, followed by permutation-based differential expression tests. We systematically evaluated our new tool in simulated datasets from several large-scale TCGA RNA-Seq projects, unbiased benchmark data from the compcodeR package, and real RNA-Seq data from the developmental transcriptome of Drosophila. deGPS can precisely control type I error and false discovery rate for the detection of differential expression and is robust in the presence of abnormally high sequence read counts in RNA-Seq experiments.

Conclusions: Software implementing deGPS has been released as an R package with support for parallel computation ( https://github.com/LL-LAB-MCW/deGPS ). deGPS is a powerful and robust tool for data normalization and for detecting differential expression in RNA-Seq experiments. Beyond RNA-Seq, deGPS has the potential to significantly enhance future data analysis efforts from many other high-throughput platforms such as ChIP-Seq, MBD-Seq and RIP-Seq.
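
The permutation-based testing idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single transcript with already-normalized expression values in two groups and uses the absolute difference in group means as the test statistic. The function name `permutation_pvalue`, the choice of statistic, and the parameters `n_perm` and `seed` are hypothetical and chosen for illustration only; this is not the deGPS implementation or its API.

```python
import random


def permutation_pvalue(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    values: normalized expression values for one transcript (one per sample).
    labels: booleans of the same length, True for group A, False for group B.
    Both groups must contain at least one sample.
    """
    labels = list(labels)
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    if all(labels) or not any(labels):
        raise ValueError("both groups need at least one sample")

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    def statistic(lab):
        group_a = [v for v, g in zip(values, lab) if g]
        group_b = [v for v, g in zip(values, lab) if not g]
        return abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))

    observed = statistic(labels)
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # reassign samples to groups at random
        if statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    expr = [1, 2, 1, 2, 1, 20, 21, 19, 22, 20]
    group = [False] * 5 + [True] * 5
    print(permutation_pvalue(expr, group))
```

Because each transcript is tested independently and each test repeats the same shuffling loop many times, this kind of procedure parallelizes naturally across transcripts, which is consistent with the package's emphasis on parallel computation.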


Fig. 1: Overview of deGPS for analyzing sequence count data in RNA-Seq

Mentions: To identify biologically important changes in RNA expression, we propose a more accurate and sensitive two-step method for analyzing sequence count data from RNA-Seq experiments (Fig. 1). We implement our method in an R statistical package, termed “deGPS” (https://github.com/LL-LAB-MCW). To speed up permutation tests, deGPS also provides efficient parallel computation using multi-core processors. In Step 1, two different methods based on the generalized Poisson (GP) distribution, namely GP-Quantile and GP-Theta, were developed for normalizing sequence count data. These two GP-based methods differ in parameter estimation and data transformation. Generally, GP distributions fit sequence count data better than negative binomial (NB) distributions on transcripts over a wide range of relative abundance in RNA-Seq experiments (Fig. 2). Other commonly used normalization methods for high-throughput data, including global, quantile [17], locally weighted least squares (Lowess) [18], and trimmed mean (TMM) [19] normalization, as used for microarrays, can also be adopted in deGPS. These latter normalization methods are based on either linear scaling or sample quantiles instead of modeling sequence count data. Normalization in Step 1 removes potential technical artifacts arising from unintended noise, while maintaining the true differences between biological samples.
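
As a rough illustration of how a GP model accommodates overdispersed counts, the sketch below evaluates the standard (Consul) generalized Poisson probability mass function, P(X = x) = θ(θ + xλ)^(x−1) e^(−θ−xλ) / x! with θ > 0 and 0 ≤ λ < 1, whose mean is θ/(1−λ) and whose variance is θ/(1−λ)^3. It also fits θ and λ by the method of moments. The function names and the moment-based fit are assumptions made for illustration; they are not the GP-Quantile or GP-Theta estimators, whose details are not given in this excerpt.

```python
import math


def gp_log_pmf(x, theta, lam):
    """Log-probability of count x under the generalized Poisson GP(theta, lam)."""
    if x < 0 or theta <= 0 or not (0 <= lam < 1):
        raise ValueError("need x >= 0, theta > 0 and 0 <= lam < 1")
    return (math.log(theta)
            + (x - 1) * math.log(theta + x * lam)
            - theta - x * lam
            - math.lgamma(x + 1))  # lgamma(x + 1) == log(x!)


def gp_pmf(x, theta, lam):
    """Probability of count x under GP(theta, lam)."""
    return math.exp(gp_log_pmf(x, theta, lam))


def gp_moment_fit(counts):
    """Method-of-moments estimates (theta, lam) from a list of counts.

    Uses mean = theta / (1 - lam) and variance = theta / (1 - lam)**3.
    Underdispersed or constant data are clamped to lam = 0 (plain Poisson).
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two counts")
    mean = sum(counts) / n
    if mean <= 0:
        raise ValueError("mean count must be positive")
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    lam = max(0.0, 1.0 - math.sqrt(mean / var)) if var > 0 else 0.0
    theta = mean * (1.0 - lam)
    return theta, lam


if __name__ == "__main__":
    theta, lam = gp_moment_fit([2, 4, 4, 6, 9, 11])
    print(theta, lam, gp_pmf(6, theta, lam))
```

Setting λ = 0 recovers the ordinary Poisson distribution, so λ can be read as a measure of how far the variance of the counts exceeds their mean.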

