Detecting Differentially Expressed Genes with RNA-seq Data Using Backward Selection to Account for the Effects of Relevant Covariates.

Nguyen Y, Nettleton D, Liu H, Tuggle CK - J Agric Biol Environ Stat (2015)

Bottom Line: Ignoring relevant covariates or modeling the effects of irrelevant covariates can be detrimental to identifying differentially expressed genes. We propose a backward selection strategy for selecting a set of covariates whose effects are accounted for when searching for differentially expressed genes. We use simulation to show the advantages of our backward selection procedure over alternative strategies that either ignore or adjust for all measured covariates.


Affiliation: Department of Statistics, Iowa State University, Ames, IA 50010 USA ; Institute of Mathematics, VAST, Hanoi, Vietnam.

ABSTRACT

A common challenge in analysis of transcriptomic data is to identify differentially expressed genes, i.e., genes whose mean transcript abundance levels differ across the levels of a factor of scientific interest. Transcript abundance levels can be measured simultaneously for thousands of genes in multiple biological samples using RNA sequencing (RNA-seq) technology. Part of the variation in RNA-seq measures of transcript abundance may be associated with variation in continuous and/or categorical covariates measured for each experimental unit or RNA sample. Ignoring relevant covariates or modeling the effects of irrelevant covariates can be detrimental to identifying differentially expressed genes. We propose a backward selection strategy for selecting a set of covariates whose effects are accounted for when searching for differentially expressed genes. We illustrate our approach through the analysis of an RNA-seq study intended to identify genes differentially expressed between two lines of pigs divergently selected for residual feed intake. We use simulation to show the advantages of our backward selection procedure over alternative strategies that either ignore or adjust for all measured covariates.


Fig. 1: Histograms of p-values at each iteration of the backward selection procedure applied to the RFI RNA-seq dataset using the number of p-values less than 0.05 (p.05) as the measure of covariate relevance. Rather than using a common upper limit for each histogram’s vertical axis, the upper limit varies across histograms to accommodate the height of the tallest bar in each histogram. Using variable upper limits makes it easier to see differences between the histogram shapes of relevant and irrelevant covariates.
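
The two quantities named in the caption, the p.05 count and a per-histogram vertical limit, can be illustrated with a minimal sketch. The p-values below are synthetic and the function names are hypothetical; nothing here reproduces the RFI dataset or the authors' code.

```python
import random

def count_below(pvalues, threshold=0.05):
    """Number of p-values strictly below the threshold (p.05 when threshold=0.05)."""
    return sum(1 for p in pvalues if p < threshold)

def histogram_counts(pvalues, n_bins=20):
    """Counts of p-values in n_bins equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        # A p-value of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts

rng = random.Random(0)
n_genes = 2000  # assumed number of genes, for illustration only

# Synthetic stand-in for an irrelevant covariate: roughly uniform p-values.
irrelevant = [rng.random() for _ in range(n_genes)]
# Synthetic stand-in for a relevant covariate: cubing a uniform draw gives
# a decreasing density on (0, 1), so small p-values are over-represented.
relevant = [rng.random() ** 3 for _ in range(n_genes)]

for name, pvals in [("irrelevant", irrelevant), ("relevant", relevant)]:
    counts = histogram_counts(pvals)
    y_max = max(counts)  # per-histogram upper limit: height of the tallest bar
    print(name, "p.05 =", count_below(pvals), "upper limit =", y_max)
```

Letting each histogram use its own `y_max` keeps the flat shape of a near-uniform histogram visible even when another histogram has a very tall first bar.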

Mentions: In practice, the tests used to assess significance are only approximate, each observed p-value is only a single draw from its marginal distribution, and dependence among genes leads to dependence among p-values. For all of these reasons, empirical distributions composed of one p-value from each gene can have shapes that are neither uniform nor stochastically smaller than uniform. Nonetheless, measuring the extent to which the empirical distribution of the p-values for variable j departs from uniform toward a distribution with a decreasing density on (0, 1) can provide a useful measure of relevance for variable j. As an example, the histograms in the first row of Fig. 1 show the empirical distribution of these p-values for each covariate. Based on visual inspection, covariates like RINb, Conca, Order, Diet, and Eosi appear irrelevant in the full model, while covariates like Concb, Neut, Mono, and Block appear relevant.
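
A backward selection loop driven by such a relevance measure might look like the sketch below. It rests on several assumptions that this excerpt does not confirm: the model-fitting step is replaced by a caller-supplied function (`pvalues_for_model`, hypothetical), the factor of interest is assumed to stay in the model and is not among the candidates, and because no stopping rule is given here, the loop only records the removal order and leaves the choice of where to stop to the caller.

```python
import random

def p05(pvalues, threshold=0.05):
    """Relevance score: number of p-values strictly below the threshold."""
    return sum(1 for p in pvalues if p < threshold)

def backward_path(covariates, pvalues_for_model, relevance=p05):
    """Repeatedly drop the candidate covariate with the lowest relevance score.

    pvalues_for_model(current) is a hypothetical caller-supplied function that
    refits the per-gene models using the covariates in `current` and returns
    {covariate: [one p-value per gene]}.
    Returns a list of (dropped_covariate, scores_at_that_iteration).
    """
    current = list(covariates)
    path = []
    while current:
        pvals = pvalues_for_model(current)
        scores = {c: relevance(pvals[c]) for c in current}
        weakest = min(current, key=lambda c: scores[c])
        path.append((weakest, scores))
        current.remove(weakest)
    return path

def toy_pvalues(current, n_genes=1000, seed=1):
    """Synthetic stand-in for model fitting; a larger power means more small p-values."""
    rng = random.Random(seed)
    power = {"cov_a": 1.0, "cov_b": 2.0, "cov_c": 4.0}  # made-up covariates
    return {c: [rng.random() ** power[c] for _ in range(n_genes)] for c in current}

path = backward_path(["cov_a", "cov_b", "cov_c"], toy_pvalues)
for dropped, scores in path:
    print("dropped", dropped, "scores", scores)
```

Passing `relevance` as a parameter keeps the loop independent of the particular score, so any other measure of departure from uniformity toward a decreasing density could be substituted for p.05.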

