Leveraging two-way probe-level block design for identifying differential gene expression with high-density oligonucleotide arrays.

Barrera L, Benner C, Tao YC, Winzeler E, Zhou Y - BMC Bioinformatics (2004)

Affiliation: Genomics Institute of the Novartis Research Foundation, 10675 John Jay Hopkins Drive, California 92121, USA. lbarrera@bioinf.ucsd.edu

ABSTRACT

Background: To identify differentially expressed genes across experimental conditions in oligonucleotide microarray experiments, existing statistical methods commonly use a summary of the probe-level expression data for each probe set and compare replicates of these values across conditions using a form of the t-test or rank-sum test. Here we propose a statistical method that takes advantage of the built-in redundancy architecture of high-density oligonucleotide arrays.
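
As an illustration of the conventional probe-set-level approach described above, the sketch below collapses each array's probe intensities for one probe set into a single number and then compares the replicate summaries with Welch's t statistic. The summary used here (mean of log2 intensities) and the function names `probe_set_summary` and `welch_t` are hypothetical stand-ins; production summaries such as MAS5 or RMA are more elaborate. The point it shows is the data reduction: 4 probes × 3 arrays per condition become just 3 values per condition before testing.

```python
import math
from statistics import mean, variance


def probe_set_summary(probe_intensities):
    # Hypothetical summary: mean of log2 probe intensities on one array.
    return mean(math.log2(v) for v in probe_intensities)


def welch_t(a, b):
    # Welch's two-sample t statistic (unequal variances) for b versus a.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se


# One probe set: 3 replicate arrays per condition, 4 probes per array
# (made-up intensities for illustration only).
cond_a = [[100, 200, 150, 120], [110, 190, 160, 125], [105, 210, 140, 118]]
cond_b = [[2 * v for v in array] for array in cond_a]  # exact 2-fold change

summ_a = [probe_set_summary(arr) for arr in cond_a]
summ_b = [probe_set_summary(arr) for arr in cond_b]
print(len(summ_a), welch_t(summ_a, summ_b))
```

Only the per-array summaries reach the test, so the probe-to-probe structure within the probe set is discarded at this step.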

Results: We employ parametric and nonparametric variants of two-way analysis of variance (ANOVA) on probe-level data to account for probe-level variation, and use the false-discovery rate (FDR) to account for simultaneous testing on thousands of genes (the multiple-testing problem). Using publicly available data sets, we systematically compared the performance of parametric two-way ANOVA and the nonparametric Mack-Skillings test with the t-test and Wilcoxon rank-sum test for detecting differentially expressed genes at varying levels of fold change, concentration, and sample size. Using receiver operating characteristic (ROC) curve comparisons, we observed that two-way methods with FDR control on sample sizes of 2-3 replicates exhibit the same high sensitivity and specificity as a t-test with FDR control on sample sizes of 6-9 replicates in detecting changes of at least two-fold.
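
A minimal sketch of a probe-level two-way ANOVA for a single probe set follows, with condition as the treatment factor and probe as the block factor. It assumes a balanced layout with an interaction term, treating each condition's replicate arrays as within-cell replicates; the exact model used in the paper is not stated in this excerpt, so the interaction term and the layout `data[condition][probe] = [replicates]` are assumptions. It returns F statistics only; a p-value could be obtained from an F distribution (for example `scipy.stats.f.sf(F, dfn, dfd)`).

```python
from statistics import mean


def two_way_anova(data):
    """Two-way ANOVA with replication for one probe set (balanced layout).

    data[j][i] is the list of replicate log-intensities for condition j
    and probe i; every cell must hold the same number of replicates.
    """
    k = len(data)        # conditions (treatment factor)
    p = len(data[0])     # probes (block factor)
    r = len(data[0][0])  # replicate arrays per condition
    values = [x for cond in data for cell in cond for x in cell]
    grand = mean(values)
    cell_means = [[mean(cell) for cell in cond] for cond in data]
    cond_means = [mean(row) for row in cell_means]
    probe_means = [mean(cell_means[j][i] for j in range(k)) for i in range(p)]

    ss_total = sum((x - grand) ** 2 for x in values)
    ss_cond = p * r * sum((m - grand) ** 2 for m in cond_means)
    ss_probe = k * r * sum((m - grand) ** 2 for m in probe_means)
    ss_cells = r * sum((m - grand) ** 2 for row in cell_means for m in row)
    ss_inter = ss_cells - ss_cond - ss_probe
    ss_error = sum((x - cell_means[j][i]) ** 2
                   for j in range(k) for i in range(p) for x in data[j][i])

    df_cond, df_probe = k - 1, p - 1
    df_inter, df_error = (k - 1) * (p - 1), k * p * (r - 1)
    ms_error = ss_error / df_error
    return {
        "ss": (ss_cond, ss_probe, ss_inter, ss_error, ss_total),
        "df": (df_cond, df_probe, df_inter, df_error),
        "F_condition": (ss_cond / df_cond) / ms_error,
        "F_probe": (ss_probe / df_probe) / ms_error,
        "F_interaction": (ss_inter / df_inter) / ms_error,
    }


# Hypothetical log-intensities: 2 conditions x 3 probes x 2 replicates.
data = [
    [[1.0, 1.2], [3.0, 3.1], [2.0, 1.9]],
    [[2.0, 2.1], [4.1, 3.9], [3.0, 3.2]],
]
print(two_way_anova(data)["F_condition"])
```

Because the probe factor absorbs the large probe-to-probe differences in intensity, the error term stays small and the condition effect is tested against it, rather than against variation that a per-array summary would leave mixed in.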

Conclusions: Our results suggest that the two-way ANOVA methods using probe-level data are substantially more powerful tests for detecting differential gene expression than corresponding methods for probe-set level data.
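
The nonparametric two-way method named in the Results, the Mack-Skillings test, replaces intensities with ranks computed within each block (here, each probe). The sketch below uses the standard form of the statistic, MS = 12 / (k(N + n)) · Σ S_j² − 3(N + n), where S_j sums each condition's average within-probe rank over the n probes and N = nkc. It assumes the same hypothetical `data[condition][probe]` layout as an ANOVA setup, handles ties with midranks but applies no tie correction, and reports the large-sample chi-square p-value only for k = 2 conditions (1 degree of freedom).

```python
import math


def midranks(values):
    # 1-based ranks; tied values share the average of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mack_skillings(data):
    k = len(data)        # conditions (treatments)
    n = len(data[0])     # probes (blocks)
    c = len(data[0][0])  # replicates per cell
    N = n * k * c
    S = [0.0] * k
    for i in range(n):
        # Rank all k*c observations of probe i together (within-block ranking).
        flat = [x for j in range(k) for x in data[j][i]]
        ranks = midranks(flat)
        for j in range(k):
            S[j] += sum(ranks[j * c:(j + 1) * c]) / c
    stat = 12.0 / (k * (N + n)) * sum(s * s for s in S) - 3.0 * (N + n)
    return S, stat


def chi2_1df_pvalue(stat):
    # Upper tail of a chi-square with 1 df; appropriate only when k = 2.
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


data = [
    [[1.0, 1.2], [3.0, 3.1], [2.0, 1.9]],
    [[2.0, 2.1], [4.1, 3.9], [3.0, 3.2]],
]
S, stat = mack_skillings(data)
print(S, stat, chi2_1df_pvalue(stat))
```

Ranking within each probe makes the test insensitive to the very different baseline intensities of individual probes, which plays the same role as the probe block factor in the parametric model.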

Figure 3: Log-log plots of FDR versus the maximum spike-in concentration at varying levels of fold change (FC). The dashed line in each plot marks the log FDR value corresponding to q = 0.05. Plots for higher fold changes are available at our web site. (a) FC = 2. (b) FC = 4. (c) FC = 8. (d) FC = 16. Because of the numerical precision of the MATLAB routines used in this study, log FDR values below -16 were truncated to -16.
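
The plotting convention in the caption (a floor at -16 and a reference line at q = 0.05) can be sketched as below. This assumes base-10 logarithms, which the caption does not state, and treats an FDR that underflowed to exactly 0 as lying at the floor; `log_fdr` and `Q_LINE` are hypothetical names.

```python
import math

FLOOR = -16.0                # truncation level stated in the caption
Q_LINE = math.log10(0.05)    # dashed reference line, assuming log10


def log_fdr(fdr_values, floor=FLOOR):
    # log10 of each FDR value, clipped from below at `floor`;
    # a value of 0 (numerical underflow) is mapped straight to the floor.
    return [floor if q <= 0 else max(math.log10(q), floor) for q in fdr_values]


vals = log_fdr([1e-20, 0.0, 0.01, 1.0])
called = [v <= Q_LINE for v in vals]   # points on or below the dashed line
print(vals, called)
```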

Figs. 1 and 2 highlighted the greater sensitivity of two-way methods on the Lemon data set, in which the expression of a large number of genes was expected to change between the starved and stimulated conditions. However, the identities of these true positives and the magnitudes of their relative and absolute changes are unknown. Using the set of 11 experiments with 3 replicates each from the Affymetrix Latin Square Data Set, we examined the effects of known concentration and fold change on the sensitivity of the tests coupled with the LSU FDR-controlling procedure. We performed all 55 pairwise comparisons of the 11 experiments, giving a wide range of combinations of fold change and maximum spike-in concentration (Fig. 3). As expected, increasing fold change combined with increasing maximum spike-in concentration improves detection for all methods.
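
The sketch below shows, under stated assumptions, the two bookkeeping steps in this analysis: enumerating every pair of the 11 Latin Square experiments (11 choose 2 = 55) and applying an FDR-controlling step to the per-gene p-values of one comparison. It assumes that "LSU" denotes the Benjamini-Hochberg linear step-up procedure; `lsu_adjust`, `lsu_reject`, and the 0..10 experiment indexing are hypothetical.

```python
import math
from itertools import combinations


def lsu_adjust(pvalues):
    """Linear step-up (Benjamini-Hochberg) adjusted p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def lsu_reject(pvalues, q=0.05):
    # Declare a gene differentially expressed when its adjusted value <= q.
    return [a <= q for a in lsu_adjust(pvalues)]


# All pairwise comparisons of the 11 experiments (indexed 0..10 here).
pairs = list(combinations(range(11), 2))
print(len(pairs), math.comb(11, 2))

# Toy per-gene p-values for one comparison (made up for illustration).
print(lsu_reject([0.01, 0.04, 0.03, 0.20]))
```

The step-up form lets genes with moderately small p-values be declared significant when many other genes also show small p-values, which is less conservative than a per-gene Bonferroni cut.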

