Leveraging two-way probe-level block design for identifying differential gene expression with high-density oligonucleotide arrays.

Barrera L, Benner C, Tao YC, Winzeler E, Zhou Y - BMC Bioinformatics (2004)

Bottom Line: We employ parametric and nonparametric variants of two-way analysis of variance (ANOVA) on probe-level data to account for probe-level variation, and use the false-discovery rate (FDR) to account for simultaneous testing on thousands of genes (multiple testing problem). Using publicly available data sets, we systematically compared the performance of parametric two-way ANOVA and the nonparametric Mack-Skillings test to the t-test and Wilcoxon rank-sum test for detecting differentially expressed genes at varying levels of fold change, concentration, and sample size. Our results suggest that the two-way ANOVA methods using probe-level data are substantially more powerful tests for detecting differential gene expression than corresponding methods for probe-set level data.

Affiliation: Genomics Institute of the Novartis Research Foundation, 10675 John Jay Hopkins Drive, California 92121, USA. lbarrera@bioinf.ucsd.edu

ABSTRACT

Background: To identify differentially expressed genes across experimental conditions in oligonucleotide microarray experiments, existing statistical methods commonly use a summary of probe-level expression data for each probe set and compare replicates of these values across conditions using a form of the t-test or rank-sum test. Here we propose the use of a statistical method that takes advantage of the built-in redundancy architecture of high-density oligonucleotide arrays.
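For contrast, the conventional probe-set-level workflow described above can be sketched in Python. This is a minimal illustration, not the authors' code: the mean of log2 probe intensities is an assumed stand-in for whatever probe-set summary a real pipeline computes, and the function names are hypothetical. The point it shows is that each array contributes a single number, so the t-test sees only n values per condition however many probes the set contains.

```python
import math
import statistics


def summarize_probe_set(probe_intensities):
    """Collapse one array's probe intensities for a probe set into one number.

    Assumption: the mean of log2 intensities stands in for whatever
    probe-set summary a real pipeline would compute.
    """
    return statistics.fmean(math.log2(x) for x in probe_intensities)


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.fmean(a) - statistics.fmean(b)) / se, na + nb - 2


# One summary per array: the probe-level detail is discarded before testing.
cond_a = [summarize_probe_set(p) for p in ([2, 8], [8, 8], [8, 32])]
cond_b = [summarize_probe_set(p) for p in ([16, 16], [16, 64], [64, 64])]
t_stat, df = pooled_t_statistic(cond_a, cond_b)
print(round(t_stat, 3), df)  # prints: -2.449 4
```

Here the three arrays per condition summarize to 2, 3, 4 and 4, 5, 6, giving t = -sqrt(6) on 4 degrees of freedom; the two probes per array never enter the test individually.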

Results: We employ parametric and nonparametric variants of two-way analysis of variance (ANOVA) on probe-level data to account for probe-level variation, and use the false-discovery rate (FDR) to account for simultaneous testing on thousands of genes (multiple testing problem). Using publicly available data sets, we systematically compared the performance of parametric two-way ANOVA and the nonparametric Mack-Skillings test to the t-test and Wilcoxon rank-sum test for detecting differentially expressed genes at varying levels of fold change, concentration, and sample size. Using receiver operating characteristic (ROC) curve comparisons, we observed that two-way methods with FDR control on sample sizes of 2-3 replicates exhibit the same high sensitivity and specificity as a t-test with FDR control on sample sizes of 6-9 replicates in detecting at least a two-fold change.
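The two probe-level tests named above can be sketched in plain Python for a single probe set. This is a minimal sketch under stated assumptions, not the authors' implementation: the layout data[probe][condition] = list of replicate log-intensities is hypothetical, the design is assumed balanced, the parametric model is assumed to be a full probe x condition factorial whose condition effect is tested against within-cell replicate error, and the Mack-Skillings statistic is computed in its textbook form (ranks within each probe block, averaged over replicates) without a tie correction.

```python
import statistics

# Hypothetical layout: data[j][k] holds the c replicate log-intensities of
# probe j (the block) under condition k (the treatment); balanced design assumed.


def two_way_anova_condition_f(data):
    """F statistic for the condition main effect in a probe x condition ANOVA
    with interaction, tested against the within-cell (replicate) error."""
    J, K, c = len(data), len(data[0]), len(data[0][0])
    if c < 2:
        raise ValueError("need at least two replicates per cell for an error term")
    grand = statistics.fmean(x for probe in data for cell in probe for x in cell)
    cond_means = [statistics.fmean(x for probe in data for x in probe[k]) for k in range(K)]
    ss_cond = J * c * sum((mk - grand) ** 2 for mk in cond_means)
    ss_err = sum((x - statistics.fmean(cell)) ** 2
                 for probe in data for cell in probe for x in cell)
    df_cond, df_err = K - 1, J * K * (c - 1)
    return (ss_cond / df_cond) / (ss_err / df_err), df_cond, df_err


def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mack_skillings(data):
    """Mack-Skillings statistic (no tie correction): rank within each probe,
    average ranks over replicates, then sum over probes per condition."""
    J, K, c = len(data), len(data[0]), len(data[0][0])
    n_total = J * K * c
    s = [0.0] * K
    for probe in data:
        flat = [x for cell in probe for x in cell]  # K*c values, condition-major
        r = midranks(flat)
        for k in range(K):
            s[k] += sum(r[k * c:(k + 1) * c]) / c
    ms = 12 / (K * (n_total + J)) * sum(sk ** 2 for sk in s) - 3 * (n_total + J)
    return ms, K - 1  # large-sample reference: chi-square with K - 1 df


data = [[[1.0, 2.0], [5.0, 6.0]],   # probe 1: condition A reps, condition B reps
        [[3.0, 4.0], [7.0, 8.0]]]   # probe 2
print(two_way_anova_condition_f(data))  # (64.0, 1, 4)
ms, df = mack_skillings(data)
print(round(ms, 6), df)  # 4.8 1
```

Every probe enters both tests, so a probe set with J probes contributes J x K x c observations instead of K x c summaries. If SciPy is available, p-values follow from scipy.stats.f.sf(F, df1, df2) and scipy.stats.chi2.sf(MS, K - 1); the chi-square reference for Mack-Skillings is an approximation.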

Conclusions: Our results suggest that the two-way ANOVA methods using probe-level data are substantially more powerful tests for detecting differential gene expression than corresponding methods for probe-set level data.

Figure 5: ROC curve comparing the power of each method for sample size n = 3.

Mentions: The clearly higher sensitivity of the two-way methods in discriminating a wider range of fold changes at various transcript concentrations with as few as three replicates (Fig. 3) prompted a simultaneous evaluation of sensitivity and specificity using receiver operating characteristic (ROC) curves. We compared the ability of the various methods to discern a known two-fold change over the range of concentrations in experiments L and M of the Latin Square data set using only three replicates. We obtained results for sample size n = 3 by computing the adjusted FDR from the average p-value for each probe set over 100 comparisons of random pairs of samples of size n, one sample drawn from each condition. For these data, Fig. 5 shows that the two-way ANOVA methods combined with the linear step-up (LSU) FDR-controlling procedure clearly outperform the one-way statistical tests and do not trade off sensitivity for specificity. For the parametric two-way ANOVA, the ROC curve indicates 91% sensitivity at 99.84% specificity; in other words, we expect to find 11 of the 12 spiked genes with only 14 false positives in this data set. The Mack-Skillings test follows with 75% sensitivity in the same specificity range, whereas the t-test and the Wilcoxon test clearly lack power under those conditions. These results suggest that, with the same number of replicates, the improved sensitivity of the two-way methods (Figs. 1, 2, 3) is due to the accurate detection of smaller fold changes over a wider range of concentrations.
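The last two steps of this evaluation, LSU adjustment of the p-values and reading off one sensitivity/specificity pair, can be sketched as follows. This assumes LSU refers to the Benjamini-Hochberg linear step-up procedure; the function names and toy p-values are hypothetical, and the averaging of p-values over 100 resampled comparisons is not shown. Sweeping the cutoff alpha over [0, 1] and plotting sensitivity against 1 - specificity traces an ROC curve.

```python
def lsu_adjusted(pvalues):
    """Benjamini-Hochberg linear step-up adjusted p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=pvalues.__getitem__)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def sensitivity_specificity(adjusted, is_spiked, alpha):
    """One ROC point: a probe set is called differential when its adjusted p <= alpha."""
    tp = sum(1 for q, s in zip(adjusted, is_spiked) if s and q <= alpha)
    fp = sum(1 for q, s in zip(adjusted, is_spiked) if not s and q <= alpha)
    positives = sum(is_spiked)
    negatives = len(is_spiked) - positives
    return tp / positives, (negatives - fp) / negatives


q = lsu_adjusted([0.01, 0.04, 0.03, 0.5])
spiked = [True, True, False, False]  # hypothetical ground truth (known spike-ins)
print(sensitivity_specificity(q, spiked, alpha=0.05))  # (0.5, 1.0)
print(sensitivity_specificity(q, spiked, alpha=0.06))  # (1.0, 0.5)
```

Taking the running minimum from the largest p-value downward is what makes the procedure "step-up" and keeps the adjusted values monotone in the raw p-values.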

