A first principles approach to differential expression in microarray data analysis.

Rubin RA - BMC Bioinformatics (2009)

Bottom Line: Here we take the approach of making the fewest assumptions about the structure of the microarray data. We applied the technique to the HG-U133A, HG-U95A, and "Golden Spike" spike-in data sets. The resulting receiver operating characteristic (ROC) curves compared favorably with other published results.

Affiliation: Mathematics Department, Whittier College, 13406 E. Philadelphia St., Whittier, CA 90608, USA. brubin698@earthlink.net

ABSTRACT

Background: The disparate results from the methods commonly used to determine differential expression in Affymetrix microarray experiments may well result from the wide variety of probe set and probe level models employed. Here we take the approach of making the fewest assumptions about the structure of the microarray data. Specifically, we only require that, under the hypothesis that a gene is not differentially expressed for specified conditions, for any probe position in the gene's probe set: a) the probe amplitudes are independent and identically distributed over the conditions, and b) the distributions of the replicated probe amplitudes are amenable to classical analysis of variance (ANOVA). Log-amplitudes that have been standardized within-chip meet these conditions well enough for our approach, which is to perform ANOVA across conditions for each probe position, and then take the median of the resulting (1 - p) values as a gene-level measure of differential expression.
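
As a rough illustration of the procedure described in the Background, the following Python sketch standardizes log-amplitudes within each chip, runs a one-way ANOVA across conditions at every probe position, and takes the median of the resulting (1 - p) values. The function names, the array layout (conditions x replicates x probe positions), the use of base-2 logarithms, and the reading of "within-chip standardization" as a z-score are assumptions made for this example and are not taken from the paper. It relies on NumPy and on SciPy's `f_oneway`.

```python
import numpy as np
from scipy.stats import f_oneway


def standardize_within_chip(amplitudes):
    """Z-score the log2 amplitudes of each chip (one chip per row).

    Assumption: "within-chip standardization" is read here as subtracting
    the chip mean and dividing by the chip standard deviation.
    """
    logs = np.log2(np.asarray(amplitudes, dtype=float))
    mean = logs.mean(axis=1, keepdims=True)
    std = logs.std(axis=1, ddof=1, keepdims=True)
    return (logs - mean) / std


def median_anova_score(probe_set):
    """Median of (1 - p) over the probe positions of one probe set.

    probe_set has the hypothetical shape
    (n_conditions, n_replicates, n_probes) and holds standardized
    log-amplitudes.
    """
    probe_set = np.asarray(probe_set, dtype=float)
    n_conditions, _, n_probes = probe_set.shape
    one_minus_p = []
    for j in range(n_probes):
        # One group of replicates per condition at probe position j.
        groups = [probe_set[c, :, j] for c in range(n_conditions)]
        p_value = f_oneway(*groups).pvalue
        one_minus_p.append(1.0 - p_value)
    return float(np.median(one_minus_p))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: 2 conditions, 3 replicates, 11 probe positions.
    unchanged = rng.normal(size=(2, 3, 11))
    shifted = unchanged.copy()
    shifted[1] += 10.0  # large artificial shift in the second condition
    print("unchanged:", median_anova_score(unchanged))
    print("shifted:  ", median_anova_score(shifted))
```

Scores close to 1 indicate that most probe positions individually reject the hypothesis of identical distributions across conditions. Taking the median rather than the mean keeps a few aberrant probe positions from dominating the gene-level measure.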

Results: We applied the technique to the HG-U133A, HG-U95A, and "Golden Spike" spike-in data sets. The resulting receiver operating characteristic (ROC) curves compared favorably with other published results. The procedure is sensitive enough to reveal probe sets that might properly be called "unanticipated positives" rather than "false positives", because plots of these probe sets strongly suggest that they are differentially expressed.
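
To make the evaluation concrete, a ROC curve for a spike-in data set can be traced by ranking probe sets by their score and treating the spiked-in designation as ground truth. The sketch below is a minimal standard-library illustration; the function names and the toy scores are hypothetical, and it is not the paper's evaluation code. Tied scores receive no special treatment.

```python
def roc_points(scores, is_spiked_in):
    """Return (false positive rate, true positive rate) points.

    Probe sets are ranked from highest to lowest score; each step down
    the ranking adds one point to the curve.
    """
    ranked = sorted(zip(scores, is_spiked_in), key=lambda pair: -pair[0])
    n_pos = sum(1 for _, truth in ranked if truth)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one spiked-in and one other probe set")
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for _, truth in ranked:
        if truth:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos / n_neg, true_pos / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Toy example: four probe sets, two of them designated as spiked-in.
    toy_scores = [0.99, 0.95, 0.60, 0.40]
    toy_truth = [True, False, True, False]
    curve = roc_points(toy_scores, toy_truth)
    print(curve)
    print(area_under_curve(curve))
```

Under this scoring, a non-spiked-in probe set that ranks near the top counts against the curve, which is why the "unanticipated positives" matter when spike-in data sets serve as ground truth.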

Conclusion: The median ANOVA (1-p) approach presented here is a very simple methodology that does not depend on any specific probe level or probe models, and does not require any pre-processing other than within-chip standardization of probe level log amplitudes. Its performance is comparable to other published methods on the standard spike-in data sets, and the approach has revealed the presence of new categories of probe sets that might properly be referred to as "unanticipated positives" and "unanticipated negatives" that need to be taken into account when using spiked-in data sets as "truthed" test beds.

Figure 15: Plots of Unanticipated Positives from HG-U95A and Golden Spike Experiments. Unanticipated positives occur in all three spike-in experiments. Examples are shown from the HG-U95A (left) and Golden Spike (right) experiments. In the HG-U95A comparison of experimental conditions C (replicates in red) vs. D (cyan), the non-spiked-in gene 32660_at ranks 2nd out of 14010 genes (14 of which were spiked-in) for RMA, PLM and both of the median ANOVA (1-p) measures. In the Golden Spike plot, the non-spiked-in gene 142245_at ranks 23rd, 24th, 177th and 66th according to median ANOVA (1-p), median signed ANOVA (1-p), RMA and PLM respectively, out of a total of 14010 probe sets, of which 1331 were designated as spiked-in; plots for control arrays are in red, those for spiked-in arrays in cyan. In both cases the plots and p-values do not appear to be compatible with the hypothesis that the data came from probe sets with identical probe-level distributions across conditions.

Mentions: Unanticipated positives are found in all three of the spike-in data sets. Figure 15 contains examples from the HG-U95A and Golden Spike experiments.

