Evaluation of normalization procedures for oligonucleotide array data based on spiked cRNA controls.

Hill AA, Brown EL, Whitley MZ, Tucker-Kellogg G, Hunter CP, Slonim DK - Genome Biol. (2001)

Bottom Line: We introduce 'frequency normalization', a 'spike-in'-based normalization method which estimates array sensitivity, reduces background noise and allows comparison between array designs. This approach does not rely on the constant mean assumption and so can be effective in conditions where standard procedures fail. We also use simulated data to estimate accuracy and investigate the effects of noise.

Affiliation: Department of Genomics, Genetics Institute/Wyeth-Ayerst Research, Cambridge, MA 02140, USA. ahill@genetics.com

ABSTRACT

Background: Affymetrix oligonucleotide arrays simultaneously measure the abundances of thousands of mRNAs in biological samples. Comparability of array results is necessary for the creation of large-scale gene expression databases. The standard strategy for normalizing oligonucleotide array readouts has practical drawbacks. We describe alternative normalization procedures for oligonucleotide arrays based on a common pool of known biotin-labeled cRNAs spiked into each hybridization.

Results: We first explore the conditions for validity of the 'constant mean assumption', the key assumption underlying current normalization methods. We introduce 'frequency normalization', a 'spike-in'-based normalization method which estimates array sensitivity, reduces background noise and allows comparison between array designs. This approach does not rely on the constant mean assumption and so can be effective in conditions where standard procedures fail. We also define 'scaled frequency', a hybrid normalization method relying on both spiked transcripts and the constant mean assumption while maintaining all other advantages of frequency normalization. We compare these two procedures to a standard global normalization method using experimental data. We also use simulated data to estimate accuracy and investigate the effects of noise. We find that scaled frequency is as reproducible and accurate as global normalization while offering several practical advantages.

Conclusions: Scaled frequency quantitation is a convenient, reproducible technique that performs as well as global normalization on serial experiments with the same array design, while offering several additional features. Specifically, the scaled-frequency method enables the comparison of expression measurements across different array designs, yields estimates of absolute message abundance in cRNA and determines the sensitivity of individual arrays.

Figure 5: Reproducibility of normalization methods for different degrees of spike-skew in simulated data. The SD of the random multiplicative spike-skew term in the simulations was adjusted from 0.1 to 0.4 (10-40%). Increasing spike-skew specifically degrades the performance of the F metric. Note that the relatively poor performance of F relative to Fs and ADs when the spike-skew is 0.2 (20%) is similar to that observed in the experimental data (Figure 3). Twenty simulated hybridizations were generated for each level of spike-skew.

Mentions: As expected, only the frequency (F) metric was sensitive to spike-skew (Figure 5). The Fs metric, which uses a single standard curve pooled from each dataset to normalize all arrays in that dataset, effectively eliminated spike-skew effects. In the simulations, a spike-skew level of 20% led to MEDACV values for F in simulated replicates that were much higher than those of ADs or Fs. These results closely resembled those from the 36, 48 and 60 hour experimental replicate sets (compare Figures 5 and 3).
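
The contrast between a per-array standard curve (F) and a pooled standard curve (Fs) can be illustrated with a small simulation. The sketch below is not the authors' simulation. It assumes a linear standard curve through the origin, log-normal measurement noise, one multiplicative spike-skew factor per array, and hypothetical spike concentrations and abundance distributions. It also assumes that MEDACV can be read as the median, across genes, of the coefficient of variation across replicate arrays. All function names are illustrative.

```python
import numpy as np


def medacv(values):
    """Median over genes of the CV across arrays (rows = arrays, cols = genes).

    Assumption: this is one plausible reading of the MEDACV statistic.
    """
    cv = values.std(axis=0, ddof=1) / values.mean(axis=0)
    return float(np.median(cv))


def slope_through_origin(x, y):
    """Least-squares slope of y = slope * x (assumed linear standard curve)."""
    return float(np.sum(x * y) / np.sum(x * x))


def simulate(spike_skew_sd, n_arrays=20, n_genes=2000, seed=0):
    """Simulate replicate arrays; every distribution here is hypothetical."""
    rng = np.random.default_rng(seed)
    true_freq = rng.lognormal(mean=3.0, sigma=1.0, size=n_genes)
    spike_freq = np.array([2.0, 5.0, 10.0, 25.0, 50.0, 100.0, 200.0])
    # Per-array sensitivity (gain) shared by genes and spikes.
    sensitivity = rng.lognormal(0.0, 0.3, size=(n_arrays, 1))
    # Spike-skew: one random multiplicative factor per array, spikes only.
    skew = 1.0 + rng.normal(0.0, spike_skew_sd, size=(n_arrays, 1))
    skew = np.clip(skew, 0.05, None)  # keep the factor positive
    genes = sensitivity * true_freq * rng.lognormal(0.0, 0.1, (n_arrays, n_genes))
    spikes = sensitivity * skew * spike_freq * rng.lognormal(
        0.0, 0.1, (n_arrays, spike_freq.size)
    )
    return genes, spikes, spike_freq


def normalize(genes, spikes, spike_freq):
    """Return (ADs, F, Fs) under the simplifying assumptions stated above."""
    # ADs: global scaling of each array to a common mean intensity.
    scale = genes.mean() / genes.mean(axis=1, keepdims=True)
    ads = genes * scale
    # F: each array is calibrated with its own spike standard curve.
    slopes = np.array([slope_through_origin(spike_freq, s) for s in spikes])
    f = genes / slopes[:, None]
    # Fs: arrays are scaled first, then one pooled curve is fitted to all spikes.
    pooled_x = np.tile(spike_freq, len(spikes))
    pooled_slope = slope_through_origin(pooled_x, (spikes * scale).ravel())
    fs = ads / pooled_slope
    return ads, f, fs


if __name__ == "__main__":
    for sd in (0.1, 0.2, 0.3, 0.4):
        ads, f, fs = normalize(*simulate(sd))
        print(f"skew SD {sd:.1f}: ADs {medacv(ads):.3f}  "
              f"F {medacv(f):.3f}  Fs {medacv(fs):.3f}")
```

In this simplified model the spike-skew factor enters only the per-array slope, so it propagates directly into F. Fs divides every scaled array by the same pooled slope, so its reproducibility matches that of ADs by construction. A real pipeline would differ in its noise model and curve fitting.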

