Supervised normalization of microarrays.

Mecham BH, Nelson PS, Storey JD - Bioinformatics (2010)

Bottom Line: It is intuitively clear that true biological signal and confounding factors need to be simultaneously considered when performing normalization. However, the most popular normalization approaches do not utilize what is known about the study, both in terms of the biological variables of interest and the known technical factors in the study, such as batch or array processing date. We show through real and simulated examples that the method has favorable operating characteristics in comparison to some of the most highly used normalization methods.


Affiliation: Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.

ABSTRACT

Motivation: A major challenge in utilizing microarray technologies to measure nucleic acid abundances is 'normalization', the goal of which is to separate biologically meaningful signal from other confounding sources of signal, often due to unavoidable technical factors. It is intuitively clear that true biological signal and confounding factors need to be simultaneously considered when performing normalization. However, the most popular normalization approaches do not utilize what is known about the study, both in terms of the biological variables of interest and the known technical factors in the study, such as batch or array processing date.

Results: We show here that failing to include all study-specific biological and technical variables when performing normalization leads to biased downstream analyses. We propose a general normalization framework that fits a study-specific model employing every known variable that is relevant to the expression study. The proposed method is generally applicable to the full range of existing probe designs, as well as to both single-channel and dual-channel arrays. We show through real and simulated examples that the method has favorable operating characteristics in comparison to some of the most highly used normalization methods.
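
The idea of fitting one model that contains both the biological variable and the known technical factors can be illustrated with a small sketch. This is not the SNM algorithm or the snm package API; it is a simplified per-probe least-squares fit, assuming a single 0/1 biological variable and a single batch label, and it ignores intensity-dependent effects. The function name `remove_batch_keep_biology` and all simulated values are hypothetical.

```python
import numpy as np

def remove_batch_keep_biology(Y, bio, batch):
    """Subtract the fitted batch term from each probe, keeping the biological term.

    Y     : (probes x arrays) matrix of log-intensities
    bio   : 0/1 biological group per array (assumed dichotomous)
    batch : integer batch label per array
    """
    n_arrays = Y.shape[1]
    levels = np.unique(batch)
    # Design matrix: intercept, biological indicator, batch dummies
    # (first batch level is the reference).
    batch_cols = [(batch == b).astype(float) for b in levels[1:]]
    X = np.column_stack([np.ones(n_arrays), bio.astype(float)] + batch_cols)
    # One joint least-squares fit per probe; coef has shape (columns, probes).
    coef, *_ = np.linalg.lstsq(X, Y.T, rcond=None)
    # Fitted contribution of the batch columns only (arrays x probes).
    batch_fit = X[:, 2:] @ coef[2:, :]
    return Y - batch_fit.T

# Hypothetical balanced study: 8 arrays, 2 batches, 200 probes.
rng = np.random.default_rng(0)
bio = np.array([0, 0, 1, 1, 0, 0, 1, 1])
batch = np.array([0, 0, 0, 0, 1, 1, 1, 1])
n_probes = 200
Y = rng.normal(0.0, 0.1, size=(n_probes, bio.size))
Y[:50] += 2.0 * bio   # first 50 probes are differentially expressed
Y += 1.5 * batch      # additive batch shift on every probe
adjusted = remove_batch_keep_biology(Y, bio, batch)
```

Because the batch term is estimated jointly with the biological term, removing it does not absorb the group difference in the first 50 probes. A fit that ignored the biological variable would not have this property when the design is unbalanced.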

Availability: An R package called snm implementing the methodology will be made available from Bioconductor (http://bioconductor.org).

Contact: jstorey@princeton.edu

Supplementary information: Supplementary data are available at Bioinformatics online.


Figure 2: Results from simulated data with differential expression and array effects. The true proportion of null probes is π0 = 0.70. (A) P-value histogram of null probes after SNM normalization. (B) P-value histogram of all probes after SNM normalization. (C) P-value histogram of null probes after QN. (D) P-value histogram of all probes after QN. (E) P-value histogram of null probes after ISN. (F) P-value histogram of all probes after ISN.

Mentions: While unsupervised methods may show favorable operating characteristics in specialized settings, such as when biological variables contribute relatively negligible signal to the data, it has been shown that they make assumptions about the data that are commonly violated in practice (Dabney and Storey, 2007; Irizarry et al., 2006). As a simple motivating example meant to illustrate how easily these assumptions are violated, we simulated microarray data (extensive details are given in following sections) with signal due to a dichotomous biological variable and intensity-dependent array effects. We simulated 100 000 probes, 30% of which are differentially expressed. Figure 2 shows the P-value histograms corresponding to the probes that are not differentially expressed. Figure 2A shows the result for the method we propose in this work, where the P-values are correctly Uniform(0,1). Figure 2C and E show the P-values from the same probes when using quantile normalization (QN; Bolstad et al., 2003) and invariant set normalization (ISN; Li and Wong, 2001), respectively. It can be seen that both sets of P-values are anti-conservatively biased.
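
The criterion used here, that P-values from probes with no differential expression should be Uniform(0,1), can be checked numerically. The following is a minimal sketch and not the simulation from the paper: it assumes independent Gaussian noise with known variance, two groups of equal size, and a two-sided z-test, and every parameter value is hypothetical. It has no array effects, so it only shows what a correct null histogram looks like.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n_probes, n_per_group, sigma = 20000, 10, 1.0

# Null probes only: both groups are drawn from the same distribution.
group_a = rng.normal(0.0, sigma, size=(n_probes, n_per_group))
group_b = rng.normal(0.0, sigma, size=(n_probes, n_per_group))

# Two-sample z statistic with known variance (an assumption of this sketch).
z = (group_a.mean(axis=1) - group_b.mean(axis=1)) / (
    sigma * math.sqrt(2.0 / n_per_group)
)

# Two-sided P-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
p_values = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])

# Under the null, each of the 10 bins should hold roughly 10% of the probes.
counts, _ = np.histogram(p_values, bins=10, range=(0.0, 1.0))
frac_below_005 = float(np.mean(p_values < 0.05))
print(counts)
print(frac_below_005)
```

An anti-conservative bias of the kind described for Figure 2C and E would appear as an excess of counts in the lowest bin and a fraction below 0.05 that is clearly larger than 0.05.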

