Supervised normalization of microarrays.

Mecham BH, Nelson PS, Storey JD - Bioinformatics (2010)

Bottom Line: It is intuitively clear that true biological signal and confounding factors need to be simultaneously considered when performing normalization. However, the most popular normalization approaches do not utilize what is known about the study, both in terms of the biological variables of interest and the known technical factors in the study, such as batch or array processing date. We show through real and simulated examples that the method has favorable operating characteristics in comparison to some of the most highly used normalization methods.


Affiliation: Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.

ABSTRACT

Motivation: A major challenge in utilizing microarray technologies to measure nucleic acid abundances is 'normalization', the goal of which is to separate biologically meaningful signal from other confounding sources of signal, often due to unavoidable technical factors. It is intuitively clear that true biological signal and confounding factors need to be simultaneously considered when performing normalization. However, the most popular normalization approaches do not utilize what is known about the study, both in terms of the biological variables of interest and the known technical factors in the study, such as batch or array processing date.

Results: We show here that failing to include all study-specific biological and technical variables when performing normalization leads to biased downstream analyses. We propose a general normalization framework that fits a study-specific model employing every known variable that is relevant to the expression study. The proposed method is generally applicable to the full range of existing probe designs, as well as to both single-channel and dual-channel arrays. We show through real and simulated examples that the method has favorable operating characteristics in comparison to some of the most highly used normalization methods.
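The idea of fitting one study-specific model and then removing only the technical part can be illustrated with a small sketch. This is not the snm package's algorithm, only a simplified linear version under stated assumptions: the function name `supervised_normalize`, the two-age and two-batch layout, and the noise-free simulated data are all hypothetical and chosen for illustration.

```python
import numpy as np

def supervised_normalize(Y, X, Z):
    """Remove fitted adjustment effects from Y while keeping biological effects.

    Y: (probes x arrays) intensities.
    X: (arrays x p) design for biological variables (assumed to include an intercept).
    Z: (arrays x q) design for adjustment variables such as batch.
    """
    design = np.hstack([X, Z])                           # biology and technical factors fitted jointly
    coef, *_ = np.linalg.lstsq(design, Y.T, rcond=None)  # shape: (p + q) x probes
    adj_coef = coef[X.shape[1]:, :]                      # coefficients belonging to Z only
    return Y - (Z @ adj_coef).T                          # subtract only the technical fit

# Hypothetical layout: 8 arrays, 2 ages, 2 batches, balanced so age and batch are not confounded.
rng = np.random.default_rng(0)
n_probes = 5
age = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)
batch = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)
X = np.column_stack([np.ones(8), age])
Z = batch[:, None]

mu = rng.normal(8.0, 1.0, size=(n_probes, 1))     # probe-specific baseline
beta = rng.normal(0.0, 1.0, size=(n_probes, 1))   # age effect (signal to keep)
gamma = rng.normal(0.0, 2.0, size=(n_probes, 1))  # batch effect (signal to remove)
Y = mu + beta * age + gamma * batch               # noise-free for clarity

Y_norm = supervised_normalize(Y, X, Z)
```

Because age and batch are fitted together, the batch term is removed without absorbing any of the age effect. Normalizing without X in the model would not offer that guarantee whenever the two are correlated.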

Availability: An R package called snm implementing the methodology will be made available from Bioconductor (http://bioconductor.org).

Contact: jstorey@princeton.edu

Supplementary information: Supplementary data are available at Bioinformatics online.


Figure 4: Results from the Vascular Development Study obtained from QN and SNM. The relationship between samples after normalization is presented as a clustering dendrogram. The labels for each node denote the age of the sample hybridized to that array, and the colored boxes indicate the batch. Note that the SNM results correctly position biological replicate samples on adjacent nodes (A) and predict a robust effect of age on gene expression [π̂0 = 0.51 (C)]. Conversely, the first bifurcation in the QN data separates the samples by batch (B), and these data suggest there is no effect of age on gene expression [π̂0 = 1 (D)].

Mentions: Next, we applied SNM to the study. In relation to model (3), Y is a 450 000 × 8 matrix of observed intensities, X parameterizes the different ages and Z represents the parameterized probe-specific intercepts and batch effects. The results when applying SNM are shown in Figure 4. First, note that the histogram of P-values (Fig. 4C) suggests that age has a pronounced effect on differential expression (π̂0 = 0.53). Many genes with known roles in vascular biology exhibited robust changes in expression across this time series, suggesting that the experiment captured biological signal. For example, previous work identified a cluster of seven genes whose expression is activated soon after birth (List C Elastic Fiber Genes from McLean et al. 2005). Moreover, the relationship across samples, as described by a clustering dendrogram, correctly places the replicate arrays for each age on adjacent nodes (Fig. 4A).
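The passage above summarizes a P-value histogram with a single number. One common summary of that kind is an estimate of the proportion of truly null genes, where a value near 1 means a flat histogram and no detectable effect. The sketch below uses a Storey-type estimator at a fixed threshold; that the paper used exactly this estimator, the function name `estimate_pi0`, and the default `lam=0.5` are all assumptions, and the P-values are synthetic.

```python
def estimate_pi0(p_values, lam=0.5):
    """Estimate the null proportion as #{p > lam} / (m * (1 - lam)), capped at 1."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    # Null P-values are uniform, so the region above lam is dominated by nulls.
    n_above = sum(1 for p in p_values if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))

# Synthetic case 1: evenly spread P-values, as expected when nothing is differentially expressed.
flat = [(i + 0.5) / 1000 for i in range(1000)]

# Synthetic case 2: half the genes have very small P-values, half are evenly spread.
mixed = [0.001] * 500 + [(i + 0.5) / 500 for i in range(500)]

pi0_flat = estimate_pi0(flat)
pi0_mixed = estimate_pi0(mixed)
```

In this toy setup the flat histogram gives an estimate of 1 and the mixed one gives 0.5, which mirrors the qualitative contrast between the QN and SNM panels of the figure.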

