Using generalized procrustes analysis (GPA) for normalization of cDNA microarray data.

Xiong H, Zhang D, Martyniuk CJ, Trudeau VL, Xia X - BMC Bioinformatics (2008)

Bottom Line: Normalization is essential in dual-labelled microarray data analysis to remove non-biological variations and systematic biases. However, all these popular approaches rely on critical assumptions about data distribution, which are often not valid in practice. Compared with other methods, the GPA method performs effectively and consistently better in reducing across-slide variability and removing systematic bias.


Affiliation: Centre for Advanced Research in Environmental Genomics, Department of Biology, University of Ottawa, Ottawa, Ontario, K1N 6N5, Canada. hxion102@uottawa.ca

ABSTRACT

Background: Normalization is essential in dual-labelled microarray data analysis to remove non-biological variations and systematic biases. Many normalization methods have been used to remove such biases within slides (Global, Lowess) and across slides (Scale, Quantile and VSN). However, all these popular approaches rely on critical assumptions about data distribution, which are often not valid in practice.

Results: In this study, we propose a novel assumption-free normalization method based on the Generalized Procrustes Analysis (GPA) algorithm. Using experimental and simulated normal microarray data and boutique array data, we systematically evaluate the normalization performance of the GPA method against six other popular normalization methods: Global, Lowess, Scale, Quantile, VSN, and one boutique array-specific housekeeping gene method. The assessment of these methods is based on three different empirical criteria: across-slide variability, the Kolmogorov-Smirnov (K-S) statistic and the mean square error (MSE). Compared with other methods, the GPA method performs effectively and consistently better in reducing across-slide variability and removing systematic bias.

Conclusion: The GPA method is an effective normalization approach for microarray data analysis. In particular, it is free from the statistical and biological assumptions inherent in other normalization methods that are often difficult to validate. Therefore, the GPA method has a major advantage in that it can be applied to diverse types of array sets, especially to the boutique array where the majority of genes may be differentially expressed.
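
Two of the evaluation criteria named in the Results can be illustrated with a minimal sketch. The abstract does not say how the authors compute them, so the code below uses the generic textbook definitions as an assumption: the two-sample K-S statistic as the largest gap between two empirical distribution functions, and the MSE as the average squared difference between estimated and known values. The function names `ks_statistic` and `mse` are hypothetical and the inputs are toy numbers.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: largest gap between the empirical CDFs.

    Generic definition, assumed here; not taken from the paper.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    # The gap can only change at observed values, so check those points.
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def mse(estimated, truth):
    """Mean square error between estimated and known values."""
    if len(estimated) != len(truth):
        raise ValueError("inputs must have the same length")
    return sum((e - t) ** 2 for e, t in zip(estimated, truth)) / len(truth)


# Toy values, invented for illustration only.
print(ks_statistic([1, 2, 3], [4, 5, 6]))
print(mse([1.0, 2.0], [1.0, 4.0]))
```

A smaller K-S statistic means the two distributions are closer, and a smaller MSE means the normalized values sit closer to the known values, which is only available for simulated data.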


Figure 2: Mean of replicate variability for the (a) swirl zebrafish data set and (b) HCT116 data set. A larger value indicates higher variability across slides. The reference line indicates the variability value for the GPA method.

Mentions: Figure 2 shows bar plots of the variance estimates for the (a) swirl zebrafish and (b) HCT116 cancer data sets. Each bar represents the mean value of replicate variability across all genes. For both data sets, all normalization methods decrease the variability of the raw data. However, the GPA method alone yields lower variability than the Lowess, Quantile, Global, and Scale methods do. A Wilcoxon test indicates that the differences are significant (p < 0.01). Here the VSN method performs better than the GPA method, which is expected because the VSN method specifically aims to stabilize the variance across the replicated arrays.
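
The quantity plotted in each bar can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each gene has one log-ratio per replicate slide, takes the sample variance across slides for each gene, and averages those variances over all genes. The function name `mean_replicate_variability` and the toy values are hypothetical.

```python
from statistics import mean, variance


def mean_replicate_variability(log_ratios):
    """Average, over genes, of the sample variance across replicate slides.

    log_ratios: list of per-gene lists, one value per replicate slide.
    Genes with fewer than two replicates are skipped (assumed choice).
    """
    per_gene = [variance(gene) for gene in log_ratios if len(gene) > 1]
    return mean(per_gene)


# Toy values, invented for illustration only: three genes, three slides.
raw = [[0.9, 1.4, 0.2], [-0.5, 0.3, -1.1], [0.1, 0.8, -0.4]]
normalized = [[0.8, 0.9, 0.7], [-0.4, -0.3, -0.5], [0.1, 0.2, 0.0]]

print(mean_replicate_variability(raw))
print(mean_replicate_variability(normalized))
```

A normalization method that reduces across-slide variability yields a smaller value for the normalized data than for the raw data, as in the toy example.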

