Use of normalization methods for analysis of microarrays containing a high degree of gene effects.

Ni TT, Lemon WJ, Shyr Y, Zhong TP - BMC Bioinformatics (2008)

Bottom Line: We have demonstrated that the new method provides considerable improvement in the accuracy of data normalization when large proportions of gene effects are present. Adding the method's key component, its variable selection step, to alternative normalization approaches removes most of their sensitivity to gene effects. The results indicate that our method may be used to normalize microarrays that are quite different, without prior knowledge of or assumptions about housekeeping genes.


Affiliation: Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN 37232, USA. terri.ni@vanderbilt.edu

ABSTRACT

Background: High-throughput microarrays are widely used to study gene expression across tissues and developmental stages. Analysis of gene expression data is challenging in these experiments due to the presence of significant percentages of differentially expressed genes (DEG) observed between tissues and developmental stages. Data normalization methods that are widely used today are not designed for data with a large proportion of tissue or gene effects.

Results: In our current study, we describe a novel two-dimensional nonparametric normalization method for analyzing microarray data that performs well whether or not large numbers of gene effects are present. Rather than relying on the assumption that most genes vary little in expression, the method uses a unique peak selection strategy to distinguish DEG from expression-invariant genes prior to nonlinear curve fitting. We compared the method under simulated and experimental conditions with five alternative nonlinear normalization approaches: quantile, lowess, robust lowess, invariant set, and cross-correlation (Xcorr). The simulations included various percentages of DEG, and the experimental data are from publicly available datasets known to be difficult to analyze because approximately 34% of their genes are DEG.
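As a rough illustration of selecting a peak before curve fitting, the sketch below bins genes by A, takes the highest peak (mode) of each bin's LER distribution as the location of the expression-invariant genes, and subtracts a curve drawn through those peaks. It is a hypothetical sketch, not the authors' algorithm: the helper names (`ma_values`, `kde_mode`, `peak_normalize`), the Gaussian kernel density estimate, the bandwidth, the bin count, base-2 logarithms, and the piecewise-linear interpolation (standing in for the paper's nonlinear curve fit) are all assumptions.

```python
import math
from statistics import median


def ma_values(red, green):
    """Return (M, A) per gene: M = log2(R/G) is the LER, A = log2 sqrt(R*G)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    return m, a


def kde_mode(values, bandwidth=0.1, grid_points=201):
    """Grid location of the highest peak of a Gaussian kernel density estimate."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    best_x, best_density = lo, -1.0
    for k in range(grid_points):
        x = lo + (hi - lo) * k / (grid_points - 1)
        density = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def peak_normalize(red, green, n_bins=10, bandwidth=0.1):
    """Hypothetical peak-based normalization: returns the normalized M of each gene."""
    m, a = ma_values(red, green)
    order = sorted(range(len(a)), key=lambda i: a[i])
    size = math.ceil(len(order) / n_bins)
    centers, peaks = [], []
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        centers.append(median(a[i] for i in idx))               # bin location on A
        peaks.append(kde_mode([m[i] for i in idx], bandwidth))  # assumed invariant-gene peak

    def offset(x):
        # Piecewise-linear interpolation of the peak curve, held flat beyond the ends.
        if x <= centers[0]:
            return peaks[0]
        for j in range(1, len(centers)):
            if x <= centers[j]:
                span = centers[j] - centers[j - 1]
                t = (x - centers[j - 1]) / span if span else 1.0
                return peaks[j - 1] + t * (peaks[j] - peaks[j - 1])
        return peaks[-1]

    return [mi - offset(ai) for mi, ai in zip(m, a)]
```

Because each bin is centered on its mode rather than its mean or median, a large block of up-regulated genes in a bin does not drag the correction toward them, provided the invariant genes still form the tallest peak in that bin.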

Conclusion: We have demonstrated that the new method provides considerable improvement in the accuracy of data normalization when large proportions of gene effects are present. The performance improvement is mostly attributed to its variable selection component, which is designed to separate expression-invariant genes from DEG. Adding this key component to alternative normalization approaches removes most of their sensitivity to gene effects. The results indicate that our method may be used to normalize microarrays that are quite different, without prior knowledge of or assumptions about housekeeping genes.


Figure 4: Positive correlation between the percentage of genes in skewed LER distributions and the percentage of gene effects. (A) Percentage of genes in skewed local LER distributions for increasing levels of gene effects in the Dnl data series. The x-axis is the same as in Figure 2. Error bars represent the standard deviation computed from 3 replicates. (B) Actual local DEG percentage in each intensity interval. Data are from the Dnl series simulated with 50% gene effects, with equal proportions of up- and down-regulated genes globally. The black arrow points to the intensity interval used in C. Intensity refers to the geometric mean intensity before normalization, i.e., the A value of an MA plot. (C) Probability density distribution of LER in the interval highlighted in B. LER: logarithmic expression ratio.
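To make the caption's quantities concrete, here is a small sketch that computes the LER and A of one spot, and the local DEG percentage per intensity interval when the true DEG labels are known, as they are in simulated data. The base-2 logarithm, the equal-width intervals, and the function names `ler_and_a` and `local_deg_percentage` are assumptions rather than details taken from the paper.

```python
import math


def ler_and_a(red, green):
    """LER = log2(R/G); A = log2 of the geometric mean intensity sqrt(R*G)."""
    return math.log2(red / green), math.log2(math.sqrt(red * green))


def local_deg_percentage(a_values, is_deg, n_intervals):
    """Percentage of DEG in each equal-width A interval (None if the interval is empty)."""
    lo, hi = min(a_values), max(a_values)
    width = (hi - lo) / n_intervals or 1.0  # guard against a zero-width A range
    totals = [0] * n_intervals
    degs = [0] * n_intervals
    for a, deg in zip(a_values, is_deg):
        k = min(int((a - lo) / width), n_intervals - 1)  # top edge goes in the last interval
        totals[k] += 1
        degs[k] += int(deg)
    return [100.0 * d / t if t else None for d, t in zip(degs, totals)]
```

For example, `ler_and_a(1024, 256)` gives an LER of 2.0 (a fourfold ratio) at A = 9.0, since the geometric mean intensity is 512.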

Mentions: Normalization methods that assume symmetric gene effects are expected to be sensitive to skewed LER distributions, a condition that can occur when up- or down-regulated genes are overabundant within a narrow range of intensities. Using the simulation data, we measured the extent of skewing of the LER distribution within each bin according to the AG method in the Dnl data series [21]. The percentage of genes in skewed distributions increases with the percentage of gene effects (Fig. 4A). The actual local DEG percentages show that up- and down-regulated genes are unevenly distributed locally even when the two populations are balanced globally (Fig. 4B, C). Thus, both globally symmetric and asymmetric gene effects cause local skewing of the LER distributions, and these observations are consistent with our performance assessment results.
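The per-bin measurement can be sketched as follows. The AG method [21] is not reproduced here. As a stand-in, this hypothetical sketch uses the sample (Fisher-Pearson) skewness of the LER values in bins of consecutive genes ordered by A, and calls a bin skewed when its absolute skewness exceeds a threshold. The bin size, the threshold, and the function names are all assumptions.

```python
def sample_skewness(xs):
    """Fisher-Pearson sample skewness g1 = m3 / m2**1.5 (0.0 for constant data)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def percent_genes_in_skewed_bins(a_values, ler_values, bin_size, threshold):
    """Percentage of genes whose A-ordered bin has |skewness of LER| > threshold."""
    order = sorted(range(len(a_values)), key=lambda i: a_values[i])
    skewed = 0
    for start in range(0, len(order), bin_size):
        idx = order[start:start + bin_size]
        if abs(sample_skewness([ler_values[i] for i in idx])) > threshold:
            skewed += len(idx)  # every gene in a skewed bin is counted
    return 100.0 * skewed / len(order)
```

In a bin whose LER values are `[0, 0, 0, 3]`, a single up-regulated gene gives a skewness of about 1.15, whereas the balanced bin `[-1, 0, 0, 1]` has a skewness of zero. This mirrors how an overabundance of up- or down-regulated genes in a narrow intensity range skews the local distribution.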
