Adaptable gene-specific dye bias correction for two-channel DNA microarrays.

Margaritis T, Lijnzaad P, van Leenen D, Bouwmeester D, Kemmeren P, van Hooff SR, Holstege FC - Mol. Syst. Biol. (2009)

Bottom Line: A correction method (Gene- And Slide-Specific Correction, GASSCO) is presented, whereby sequence-specific corrections are modulated by the overall bias of individual hybridizations. GASSCO outperforms earlier methods and works well on a variety of publicly available datasets covering a range of platforms, organisms and applications, including ChIP-on-chip. Software implementing the method is publicly available.


Affiliation: Department of Physiological Chemistry, University Medical Center Utrecht, Universiteitsweg, Utrecht, The Netherlands.

ABSTRACT
DNA microarray technology is a powerful tool for monitoring gene expression or for finding the location of DNA-bound proteins. DNA microarrays can suffer from gene-specific dye bias (GSDB), causing some probes to be affected more by the dye than by the sample. This results in large measurement errors, which vary considerably for different probes and also across different hybridizations. GSDB is not corrected by conventional normalization and has been difficult to address systematically because of its variance. We show that GSDB is influenced by label incorporation efficiency, explaining the variation of GSDB across different hybridizations. A correction method (Gene- And Slide-Specific Correction, GASSCO) is presented, whereby sequence-specific corrections are modulated by the overall bias of individual hybridizations. GASSCO outperforms earlier methods and works well on a variety of publicly available datasets covering a range of platforms, organisms and applications, including ChIP-on-chip. A sequence-based model is also presented, which predicts which probes will suffer most from GSDB; this is useful for microarray probe design and for the correction of individual hybridizations. Software implementing the method is publicly available.
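The idea of a gene-specific correction that is modulated per hybridization can be illustrated with a minimal Python sketch. It is not the published GASSCO implementation: it assumes, for illustration only, that the bias on a slide is the product of a per-probe intrinsic bias and a single slide factor, and it estimates that factor with an ordinary least-squares slope through the origin. The function names `estimate_slide_factor` and `correct_hybridization` are hypothetical.

```python
def estimate_slide_factor(m_values, intrinsic_bias):
    """Estimate one scaling factor for a slide (assumed model, not the paper's).

    Fits M = factor * intrinsic_bias by least squares through the origin,
    so a slide with little overall dye bias gets a factor near zero.
    """
    numerator = sum(b * m for b, m in zip(intrinsic_bias, m_values))
    denominator = sum(b * b for b in intrinsic_bias)
    return numerator / denominator if denominator else 0.0


def correct_hybridization(m_values, intrinsic_bias):
    """Subtract the slide-scaled, probe-specific bias from each M-value."""
    factor = estimate_slide_factor(m_values, intrinsic_bias)
    return [m - factor * b for m, b in zip(m_values, intrinsic_bias)]


# Illustrative (made-up) numbers: five probes on a slide showing pure dye bias.
intrinsic_bias = [0.5, -0.5, 1.0, -1.0, 0.0]
m_values = [2.0 * b for b in intrinsic_bias]
corrected = correct_hybridization(m_values, intrinsic_bias)
```

In this toy case the slide factor is 2 and every corrected M-value is 0, because the input contains no biological signal. On real data the intrinsic biases would have to be estimated from an independent set of hybridizations, and the slide factor would more plausibly be estimated from the most strongly biased probes only.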


Figure 1. Gene-specific dye bias and its correction. The degree of GSDB varies from one hybridization to another. Example scatterplots of a reference wt (green) versus other wt (red) showing very little GSDB (A) or a large degree of GSDB (B). Each dot represents a single probe from the microarray. Green and red dots belong to the 5th and 95th percentiles of the iGSDB, respectively. The numbers along the axes represent normalized fluorescent intensities. The solid black lines mark two-fold up, no change and two-fold down. Boxplot of M-values (log2-ratio Cy5/Cy3) of the probes that suffer from the highest degree of GSDB, before (C) and after (D) applying the correction method. The results of five different wt versus reference wt hybridizations are shown that suffer from increasing degrees of GSDB (low–high). These boxplots are derived from hybridizations with different dye orientations, showing that the outliers depend on the dye, rather than on the sample. From left to right the common reference wt sample was labelled with Cy5, Cy3, Cy5, Cy3 and Cy3, respectively, and is indicated with an asterisk (Cy3). The genes represented in these boxplots are identical to those coloured red and green in (A) and (B). Boxplots before (E) and after (F) GSDB correction derived from self versus self hybridizations, whereby only the degree of fluorescent label incorporation was varied for both dyes in each hybridization. A labelling percentage of 1 indicates that both Cy5 and Cy3 were incorporated at a determined efficiency of 1 fluorescent dye per 100 bases of amplified RNA. The correction applied to these arrays is derived from the independent set of 12 hybridizations also used to correct the data shown in (C, D). Scatterplot of self versus self hybridization labelled: at 3% efficiency before (G) and after (H) GSDB correction; at 2% before (I) and after (J) GSDB correction. These scatterplots are from two of the hybridizations depicted in (E) and (F).
The coloured dots represent probes from four different external controls, whose RNAs were spiked in to achieve a two-fold molar difference between channels. Each external control is represented by multiple probes on the arrays. Boxplot of M-values before (K) and after (L) applying three different correction methods. Performance of the methods is measured as the change in variance of M-values compared with averaging. Averaging: simple averaging of dye swaps. VERA: the method of Kelley et al (2008), which actually results in an overall 3% increase in variance compared with averaging, although the variance of the most extremely affected probes does decrease. GASSCO: the method described here, which results in a 25% decrease in variance.
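The quantities named in this caption (M-values, dye-swap averaging, and the change in variance relative to averaging) can be sketched in Python as follows. The sketch rests on assumptions not stated in the caption: the sign convention for the swapped slide, the use of population variance, and the function names, all of which are illustrative.

```python
import math
import statistics


def m_value(cy5, cy3):
    """M-value of one probe: log2 ratio of Cy5 to Cy3 intensity."""
    return math.log2(cy5 / cy3)


def average_dye_swap(m_forward, m_swapped):
    """Average a dye-swap pair, probe by probe.

    Assumed convention: on the forward slide the sample is in Cy5, on the
    swapped slide it is in Cy3. Flipping the sign of the swapped M-value
    expresses both as sample/reference, so a dye effect that adds the same
    offset to both slides cancels.
    """
    return [(f - s) / 2.0 for f, s in zip(m_forward, m_swapped)]


def variance_change_percent(m_method, m_baseline):
    """Percentage change in M-value variance relative to a baseline method.

    Negative values mean the method reduced the variance.
    """
    ratio = statistics.pvariance(m_method) / statistics.pvariance(m_baseline)
    return 100.0 * (ratio - 1.0)


# Illustrative (made-up) numbers: true log2 ratio 1.0, dye offset 0.5.
averaged = average_dye_swap([1.5], [-0.5])
change = variance_change_percent([0.0, 1.0], [0.0, 2.0])
```

Here the dye offset cancels and `averaged` is `[1.0]`, while `change` is `-75.0`, meaning a 75% reduction in variance. Under this convention, the figures quoted in the caption would correspond to about +3 for VERA and about -25 for GASSCO.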

Mentions: As part of a project to determine differential expression between various mutant yeast strains, a number of control experiments were carried out. These controls consisted of labelling and hybridizing a single reference wild-type (wt) RNA sample against other wt RNA samples, each processed on different days. These hybridizations show diverse degrees of variation (Figure 1A and B).

