G-spots cause incorrect expression measurement in Affymetrix microarrays.

Upton GJ, Langdon WB, Harrison AP - BMC Genomics (2008)

Bottom Line: We have tested this expectation by examining the correlation coefficients between pairs of probes using the data on thousands of arrays that are available in the NCBI Gene Expression Omnibus (GEO) repository. This has serious implications, since more than 40% of the probesets in the HG-U133A GeneChip contain at least one such probe. Future array designs should avoid these untrustworthy probes.

Affiliation: Departments of Mathematical and Biological Sciences, University of Essex, Wivenhoe Park, Colchester, Essex CO4 3SQ, UK. gupton@essex.ac.uk

ABSTRACT

Background: High Density Oligonucleotide arrays (HDONAs), such as the Affymetrix HG-U133A GeneChip, use sets of probes chosen to match specified genes, with the expectation that if a particular gene is highly expressed then all the probes in that gene's probe set will provide a consistent message signifying the gene's presence. However, probes that contain a G-spot (a sequence of four or more guanines) behave abnormally and it has been suggested that these probes are responding to some biochemical effect such as the formation of G-quadruplexes.
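The definition of a G-spot given above (a run of four or more guanines) can be checked mechanically. The following is a minimal sketch, assuming probe sequences are available as plain strings written 5' to 3'; the function names and the example sequences are invented for illustration and are not taken from the paper or from the HG-U133A annotation.

```python
import re

# A G-spot, per the definition above: four or more consecutive guanines.
G_SPOT = re.compile(r"G{4,}")

def find_g_spots(probe_seq):
    """Return (start, length) for each maximal run of >= 4 Gs.

    Positions are 0-based, counted from the first character of the string,
    which is assumed here to be the 5' end of the probe.
    """
    return [(m.start(), len(m.group())) for m in G_SPOT.finditer(probe_seq.upper())]

def has_g_spot(probe_seq):
    """True if the sequence contains at least one G-spot."""
    return bool(G_SPOT.search(probe_seq.upper()))

# Made-up 25-mer sequences for illustration only (not real probes).
examples = {
    "no_run": "ACGTACGTACGTACGTACGTACGTA",
    "run_at_5prime": "GGGGACGTACGTACGTACGTACGTA",
    "run_in_middle": "ACGTACGTACGGGGGTACGTACGTA",
}
for name, seq in examples.items():
    print(name, has_g_spot(seq), find_g_spots(seq))
```

Reporting the start position alongside the run length makes it possible to separate probes whose G-spot sits at the 5' end from those where it lies elsewhere, which is the distinction the Results draw.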

Results: We have tested this expectation by examining the correlation coefficients between pairs of probes using the data on thousands of arrays that are available in the NCBI Gene Expression Omnibus (GEO) repository. We confirm the finding that G-spot probes are poorly correlated with others in their probesets and reveal that, by contrast, they are highly correlated with one another. We demonstrate that the correlation is most marked when the G-spot is at the 5' end of the probe.

Conclusion: Since these G-spot probes generally show little correlation with the other members of their probesets they are not fit for purpose and their values should be excluded when calculating gene expression values. This has serious implications, since more than 40% of the probesets in the HG-U133A GeneChip contain at least one such probe. Future array designs should avoid these untrustworthy probes.
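One way to act on this recommendation is to drop G-spot probes before summarising a probe set. The sketch below is only illustrative: it uses a plain median of log-intensities as a stand-in for a real summarisation method, which the text above does not specify, and the function name, sequences, and values are all invented.

```python
import re
from statistics import median

# Four or more consecutive guanines, as defined in the Background.
G_SPOT = re.compile(r"G{4,}")

def summarise_probe_set(probe_seqs, log_intensities):
    """Median log-intensity over the probes that contain no G-spot.

    Assumption: if every probe in the set contains a G-spot, fall back to
    using all probes rather than returning nothing.
    """
    kept = [
        value
        for seq, value in zip(probe_seqs, log_intensities)
        if not G_SPOT.search(seq.upper())
    ]
    return median(kept if kept else log_intensities)

# Invented sequences and values for one probe set on one array.
seqs = [
    "ACGTACGTACGTACGTACGTACGTA",
    "GGGGACGTACGTACGTACGTACGTA",  # contains a G-spot, so it is excluded
    "TTGCATTGCATTGCATTGCATTGCA",
]
values = [7.9, 11.5, 8.1]
print(summarise_probe_set(seqs, values))
```

The fallback for probe sets made up entirely of G-spot probes is a design choice of this sketch; an analysis might instead prefer to mark such probe sets as unreliable.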

Figure 2: Scatter diagrams of normalised probe intensities for two pairs of probes from probe set 31846_at (which matches the gene RHOD). (i) Probes PM 5 and PM 16 (r = 0.86); (ii) Probes PM 5 and PM 6 (r = -0.01).

Mentions: Our results use data from 6685 HG-U133A CEL files downloaded from the NCBI Gene Expression Omnibus (GEO) repository [5]. (After purified mRNA is processed and hybridised to an array, the Affymetrix scanner stores the average fluorescence intensity of each probe on the array in a data file known as a CEL file.) The HG-U133A array contains about 22 300 probe sets matching about 16 000 genes. After normalising each CEL file, we examined the correlation coefficients between pairs of probes within the same probe set, searching for anomalies. An example is provided by probe set 31846_at, one of two probe sets designed to match the gene RHOD. This probe set contains 16 PM probes, all drawn from the same exon, and gives rise to the correlation 'heatmap' of Fig. 1. The correlation coefficient between almost any pair of these PM probes is strongly positive; the sole exception is probe pm6 (the sixth PM probe in this probe set), whose correlation coefficients with all the other probes are near zero. The values giving rise to some of these correlation coefficients are shown in the scatter diagrams of Fig. 2. Although probes pm5 and pm16 are separated by 192 bases, their log(intensities) are highly correlated (r = 0.86), whereas probes pm5 and pm6, though separated by just 29 bases, have log(intensities) with a near-zero correlation coefficient (r = -0.01). Near-zero correlation coefficients can occur when probe intensities are so low that they are dominated by the background 'noise' of the chip, but that is not the case here: the average normalised intensities for probes pm5, pm6 and pm16 are 225, 389 and 504, respectively.
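The within-probe-set screening described above can be sketched as follows. This is not the authors' code: the data are synthetic (random numbers standing in for normalised log-intensities, with the sixth probe deliberately made independent of the others), and the noise levels, the helper names, and the 0.3 threshold are arbitrary assumptions chosen for illustration.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_corr_with_others(probes):
    """For each probe (one log-intensity per array), return its mean
    correlation with every other probe in the same probe set."""
    k = len(probes)
    return [
        sum(pearson(probes[i], probes[j]) for j in range(k) if j != i) / (k - 1)
        for i in range(k)
    ]

# Synthetic stand-in data, NOT real GEO values: a shared per-array signal
# plus independent probe noise for 16 probes across 500 arrays.
random.seed(0)
n_arrays, n_probes = 500, 16
signal = [random.gauss(0.0, 1.0) for _ in range(n_arrays)]
probes = [
    [8.0 + s + random.gauss(0.0, 0.4) for s in signal]
    for _ in range(n_probes)
]
# Overwrite the sixth probe so that it ignores the shared signal entirely.
probes[5] = [8.0 + random.gauss(0.0, 1.0) for _ in range(n_arrays)]

THRESHOLD = 0.3  # hypothetical cut-off, not a value from the paper
scores = mean_corr_with_others(probes)
flagged = [i + 1 for i, r in enumerate(scores) if r < THRESHOLD]  # 1-based
print("flagged PM probes:", flagged)
```

Averaging each probe's correlations with its neighbours reduces the full heatmap to one score per probe, which is convenient for scanning many probe sets; the pairwise matrix itself remains the more informative view for a single probe set.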

