Analysis of probe level patterns in Affymetrix microarray data.

Cambon AC, Khalyfa A, Cooper NG, Thompson CM - BMC Bioinformatics (2007)

Bottom Line: Microarrays have been used extensively to analyze the expression profiles for thousands of genes in parallel. Although affinity, middle base pair and probe location effects may be seen at the gross array level, these factors only account for a small proportion of the variation observed at the gene level. We discuss implications for probe sequence selection for confirmatory analysis using real time PCR.

Affiliation: Department of Bioinformatics and Biostatistics, School of Public Health and Information Sciences, University of Louisville, Louisville, Kentucky, USA. accamb01@louisville.edu

ABSTRACT

Background: Microarrays have been used extensively to analyze the expression profiles for thousands of genes in parallel. Most of the widely used methods for analyzing Affymetrix Genechip microarray data, including RMA, GCRMA and Model Based Expression Index (MBEI), summarize probe signal intensity data to generate a single measure of expression for each transcript on the array. In contrast, other methods are applied directly to probe intensities, negating the need for a summarization step.

Results: In this study, we used the Affymetrix rat genome Genechip to explore variability in probe response patterns within transcripts. We considered a number of possible sources of variability in probe sets including probe location within the transcript, middle base pair of the probe sequence, probe overlap, sequence homology and affinity. Although affinity, middle base pair and probe location effects may be seen at the gross array level, these factors only account for a small proportion of the variation observed at the gene level. A BLAST search and the presence of probe by treatment interactions for selected differentially expressed genes showed high sequence homology for many probes to non-target genes.

Conclusion: We suggest that examination and modeling of probe level intensities can be used to guide researchers in refining their conclusions regarding differentially expressed genes. We discuss implications for probe sequence selection for confirmatory analysis using real time PCR.

Figure 3: Scatter plots of log2 array-mean-centered PM probe intensities vs. PM probe affinities. The top row shows scatter plots for all four arrays (A, B, C, D) for Hsbp1, the top up-regulated gene. The bottom row shows all four arrays for Id2, the top down-regulated gene. Affinities explain only a small part of the variation between probes at the gene level. Affinities were calculated using the default method in Bioconductor package gcrma version 2.20. Affinities for perfect match probes are shown. Pearson correlation coefficients vary from 0.08 to 0.76 across the 8 scatter plots. A scatter plot of affinities vs. log2 probe intensities (not shown) for all probes in array C is similar to the corresponding diagram in Wu et al. [5].

Mentions: Understanding the sources of variability among probes within the probe set for each transcript is an area of active research. It has been suggested that some of the variation between probes for the same transcript can be explained by affinities, or position-dependent base effects [5]. For example, Figures 2b, 2c, and 2d show the effect of the middle base pair in the probe sequence on log2 intensity at the gross array level. However, in this study, differences in probe affinities within transcripts, calculated using the gcrma package [6], accounted for only a small proportion of the total observed variation in probe intensities (Figure 3 and Additional file 2). Figures 2a and 2d also show a subtle increasing trend by probe number at the array level (probes are numbered in order of increasing distance from the 5' end). This is consistent with the fact that probe intensities are expected to be systematically lower at the 5' end of the probe set compared to the 3' end [7]. However, as with affinities, this phenomenon accounts for only a small proportion of total variation in probe intensities at the gene level (Figure 1).
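To make the "small proportion of variation" statement concrete, the following is a minimal Python sketch of the kind of calculation behind a single panel of Figure 3: center the log2 PM intensities of one probe set by the array mean, then compute the Pearson correlation with the probe affinities. It is not the authors' code, and it does not call gcrma. The affinities, intensities, array mean, probe count of 11 and the helper names `center_by_array_mean` and `pearson_r` are all hypothetical, chosen only for illustration.

```python
import math
import random


def center_by_array_mean(log2_intensities, array_mean):
    """Subtract the array-wide mean log2 PM intensity from each probe value."""
    return [x - array_mean for x in log2_intensities]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for ONE probe set on ONE array (synthetic, not from the paper).
rng = random.Random(0)
affinities = [rng.uniform(0.5, 1.5) for _ in range(11)]              # assumed affinities
log2_pm = [8.0 + 0.5 * a + rng.gauss(0.0, 1.0) for a in affinities]  # assumed log2 PM values
array_mean = 7.5                                                     # assumed array-wide mean

centered = center_by_array_mean(log2_pm, array_mean)
r = pearson_r(affinities, centered)
r_squared = r ** 2  # share of between-probe variance a straight-line fit on affinity explains

print(f"Pearson r = {r:.2f}, r^2 = {r_squared:.2f}")
```

Subtracting a single array mean shifts every point by the same constant, so it does not change the correlation within one array; it only puts the four arrays on a comparable vertical scale. Squaring r gives the proportion of variance explained by a simple linear fit, so the lowest correlation reported for Figure 3 (0.08) corresponds to well under 1% of the between-probe variance, and the highest (0.76) to roughly 58%.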