Can subtle changes in gene expression be consistently detected with different microarray platforms?

Pedotti P, 't Hoen PA, Vreugdenhil E, Schenk GJ, Vossen RH, Ariyurek Y, de Hollander M, Kuiper R, van Ommen GJ, den Dunnen JT, Boer JM, de Menezes RX - BMC Genomics (2008)

Bottom Line: Two genes were found significantly differentially expressed by all platforms, and the four genes identified by the ABI platform were found by at least three other platforms. We observed better correlations between platforms when ranking the genes by significance level than when applying a fixed statistical cut-off. We demonstrate significant overlap in the affected gene sets identified by the different platforms, although biological processes were represented by only partially overlapping sets of genes.


Affiliation: Center for Human and Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands. paola.pedotti@gmail.com

ABSTRACT

Background: The comparability of gene expression data generated with different microarray platforms is still a matter of concern. Here we address the performance and the overlap in the detection of differentially expressed genes for five different microarray platforms in a challenging biological context where differences in gene expression are few and subtle.

Results: Gene expression profiles in the hippocampus of five wild-type and five transgenic deltaC-doublecortin-like kinase mice were evaluated with five microarray platforms: Applied Biosystems (ABI), Affymetrix, Agilent, Illumina, and LGTC home-spotted arrays. Using a fixed false discovery rate of 10%, we detected surprising differences in the number of differentially expressed genes per platform. Four genes were selected by ABI, 130 by Affymetrix, 3,051 by Agilent, 54 by Illumina, and 13 by LGTC. Two genes were found significantly differentially expressed by all platforms, and the four genes identified by the ABI platform were found by at least three other platforms. Quantitative RT-PCR analysis confirmed 20 out of 28 of the genes detected by two or more platforms and 8 out of 15 of the genes detected by Agilent only. We observed better correlations between platforms when ranking the genes by significance level than when applying a fixed statistical cut-off. We demonstrate significant overlap in the affected gene sets identified by the different platforms, although biological processes were represented by only partially overlapping sets of genes. Aberrances in GABA-ergic signalling in the transgenic mice were consistently found by all platforms.

Conclusion: The different microarray platforms give partially complementary views on biological processes affected. Our data indicate that when analyzing samples with only subtle differences in gene expression the use of two different platforms might be more attractive than increasing the number of replicates. Commercial two-color platforms seem to have higher power for finding differentially expressed genes between groups with small differences in expression.



Figure 1: Scattersmooth plots of the correlation between the ranks (according to p values) of genes in the UG dataset for the 5 platforms. Red corresponds to denser areas, yellow to less dense areas. The scattersmooth function smooths two-dimensional histograms into smoothed densities (26). This graph is more informative than a traditional scatter plot of the p values or of the -log p values, in which the small number of DEGs in our datasets produces a blurred graph with thousands of overlapping dots and empty areas. Because the signal-to-noise ratio varies between platforms and affects the statistics differently, plots of the ranks are more meaningful than plots of p values or test statistics.
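The idea behind such a density plot can be sketched in a few lines. The example below is a minimal illustration, not the scattersmooth algorithm cited in the caption: it bins two rank vectors into a two-dimensional histogram and then applies a simple 3x3 box average as a stand-in smoother. The function names, the bin count, and the toy ranks are all assumptions made for illustration.

```python
def histogram2d(xs, ys, bins, lo, hi):
    """Count (x, y) pairs on a bins-by-bins grid covering [lo, hi) on both axes."""
    width = (hi - lo) / bins
    grid = [[0] * bins for _ in range(bins)]
    for x, y in zip(xs, ys):
        # Clamp to the last bin so a value equal to hi is still counted.
        i = min(int((x - lo) / width), bins - 1)
        j = min(int((y - lo) / width), bins - 1)
        grid[i][j] += 1
    return grid


def box_smooth(grid):
    """Replace each cell by the mean of its 3x3 neighbourhood (truncated at edges)."""
    n, m = len(grid), len(grid[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cells = [
                grid[a][b]
                for a in range(max(0, i - 1), min(n, i + 2))
                for b in range(max(0, j - 1), min(m, j + 2))
            ]
            out[i][j] = sum(cells) / len(cells)
    return out


# Hypothetical ranks of the same eight genes on two platforms (toy data).
ranks_a = [1, 2, 3, 4, 5, 6, 7, 8]
ranks_b = [2, 1, 3, 4, 8, 5, 7, 6]

counts = histogram2d(ranks_a, ranks_b, bins=4, lo=1, hi=9)
smoothed = box_smooth(counts)
```

Cells with high smoothed values correspond to the dense (red) regions of the figure. A real analysis would use many more bins and a proper smoother.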

Mentions: Results for the subset of genes with overlapping UG identifiers are reported in Table 1 and show the same trend already observed in the complete datasets. Table 3 reports the overlaps in DEGs selected by each pair of platforms. Two genes were selected by all 5 platforms (Plac9, 9230117N10Rik). The 4 genes identified by ABI were selected on at least three other platforms. Overall, correspondence between platforms appears to be low. This is likely due to the use of a fixed statistical threshold. A higher correlation was found when evaluating the ranks of genes based on significance score. In Figure 1 the ranks for each gene are plotted for each pair of platforms. A scattersmooth function [34] is used for better visualization of the data cloud. As can be seen, in the area of the highly ranked genes (roughly rank 1 to rank 200) the correlation between platforms is higher than in the area of lower-ranked genes. This is expected, because only genes with significantly differential expression should be correlated, while no correlation and complete scattering are expected for unchanged genes. We also considered the moderated t-statistics from the EBLRM, which take into account the direction of changes in gene expression. The Pearson correlation coefficients (cP) of the t statistics between pairs of platforms ranged from 0.10 to 0.47 (Table 3). Correlations between pairs of platforms belonging to the same type (one- or two-color) were higher than between those of different types, with cP = 0.47 between AFF – ILL and between AGL – LGTC. Because the correlations are calculated over all genes, the vast majority of which do not change in expression, higher correlations are not to be expected.
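The contrast drawn above between a fixed threshold and a rank-based comparison can be illustrated with a small sketch. It is an assumption-laden toy, not the authors' analysis: the p values are invented, a raw p-value cutoff stands in for the 10% false discovery rate threshold, and the helper names are hypothetical. It counts the genes selected by both platforms at the cutoff and computes a rank (Spearman) correlation as the Pearson correlation of the ranks.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank 1 is the smallest value; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def overlap_at_cutoff(p_a, p_b, cutoff):
    """Return (genes selected by both, selected by A, selected by B)."""
    sel_a = {i for i, p in enumerate(p_a) if p <= cutoff}
    sel_b = {i for i, p in enumerate(p_b) if p <= cutoff}
    return len(sel_a & sel_b), len(sel_a), len(sel_b)


# Invented p values for the same six genes on two hypothetical platforms.
p_platform_a = [0.001, 0.004, 0.03, 0.20, 0.50, 0.90]
p_platform_b = [0.002, 0.06, 0.008, 0.40, 0.30, 0.70]

shared = overlap_at_cutoff(p_platform_a, p_platform_b, cutoff=0.01)
rho = spearman(p_platform_a, p_platform_b)
print(shared, round(rho, 3))
```

In this toy case only one gene passes the cutoff on both platforms, even though the two rankings largely agree, which is the kind of discrepancy the passage attributes to a fixed threshold. The same `pearson` helper could be applied directly to t statistics to mirror the cP values discussed above.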

