Multiple platform assessment of the EGF dependent transcriptome by microarray and deep tag sequencing analysis.

Llorens F, Hummel M, Pastor X, Ferrer A, Pluvinet R, Vivancos A, Castillo E, Iraola S, Mosquera AM, González E, Lozano J, Ingham M, Dohm JC, Noguera M, Kofler R, del Río JA, Bayés M, Himmelbauer H, Sumoy L - BMC Genomics (2011)

Bottom Line: In addition, we find an entirely new set of genes previously unrelated to the currently accepted EGF associated cellular functions. We propose that the use of global genomic cross-validation derived from high content technologies (microarrays or deep sequencing) can be used to generate more reliable datasets. This approach should help to improve the confidence of downstream in silico functional inference analyses based on high content data.


Affiliation: Center for Genomic Regulation (CRG)-Universitat Pompeu Fabra (UPF), Barcelona, Spain.

ABSTRACT

Background: Epidermal Growth Factor (EGF) is a key regulatory growth factor activating many processes relevant to normal development and disease, affecting cell proliferation and survival. Here we use a combined approach to study the EGF dependent transcriptome of HeLa cells by using multiple long oligonucleotide based microarray platforms (from Agilent, Operon, and Illumina) in combination with digital gene expression profiling (DGE) with the Illumina Genome Analyzer.

Results: By applying a procedure for cross-platform data meta-analysis based on RankProd and GlobalAncova tests, we establish a well validated gene set with transcript levels altered after EGF treatment. We use this robust gene list to build higher order networks of gene interaction by interconnecting associated networks, supporting and extending the important role of the EGF signaling pathway in cancer. In addition, we find an entirely new set of genes previously unrelated to the currently accepted EGF associated cellular functions.

Conclusions: We propose that the use of global genomic cross-validation derived from high content technologies (microarrays or deep sequencing) can be used to generate more reliable datasets. This approach should help to improve the confidence of downstream in silico functional inference analyses based on high content data.



Figure 3: Microarray versus DGE analysis. (A) Overlap of unique and named genes shared among the 3 microarray platforms and genes detected by DGE. The pool of 14645 shared genes was used for further cross-platform analysis. The total numbers of genes for each platform and for all platforms combined are indicated. (B) Overlap of significantly regulated genes considering the 3 microarray platforms at 6 h after EGF treatment and the genes found regulated after assessing significance by grouping microarray and DGE data in a RankProd analysis. Left panels show up-regulated genes and right panels show down-regulated genes.

Mentions: The final gene lists obtained from microarray data analyses are only a partial representation of the transcriptome, because the genes surveyed are constrained to the probes present in each array and because the overlap in gene coverage and in differential gene expression detection between platforms is incomplete. Ideally, one would have a detailed and comprehensive list of EGF-dependent genes. The only way to extend the validation without being limited by the probe content of each platform is to use an open technique. For this reason we used the DGE methodology developed by Illumina, which is based on the SAGE principle but up-scaled on the Genome Analyzer I (GA-I) next generation sequencing platform [30-35]. We re-analyzed aliquots of total RNA from the exact same three replicate experiments that had been tested on microarrays: serum-starved and EGF-treated for 6 h. On average, 9 × 10^6 raw sequences were obtained per sample, which after running the analysis pipeline allowed us to monitor the expression of 4.9 × 10^6 unambiguously matching tags, corresponding to 16,350 different genes (as determined from RefSeq unique gene symbols) (Table 1; Additional file 5, Table S4). This number has been considered sufficient by others to achieve over 90% coverage of the transcriptome, with sensitivity as high as or higher than that of short oligonucleotide probe microarrays [33]. Of the 17,070 genes represented in every microarray platform, 16,220 could be detected through DGE. In contrast, 3,972 genes represented in any of the 3 microarray platforms had no detectable measure by DGE in any of the three biological replicates, whereas 130 detected tag sequencing targets had not been addressed by any of the microarray platforms (Figure 3A). Neither SAM nor RankProd statistical analysis of differential gene expression by DGE gave any significant genes after multiple testing correction.
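The coverage comparison described above (genes present on every array, genes missed by DGE, genes seen only by DGE) comes down to set arithmetic on gene symbols. The following is a minimal sketch of that bookkeeping, not the authors' pipeline; the function name `coverage_overlap`, the dictionary keys, and the toy gene symbols are assumptions introduced for illustration only.

```python
def coverage_overlap(array_platforms, dge_genes):
    """Count genes shared between microarray platforms and DGE.

    array_platforms: dict mapping a platform name to its set of gene symbols.
    dge_genes: set of gene symbols detected by tag sequencing.
    (Hypothetical helper; names and structure are illustrative assumptions.)
    """
    on_every_array = set.intersection(*array_platforms.values())
    on_any_array = set.union(*array_platforms.values())
    return {
        # genes with a probe on all platforms
        "on_every_array": len(on_every_array),
        # of those, genes also detected by DGE
        "on_every_array_and_dge": len(on_every_array & dge_genes),
        # genes with a probe on at least one platform but no DGE signal
        "on_some_array_not_dge": len(on_any_array - dge_genes),
        # genes seen by DGE that no platform addresses
        "dge_only": len(dge_genes - on_any_array),
    }


# Toy gene symbols, NOT data from the study
arrays = {
    "Agilent": {"EGR1", "FOS", "MYC", "JUN"},
    "Operon": {"EGR1", "FOS", "MYC", "DUSP1"},
    "Illumina": {"EGR1", "FOS", "JUN", "DUSP1"},
}
dge = {"EGR1", "FOS", "MYC", "IER3"}

summary = coverage_overlap(arrays, dge)
print(summary)
```

Keeping the "every platform" and "any platform" sets separate matters here, because the text reports one figure against the intersection of the arrays and the others against their union.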
A general comparison between microarrays and deep sequencing showed better correlation among genes that had 32 or more counts in their tag sequences (Figure 4A). Next, we used CAT ('concordance at the top') plots [40], which show how the proportion of genes shared between gene lists ranked by fold change varies with list size, to measure the concordance of each of the other microarray platforms and of DGE with our reference microarray platform (Agilent, Figure 4B). We then compared all microarrays to the DGE dataset (DGE, Figure 4C), showing that there is a significant degree of agreement between the three alternative commercial array platforms and DGE (Figures 4B and 4C). These plots show that the concordance is highest among the top 100 genes and that, as the list size increases, the proportion of genes shared among lists stabilizes around 45-50% between microarray platforms and around 30% between microarrays and DGE. In part this is explained by the fact that EGF regulates many genes: the fold changes detected by each platform are correlated, but the exact ranking can vary considerably given the large number of genes affected. In agreement with this, gene set enrichment analysis showed a significant correlation between the 3 microarray platforms and DGE (data not shown).
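To make the CAT idea concrete, the sketch below computes the fraction of genes shared between the top-n entries of two ranked lists for several list sizes n. It is an illustration under stated assumptions rather than the procedure of reference [40] or of the authors: genes are ranked here by absolute log fold change (the text only says "ranked by fold change"), ties are broken alphabetically, and the function name `cat_curve` and all values are hypothetical.

```python
def cat_curve(fold_changes_a, fold_changes_b, list_sizes):
    """Concordance-at-the-top between two platforms.

    fold_changes_a, fold_changes_b: dicts mapping gene symbol -> log fold change.
    list_sizes: iterable of positive top-list sizes n to evaluate.
    Returns a list of (n, proportion of genes shared among the top n).
    Assumption: ranking uses absolute log fold change, ties broken by symbol.
    """
    genes = set(fold_changes_a) & set(fold_changes_b)
    if not genes:
        raise ValueError("the two platforms share no genes")

    def ranked(fc):
        # largest absolute change first; symbol as a deterministic tie-breaker
        return sorted(genes, key=lambda g: (-abs(fc[g]), g))

    rank_a, rank_b = ranked(fold_changes_a), ranked(fold_changes_b)
    curve = []
    for n in list_sizes:
        n = min(n, len(genes))  # cannot take more genes than are shared
        shared = len(set(rank_a[:n]) & set(rank_b[:n]))
        curve.append((n, shared / n))
    return curve


# Toy log fold changes, NOT data from the study
reference = {"EGR1": 3.0, "FOS": 2.5, "MYC": 1.0, "JUN": -0.5, "DUSP1": 0.2}
other = {"EGR1": 2.8, "FOS": 0.9, "MYC": 1.5, "JUN": -0.4, "DUSP1": 0.1}

curve = cat_curve(reference, other, [1, 2, 3, 5])
print(curve)
```

Plotting the proportion against n gives the CAT plot. In this toy case FOS and MYC swap positions, so the two lists agree at n = 1, drop to one half at n = 2, and agree fully again from n = 3, which shows how correlated fold changes can still produce different exact rankings.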

