Multiple platform assessment of the EGF dependent transcriptome by microarray and deep tag sequencing analysis.

Llorens F, Hummel M, Pastor X, Ferrer A, Pluvinet R, Vivancos A, Castillo E, Iraola S, Mosquera AM, González E, Lozano J, Ingham M, Dohm JC, Noguera M, Kofler R, del Río JA, Bayés M, Himmelbauer H, Sumoy L - BMC Genomics (2011)

Bottom Line: In addition, we find an entirely new set of genes previously unrelated to the currently accepted EGF associated cellular functions. We propose that the use of global genomic cross-validation derived from high content technologies (microarrays or deep sequencing) can be used to generate more reliable datasets. This approach should help to improve the confidence of downstream in silico functional inference analyses based on high content data.

Affiliation: Center for Genomic Regulation (CRG)-Universitat Pompeu Fabra (UPF), Barcelona, Spain.

ABSTRACT

Background: Epidermal Growth Factor (EGF) is a key regulatory growth factor activating many processes relevant to normal development and disease, affecting cell proliferation and survival. Here we use a combined approach to study the EGF dependent transcriptome of HeLa cells by using multiple long oligonucleotide based microarray platforms (from Agilent, Operon, and Illumina) in combination with digital gene expression profiling (DGE) with the Illumina Genome Analyzer.

Results: By applying a procedure for cross-platform data meta-analysis based on RankProd and GlobalAncova tests, we establish a well validated gene set with transcript levels altered after EGF treatment. We use this robust gene list to build higher order networks of gene interaction by interconnecting associated networks, supporting and extending the important role of the EGF signaling pathway in cancer. In addition, we find an entirely new set of genes previously unrelated to the currently accepted EGF associated cellular functions.

Conclusions: We propose that the use of global genomic cross-validation derived from high content technologies (microarrays or deep sequencing) can be used to generate more reliable datasets. This approach should help to improve the confidence of downstream in silico functional inference analyses based on high content data.

Figure 2: GSEA of significantly regulated gene sets across microarray platforms. Profile of the running ES score and positions of gene set members on the rank-ordered list, using 6 h EGF treatment data from each of the three microarray platforms. In each panel, the vertical black lines mark the position of each gene of the tested gene set in the reference data set (ranked by the average of the three respective EGF versus control log2 ratios of replicate experiments). The green curve plots the ES (enrichment score), which is the running sum of the weighted enrichment score obtained from the GSEA software. Within each queried gene set, the farther to the left (red) a gene lies, the higher its correlation with EGF up-regulated genes in the reference platform; the farther to the right (blue), the higher its correlation with genes down-regulated upon EGF treatment in the reference platform. The studied gene sets are the lists of up- or down-regulated genes in each platform at 6 h of EGF treatment. Significantly enriched data sets are defined according to GSEA default settings (p < 0.001 and a false discovery rate (FDR) < 0.25). R.L.M = ranked list metric.
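The running enrichment score described in the caption can be illustrated with a minimal sketch. It follows the weighted running-sum statistic commonly used to describe GSEA and is not the GSEA software itself; the function name `running_enrichment_score`, the `weight` parameter, and the toy gene names and values are assumptions made for illustration only.

```python
def running_enrichment_score(ranked_genes, metric, gene_set, weight=1.0):
    """Return (ES, running_sum) for one gene set on one ranked list.

    ranked_genes: gene symbols ordered from most up- to most down-regulated
    metric:       ranked list metric (e.g. log2 ratio) for each gene, same order
    gene_set:     symbols of the queried gene set
    weight:       exponent applied to |metric| for hits (assumed default of 1)
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    # Hits contribute in proportion to |metric|**weight; misses contribute equally.
    hit_weights = [abs(m) ** weight if h else 0.0 for m, h in zip(metric, hits)]
    total_hit_weight = sum(hit_weights)
    if total_hit_weight == 0:
        raise ValueError("all gene set members have a zero metric")

    running, current = [], 0.0
    for w, is_hit in zip(hit_weights, hits):
        current += w / total_hit_weight if is_hit else -1.0 / n_miss
        running.append(current)

    # ES is the maximum deviation of the running sum from zero (sign kept).
    es = max(running, key=abs)
    return es, running


# Toy, made-up values: two up-regulated and two down-regulated genes.
genes = ["GENE1", "GENE2", "GENE3", "GENE4"]
log2_ratios = [2.0, 1.0, -1.0, -2.0]
es_up, curve_up = running_enrichment_score(genes, log2_ratios, {"GENE1", "GENE2"})
es_down, curve_down = running_enrichment_score(genes, log2_ratios, {"GENE3", "GENE4"})
```

In this toy case the set placed at the top of the list gives a positive ES and the set at the bottom gives a negative one, which corresponds to the left (red) and right (blue) ends of the panels. Significance testing (permutations, FDR) is deliberately left out of the sketch.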

Mentions: To compare results across technologies, we focused on RefSeq genes with associated gene symbols. This also simplifies functional analysis, given that most genes with known function belong to this group of better annotated genes. An initial comparison between platforms of the rates of change in gene expression (expressed as log2 ratios), using RefSeq-remapped probe gene symbols as common identifiers and the median value of all probes for each gene, showed a variable degree of correlation. These platforms have 17,070 RefSeq genes in common (Figure 1A). A first exploration of the data in search of shared regulated genes showed a strikingly low degree of overlap between the lists of most significantly regulated genes, as determined by applying an absolute fold change cut-off of 1.2 and a false discovery rate of 5% with significance analysis of microarrays (SAM) (Figure 1B; Additional file 2, Table S1). This reduced overlap is consistent with previous reports of small intersections between lists in similar experimental designs [21,26,36]. We then used gene set enrichment analysis as implemented in the GSEA tool [37], which takes into account the entire distribution of log2 ratios, to increase the power of the comparison across all three platforms [36]. Our GSEA analysis showed highly significant agreement between all three platforms: each gene set identified by any one platform was asymmetrically distributed within the rank-ordered differential gene expression datasets of the remaining platforms (GSEA FDR q-value = 0 for all comparisons) (Figure 2; Additional file 3, Table S2). This result strongly argues that all platforms detect the same underlying transcriptional response, while differences among individual gene measurements make this common behavior harder to detect when focusing only on the intersection between the top significant gene lists from the individual platforms.
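The per-gene summarization and list-overlap comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the helper names (`collapse_to_genes`, `regulated_genes`, `shared_regulated`) and all data values are hypothetical, and only the fold change cut-off is applied, since the SAM false discovery rate filter is not reproduced here.

```python
import math
from collections import defaultdict
from statistics import median


def collapse_to_genes(probe_log2_ratios):
    """Summarize probes per gene symbol by the median log2 ratio."""
    by_gene = defaultdict(list)
    for symbol, value in probe_log2_ratios:
        by_gene[symbol].append(value)
    return {symbol: median(values) for symbol, values in by_gene.items()}


def regulated_genes(gene_log2_ratios, fold_change=1.2):
    """Genes whose absolute fold change meets the cut-off (no FDR filter)."""
    threshold = math.log2(fold_change)
    return {g for g, v in gene_log2_ratios.items() if abs(v) >= threshold}


def shared_regulated(platforms, fold_change=1.2):
    """Return (common genes, per-platform lists, intersection of the lists)."""
    common = set.intersection(*(set(p) for p in platforms.values()))
    lists = {
        name: regulated_genes(p, fold_change) & common
        for name, p in platforms.items()
    }
    return common, lists, set.intersection(*lists.values())


# Made-up probe-level log2 ratios for two hypothetical platforms.
platform_a = collapse_to_genes(
    [("GENE1", 1.0), ("GENE1", 2.0), ("GENE1", 3.0), ("GENE2", 0.1), ("GENE3", 0.5)]
)
platform_b = collapse_to_genes(
    [("GENE1", 0.8), ("GENE2", -0.4), ("GENE3", 0.1), ("GENE4", 2.0)]
)
common, lists, shared = shared_regulated({"A": platform_a, "B": platform_b})
```

Even in this toy example each platform calls two genes regulated but only one is shared, which mirrors how hard thresholds on individual genes can yield small intersections although the underlying trends agree.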

