Confero: an integrated contrast data and gene set platform for computational analysis and biological interpretation of omics data.

Hermida L, Poussin C, Stadler MB, Gubian S, Sewer A, Gaidatzis D, Hotz HR, Martin F, Belcastro V, Cano S, Peitsch MC, Hoeng J - BMC Genomics (2013)

Bottom Line: Therefore, it is important to systematically store the full list of genes with their associated statistical analysis results (differential expression, t-statistics, p-value) corresponding to one or more effect(s) or contrast(s) of interest (hereafter termed "contrast data") in a comparable manner and extract gene sets in order to efficiently support downstream analyses and further leverage data on a long-term basis. Filling this gap would open new research perspectives for biologists to discover disease-related biomarkers and to support the understanding of molecular mechanisms underlying specific biological perturbation effects (e.g. disease, genetic, environmental, etc.). To illustrate Confero platform functionality we walk through major aspects of the Confero workflow and results using the Bioconductor estrogen package dataset.


Affiliation: Philip Morris International Research & Development, Quai Jeanrenaud 5, CH-2000 Neuchatel, Switzerland. leandro@leandrohermida.com

ABSTRACT

Background: High-throughput omics technologies such as microarrays and next-generation sequencing (NGS) have become indispensable tools in biological research. Computational analysis and biological interpretation of omics data can pose significant challenges due to a number of factors, in particular the systems integration required to fully exploit and compare data from different studies and/or technology platforms. In transcriptomics, the identification of differentially expressed genes when studying effect(s) or contrast(s) of interest constitutes the starting point for further downstream computational analysis (e.g. gene over-representation/enrichment analysis, reverse engineering) leading to mechanistic insights. Therefore, it is important to systematically store the full list of genes with their associated statistical analysis results (differential expression, t-statistics, p-value) corresponding to one or more effect(s) or contrast(s) of interest (hereafter termed "contrast data") in a comparable manner and extract gene sets in order to efficiently support downstream analyses and further leverage data on a long-term basis. Filling this gap would open new research perspectives for biologists to discover disease-related biomarkers and to support the understanding of molecular mechanisms underlying specific biological perturbation effects (e.g. disease, genetic, environmental, etc.).

Results: To address these challenges, we developed Confero, a contrast data and gene set platform for downstream analysis and biological interpretation of omics data. The Confero software platform provides storage of contrast data in a simple and standard format, data transformation to enable cross-study and cross-platform data comparison, and automatic extraction and storage of gene sets to build new a priori knowledge, which is leveraged by integrated and extensible downstream computational analysis tools. Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA) are currently integrated as analysis modules, along with additional tools to support biological interpretation. Confero is a standalone system that also integrates with Galaxy, an open-source workflow management and data integration system. To illustrate Confero platform functionality we walk through major aspects of the Confero workflow and results using the Bioconductor estrogen package dataset.

Conclusion: Confero provides a unique and flexible platform to support downstream computational analysis facilitating biological interpretation. The system has been designed in order to provide the researcher with a simple, innovative, and extensible solution to store and exploit analyzed data in a sustainable and reproducible manner thereby accelerating knowledge-driven research. Confero source code is freely available from http://sourceforge.net/projects/confero/.



Figure 4: Screenshot of the report following importing and processing of the Bioconductor estrogen contrast data in Confero using Galaxy. The report shows 1) whether contrast data have been correctly imported and processed (several checks during the mapping and collapsing process are reported at the top of the document, under “Check/Map/Collapse Contrast Dataset Report”), and 2) information on the gene sets that have been automatically extracted from each contrast and stored (UP, DN, and AR gene sets) in the Confero DB (“Gene Set Report”). As an example (see also Figure 5), the 169, 105 and 274 most significantly up-, down- and all-regulated genes (FDR<0.05) were extracted as gene sets UP, DN and AR, respectively, from the contrast data “Estro10” (comparison of gene expression levels of estrogen vs control samples collected at 10h).

Mentions: Raw data (CEL files) were preprocessed and normalized as described in the Confero platform overview schema (Figure 2). The Bioconductor limma package was used to compute contrasts corresponding to the effect of estrogen at 10 (early) and 48 (late) hours (estrogen treatment vs. control comparison for each time point). Additionally, the contrast corresponding to the interaction effect was computed to directly investigate the differential effect of estrogen treatment at 10 and 48 hours. The output limma R object from the eBayes R function (see Additional file 6) was imported into Galaxy using the Confero Upload LIMMA/SAM R Object tool. Data for each contrast, including log2 fold changes (M) between control and estrogen treatment conditions, probeset average signal (A), moderated t-statistic (S), and associated FDR (P), were automatically extracted from the R object and converted into idMAPS format using the Confero Convert LIMMA/SAM R Object tool (see Figure 3, Additional file 2 and Additional file 1: Table S1). The idMAPS file was then imported into the Confero database using the Submit Contrast Dataset tool (Figure 3 and Additional file 1: Table S1). Note that it is also possible to directly import an already formatted idMAPS file into the Confero platform without using the Convert LIMMA/SAM R Object tool (Figure 3 and Additional file 1: Table S1). A summary report shows how the contrast dataset was processed and imported as well as information on the gene sets (UP, DN, and AR) extracted and stored for each contrast (Figure 4). In this example, the 169, 105 and 274 most significantly up-, down- and all-regulated genes (FDR<0.05) were extracted as gene sets UP, DN and AR, respectively, from the contrast data “Estro10” (comparison of gene expression levels of estrogen vs control samples collected at 10h).
Stored in the Confero database, these gene sets represent gene expression perturbation fingerprints of the estrogen effect on MCF7 cells at 10h and could be further leveraged as a priori knowledge to analyze and compare new datasets. Contrast data and gene sets derived from the Estrogen data analysis can be accessed and visualized using the View and Manage Data tool as shown in Figures 5 and 6.
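
The gene set extraction step described above can be illustrated with a minimal Python sketch. It is not Confero's implementation: the text only states that UP, DN, and AR sets were extracted at FDR<0.05, so the rule used here (sign of the log2 fold change M combined with an FDR cutoff on P, with AR taken as the union of UP and DN) is an assumption, although it is consistent with the reported counts (169 + 105 = 274). The `ContrastRow` type, the `extract_gene_sets` function, and the toy gene identifiers and values are hypothetical and do not come from the estrogen dataset.

```python
from typing import Dict, List, NamedTuple


class ContrastRow(NamedTuple):
    """One row of contrast data, using the M/A/S/P columns named in the text."""
    gene_id: str
    M: float  # log2 fold change
    A: float  # average signal
    S: float  # moderated t-statistic
    P: float  # FDR-adjusted p-value


def extract_gene_sets(rows: List[ContrastRow],
                      fdr_cutoff: float = 0.05) -> Dict[str, List[str]]:
    """Split significant genes into UP, DN, and AR sets (assumed rule).

    UP: P < cutoff and M > 0; DN: P < cutoff and M < 0; AR: UP and DN combined.
    Each set is ordered from most to least significant (ascending P).
    """
    significant = sorted((r for r in rows if r.P < fdr_cutoff),
                         key=lambda r: r.P)
    up = [r.gene_id for r in significant if r.M > 0]
    dn = [r.gene_id for r in significant if r.M < 0]
    ar = [r.gene_id for r in significant if r.M != 0]
    return {"UP": up, "DN": dn, "AR": ar}


# Toy contrast data (hypothetical values, for illustration only).
rows = [
    ContrastRow("geneA", 2.1, 8.0, 9.5, 0.001),
    ContrastRow("geneB", -1.4, 7.2, -6.1, 0.010),
    ContrastRow("geneC", 0.3, 6.5, 1.2, 0.400),
    ContrastRow("geneD", 1.0, 9.1, 4.8, 0.030),
]

sets = extract_gene_sets(rows)
for name, genes in sets.items():
    print(name, len(genes), genes)
```

Under this assumed rule the AR set always has as many genes as UP and DN together, which matches the Estro10 report; Confero itself may apply additional criteria (for example a fold-change threshold or a cap on set size) that the text does not describe.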

