Confero: an integrated contrast data and gene set platform for computational analysis and biological interpretation of omics data.

Hermida L, Poussin C, Stadler MB, Gubian S, Sewer A, Gaidatzis D, Hotz HR, Martin F, Belcastro V, Cano S, Peitsch MC, Hoeng J - BMC Genomics (2013)

Bottom Line: It is important to systematically store the full list of genes with their associated statistical analysis results (differential expression, t-statistics, p-value) corresponding to one or more effect(s) or contrast(s) of interest (hereafter termed "contrast data") in a comparable manner and to extract gene sets in order to efficiently support downstream analyses and further leverage data on a long-term basis. Filling this gap would open new research perspectives for biologists to discover disease-related biomarkers and to support the understanding of molecular mechanisms underlying specific biological perturbation effects (e.g. disease, genetic, environmental). To illustrate Confero platform functionality, we walk through major aspects of the Confero workflow and results using the Bioconductor estrogen package dataset.


Affiliation: Philip Morris International Research & Development, Quai Jeanrenaud 5, CH-2000 Neuchatel, Switzerland. leandro@leandrohermida.com

ABSTRACT

Background: High-throughput omics technologies such as microarrays and next-generation sequencing (NGS) have become indispensable tools in biological research. Computational analysis and biological interpretation of omics data can pose significant challenges due to a number of factors, in particular the systems integration required to fully exploit and compare data from different studies and/or technology platforms. In transcriptomics, the identification of differentially expressed genes when studying effect(s) or contrast(s) of interest constitutes the starting point for further downstream computational analysis (e.g. gene over-representation/enrichment analysis, reverse engineering) leading to mechanistic insights. Therefore, it is important to systematically store the full list of genes with their associated statistical analysis results (differential expression, t-statistics, p-value) corresponding to one or more effect(s) or contrast(s) of interest (hereafter termed "contrast data") in a comparable manner and to extract gene sets in order to efficiently support downstream analyses and further leverage data on a long-term basis. Filling this gap would open new research perspectives for biologists to discover disease-related biomarkers and to support the understanding of molecular mechanisms underlying specific biological perturbation effects (e.g. disease, genetic, environmental).

Results: To address these challenges, we developed Confero, a contrast data and gene set platform for downstream analysis and biological interpretation of omics data. The Confero software platform provides storage of contrast data in a simple and standard format, data transformation to enable cross-study and cross-platform data comparison, and automatic extraction and storage of gene sets to build new a priori knowledge, which is leveraged by integrated and extensible downstream computational analysis tools. Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA) are currently integrated as an analysis module, along with additional tools to support biological interpretation. Confero is a standalone system that also integrates with Galaxy, an open-source workflow management and data integration system. To illustrate Confero platform functionality, we walk through major aspects of the Confero workflow and results using the Bioconductor estrogen package dataset.

Conclusion: Confero provides a unique and flexible platform to support downstream computational analysis, facilitating biological interpretation. The system has been designed to provide the researcher with a simple, innovative, and extensible solution to store and exploit analyzed data in a sustainable and reproducible manner, thereby accelerating knowledge-driven research. Confero source code is freely available from http://sourceforge.net/projects/confero/.

Figure 7: Screenshot of the Confero Analyze Data tool (GSEA). Selecting the analysis algorithm “GSEA Preranked” makes GSEA-specific parameters available in the Galaxy menu. Confero DB is automatically populated with gene sets as new contrast data or manually curated gene sets are imported into the database. The ability to filter and search gene sets precisely is therefore crucial for addressing specific biological questions. A filtering/searching functionality (searching gene sets by organism, tissue/cells, or stimulus using specific filters or a free-text expression) is currently available in the Analyze Data tool and will be enhanced in future developments.

Mentions: To unravel biological processes/pathways regulated by estrogen at the early and late time points, GSEA was performed on the imported contrast data from the estrogen dataset. The Confero Create Ranked or DEG Lists tool was used to generate a ranked genome-wide differential expression profile for each contrast of the dataset, using the moderated-t statistic data column (S) as the ranking metric. With these ranked profiles (i.e. “Estro10”, “Estro48”, and “Interaction”) as input, the Confero Analyze Data tool was used to set up options and parameters to perform GSEA or ORA (Figures 3, 7 and 8). As an example of GSEA in the context of this case study, the MSigDB C2 gene set collection from the Broad Institute was selected as a priori knowledge. Over time, Confero DB is automatically populated with gene sets as new contrast data or manually curated gene sets are imported into the database. The ability to filter gene sets precisely is therefore crucial for addressing specific biological questions. A filtering functionality is currently available in the Analyze Data tool and will be enhanced in future developments (Figures 3 and 7; Additional file 1: Table S1). The generated GSEA and ORA results for each contrast are directly accessible via the Galaxy user interface (Figures 9 and 10). Interpreting GSEA results through the web report is generally a long and tedious process: researchers typically have to analyze the results manually, sifting through the report for each contrast and drilling down into each gene set to access the associated leading edge genes. Therefore, to facilitate and accelerate the analysis of results and their biological interpretation, the Confero Extract Results Matrix tool was used to extract all related GSEA results into a tab-delimited spreadsheet file (see Additional file 7). The user can select which GSEA results are extracted.
The file contains normalized enrichment scores (NES), the NES-associated false discovery rate (FDR) and, optionally, the ranks at which the NES is observed in the ranked gene list for all analyzed estrogen contrast data (i.e. “Estro10”, “Estro48”, and “Interaction”). When interpreting GSEA results, it is generally important to identify which genes contribute the most to the enrichment of significant gene sets. To determine this, the Confero Extract Leading Edge Matrix tool was used to extract all leading edge genes from gene sets having FDR values below a user-defined threshold (default value of 0.05) into a single output matrix (see Additional file 8). This customizable output matrix can contain boolean values, moderated-t statistic values (see Additional file 8), or gene ranks. This file gives the biologist more granular molecular insights for interpretation by identifying the genes that contribute the most to the significant enrichment of the perturbed biological processes observed.
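
The ranking and leading edge extraction steps described above can be illustrated with a minimal Python sketch. This is not Confero's implementation: the function names (`rank_genes`, `leading_edge_matrix`), the in-memory dictionary layout, the gene identifiers and all numeric values are assumptions made up for illustration. It ranks genes by a per-gene statistic (such as the moderated-t statistic), keeps only gene sets whose FDR is below a threshold (0.05 by default, as in the text), and fills a gene-by-gene-set matrix with boolean values, statistic values, or gene ranks.

```python
from typing import Dict, List, Tuple, Union

Value = Union[bool, float, int]
Column = Tuple[str, str]  # (contrast name, gene set name)


def rank_genes(stats: Dict[str, float]) -> List[str]:
    """Order gene IDs by descending statistic (e.g. moderated t)."""
    return sorted(stats, key=stats.get, reverse=True)


def leading_edge_matrix(
    gsea: Dict[str, Dict[str, dict]],
    stats: Dict[str, Dict[str, float]],
    fdr_threshold: float = 0.05,
    fill: str = "boolean",
) -> Tuple[List[Column], Dict[str, Dict[Column, Value]]]:
    """Collect leading edge genes of significant gene sets into one matrix.

    gsea:  contrast -> gene set -> {"fdr": float, "leading_edge": [gene, ...]}
           (hypothetical layout, not a Confero or GSEA file format)
    stats: contrast -> gene -> statistic used for ranking
    fill:  "boolean", "statistic" or "rank" (1-based position in ranked list)
    """
    if fill not in ("boolean", "statistic", "rank"):
        raise ValueError(f"unknown fill mode: {fill}")
    columns: List[Column] = []
    matrix: Dict[str, Dict[Column, Value]] = {}
    for contrast, gene_sets in gsea.items():
        ranked = rank_genes(stats[contrast])
        ranks = {gene: i for i, gene in enumerate(ranked, start=1)}
        for set_name, result in gene_sets.items():
            # Keep only gene sets with FDR strictly below the threshold.
            if result["fdr"] >= fdr_threshold:
                continue
            column = (contrast, set_name)
            columns.append(column)
            for gene in result["leading_edge"]:
                if fill == "boolean":
                    value: Value = True
                elif fill == "statistic":
                    value = stats[contrast][gene]
                else:
                    value = ranks[gene]
                matrix.setdefault(gene, {})[column] = value
    return columns, matrix


# Made-up example data: gene IDs, statistics and FDR values are illustrative.
example_stats = {"Estro10": {"G1": 5.2, "G2": -1.0, "G3": 3.1, "G4": 0.4}}
example_gsea = {
    "Estro10": {
        "SET_A": {"fdr": 0.01, "leading_edge": ["G1", "G3"]},
        "SET_B": {"fdr": 0.30, "leading_edge": ["G4"]},
    }
}
example_columns, example_matrix = leading_edge_matrix(
    example_gsea, example_stats, fill="rank"
)
```

In this toy input only `SET_A` passes the threshold, so the matrix has a single column and rows for `G1` and `G3` only. A cell missing from a row means the gene is not in the leading edge of that gene set; how Confero itself encodes such cells is not stated in the text.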

