Confero: an integrated contrast data and gene set platform for computational analysis and biological interpretation of omics data.

Hermida L, Poussin C, Stadler MB, Gubian S, Sewer A, Gaidatzis D, Hotz HR, Martin F, Belcastro V, Cano S, Peitsch MC, Hoeng J - BMC Genomics (2013)

Bottom Line: Therefore, it is important to systematically store the full list of genes with their associated statistical analysis results (differential expression, t-statistics, p-value) corresponding to one or more effect(s) or contrast(s) of interest (shortly termed as "contrast data") in a comparable manner and extract gene sets in order to efficiently support downstream analyses and further leverage data on a long-term basis. Filling this gap would open new research perspectives for biologists to discover disease-related biomarkers and to support the understanding of molecular mechanisms underlying specific biological perturbation effects (e.g. disease, genetic, environmental, etc.). To illustrate Confero platform functionality we walk through major aspects of the Confero workflow and results using the Bioconductor estrogen package dataset.


Affiliation: Philip Morris International Research & Development, Quai Jeanrenaud 5, CH-2000 Neuchatel, Switzerland. leandro@leandrohermida.com

ABSTRACT

Background: High-throughput omics technologies such as microarrays and next-generation sequencing (NGS) have become indispensable tools in biological research. Computational analysis and biological interpretation of omics data can pose significant challenges due to a number of factors, in particular the systems integration required to fully exploit and compare data from different studies and/or technology platforms. In transcriptomics, the identification of differentially expressed genes when studying effect(s) or contrast(s) of interest constitutes the starting point for further downstream computational analysis (e.g. gene over-representation/enrichment analysis, reverse engineering) leading to mechanistic insights. Therefore, it is important to systematically store the full list of genes with their associated statistical analysis results (differential expression, t-statistics, p-value) corresponding to one or more effect(s) or contrast(s) of interest (referred to here as "contrast data") in a comparable manner, and to extract gene sets, in order to efficiently support downstream analyses and further leverage data on a long-term basis. Filling this gap would open new research perspectives for biologists to discover disease-related biomarkers and would support the understanding of molecular mechanisms underlying specific biological perturbation effects (e.g. disease, genetic, environmental, etc.).

Results: To address these challenges, we developed Confero, a contrast data and gene set platform for downstream analysis and biological interpretation of omics data. The Confero software platform provides storage of contrast data in a simple and standard format, data transformation to enable cross-study and cross-platform data comparison, and automatic extraction and storage of gene sets to build new a priori knowledge, which is leveraged by integrated and extensible downstream computational analysis tools. Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA) are currently integrated as an analysis module, along with additional tools to support biological interpretation. Confero is a standalone system that also integrates with Galaxy, an open-source workflow management and data integration system. To illustrate Confero platform functionality, we walk through major aspects of the Confero workflow and results using the Bioconductor estrogen package dataset.
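The two ideas named above, extracting gene sets from contrast data and testing a gene set for over-representation, can be illustrated with a minimal Python sketch. It is not Confero's implementation: the function names, the cutoffs, the gene identifiers and all numeric values below are hypothetical, and ORA is shown in its common textbook form as a one-sided hypergeometric test.

```python
from math import comb

def extract_gene_sets(contrast, p_cutoff=0.05, lfc_cutoff=1.0):
    """Split one contrast into up- and down-regulated gene sets.

    `contrast` maps gene id -> (log fold change, p-value).
    The cutoffs are illustrative assumptions, not Confero defaults.
    """
    up = {g for g, (lfc, p) in contrast.items()
          if p <= p_cutoff and lfc >= lfc_cutoff}
    down = {g for g, (lfc, p) in contrast.items()
            if p <= p_cutoff and lfc <= -lfc_cutoff}
    return up, down

def ora_p_value(universe_size, set_size, hits_size, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    X counts how many of `hits_size` genes drawn from a universe of
    `universe_size` genes fall in a reference set of `set_size` genes.
    """
    total = comb(universe_size, hits_size)
    upper = min(set_size, hits_size)
    tail = sum(comb(set_size, k) * comb(universe_size - set_size, hits_size - k)
               for k in range(overlap, upper + 1))
    return tail / total

# Toy contrast with invented values: gene id -> (log fold change, p-value)
toy_contrast = {
    "g1": (2.1, 0.001), "g2": (1.4, 0.020), "g3": (0.2, 0.700),
    "g4": (-1.8, 0.003), "g5": (1.9, 0.300), "g6": (-0.1, 0.900),
}
up, down = extract_gene_sets(toy_contrast)

# Hypothetical reference gene set tested against the up-regulated genes
reference_set = {"g1", "g2", "g5"}
p = ora_p_value(len(toy_contrast), len(reference_set),
                len(up), len(up & reference_set))
```

Keeping the full contrast (all genes with their statistics) rather than only the thresholded lists is what allows gene sets to be re-extracted later with different cutoffs.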

Conclusion: Confero provides a unique and flexible platform to support downstream computational analysis facilitating biological interpretation. The system has been designed in order to provide the researcher with a simple, innovative, and extensible solution to store and exploit analyzed data in a sustainable and reproducible manner thereby accelerating knowledge-driven research. Confero source code is freely available from http://sourceforge.net/projects/confero/.



Figure 11: Summary of Confero Bioconductor estrogen dataset GSEA results and use of the leading edge gene results from the Export Leading Edge Matrix tool. Grouping gene sets by biological process and investigating the leading edge genes associated with significantly enriched gene sets makes it possible to rapidly interpret biological events at the molecular level and to raise new hypotheses that could then be verified experimentally. Red and green colors highlight normalized enrichment scores that are significantly enriched for up- or down-regulated genes, respectively.

Mentions: Overall, the result files generated by the Confero platform tools enable more rapid and efficient biological interpretation of data. Indeed, the GSEA results matrix can be used directly to identify significant gene sets per contrast and to search for enrichment patterns across contrasts, as in Figure 11. Grouping gene sets by biological process and investigating the leading edge genes associated with significantly enriched gene sets makes it possible to rapidly interpret biological events at the molecular level and to raise new hypotheses that could then be verified experimentally. As shown in Figure 11, the results highlight that processes corresponding mainly to cell cycle and metabolism were activated in MCF7 cells exposed to estrogen. The pattern of gene set enrichment over time seemed to indicate that the proportion of MCF7 cells in different phases of the cell cycle diverged between the early and late time points. Indeed, enrichment of genes representative of the G1 and S phases was more pronounced at 10 hours, whereas enrichment of genes involved in the G2 and M phases was predominant at 48 hours. Therefore, it was possible to follow the enrichment profile over time for genes implicated in processes coupled to cell growth and division: activation of the protein synthesis machinery, lipid and sugar metabolism to provide energy to the cell, nucleotide metabolism required for DNA replication, amino acid metabolism for protein synthesis, and a decrease in cell-cell and extracellular interactions as well as cytoskeleton function, which is characteristic of proliferating cells. Similar observations have been reported by other studies investigating gene expression profiles of MCF7 cells exposed to estradiol [29,30]. Only at the later time point (48 hours) were genes involved in oxidative phosphorylation and the TCA cycle highly enriched.
This observation might suggest either that cell mitosis is accompanied by mitochondrial biogenesis [31] or that estrogen regulates the transcription of those genes. Independent studies seem to support the latter hypothesis. Indeed, estradiol has been shown to enhance the transcript levels of mitochondrial genome-encoded genes in several cell types, including MCF7 [32,33]. In hepatocytes, this effect was accompanied by an increase in mitochondrial respiratory chain activity [33].
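The reading of a GSEA results matrix described above (significant gene sets per contrast, then patterns across contrasts) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the matrix layout, the gene set and contrast names, the FDR cutoff and every number below are invented, and none of them are taken from the Confero output files or from Figure 11.

```python
def enrichment_pattern(results, fdr_cutoff=0.05):
    """Call each (gene set, contrast) cell as 'up', 'down' or 'ns'.

    `results` maps gene set -> {contrast -> (NES, FDR q-value)}.
    A cell is significant when its FDR q-value is at or below the
    cutoff; the sign of the normalized enrichment score (NES) gives
    the direction. The cutoff is an assumed, adjustable value.
    """
    pattern = {}
    for gene_set, per_contrast in results.items():
        calls = {}
        for contrast, (nes, fdr) in per_contrast.items():
            if fdr > fdr_cutoff:
                calls[contrast] = "ns"
            else:
                calls[contrast] = "up" if nes > 0 else "down"
        pattern[gene_set] = calls
    return pattern

def specific_to(pattern, contrast):
    """Gene sets significant in `contrast` and in no other contrast."""
    return sorted(
        gene_set for gene_set, calls in pattern.items()
        if calls.get(contrast, "ns") != "ns"
        and all(call == "ns" for c, call in calls.items() if c != contrast)
    )

# Invented toy matrix: gene set -> {contrast -> (NES, FDR q-value)}
toy_results = {
    "SET_A": {"early": (2.3, 0.001), "late": (1.1, 0.40)},
    "SET_B": {"early": (0.9, 0.60), "late": (2.0, 0.01)},
    "SET_C": {"early": (-1.9, 0.02), "late": (-2.2, 0.004)},
}
pattern = enrichment_pattern(toy_results)
late_only = specific_to(pattern, "late")
```

The 'up' and 'down' calls correspond to the red and green cells of a heatmap-style summary, and `specific_to` picks out gene sets that are enriched at a single time point only.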

