glbase: a framework for combining, analyzing and displaying heterogeneous genomic and high-throughput sequencing data.

Hutchins AP, Jauch R, Dyla M, Miranda-Saavedra D - Cell Regen (Lond) (2014)

Bottom Line: Genomic datasets and the tools to analyze them have proliferated at an astonishing rate. Here we present glbase, a framework that uses a flexible set of descriptors that can quickly parse non-binary data files. glbase includes many functions to intersect two lists of data, including operations on genomic interval data and support for the efficient random access to huge genomic data files. Many glbase functions can produce graphical outputs, including scatter plots, heatmaps, boxplots and other common analytical displays of high-throughput data such as RNA-seq, ChIP-seq and microarray expression data. glbase is designed to rapidly bring biological data into a Python-based analytical environment to facilitate analysis and data processing.

Affiliation: Key Laboratory of Regenerative Biology, South China Institute for Stem Cell Biology and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, 510530 China.

ABSTRACT
Genomic datasets and the tools to analyze them have proliferated at an astonishing rate. However, such tools are often poorly integrated with each other: each program typically produces its own custom output in a variety of non-standard file formats. Here we present glbase, a framework that uses a flexible set of descriptors that can quickly parse non-binary data files. glbase includes many functions to intersect two lists of data, including operations on genomic interval data and support for the efficient random access to huge genomic data files. Many glbase functions can produce graphical outputs, including scatter plots, heatmaps, boxplots and other common analytical displays of high-throughput data such as RNA-seq, ChIP-seq and microarray expression data. glbase is designed to rapidly bring biological data into a Python-based analytical environment to facilitate analysis and data processing. In summary, glbase is a flexible and multifunctional toolkit for combining and analyzing high-throughput data (especially next-generation sequencing and genome-wide data), and it has been instrumental in the analysis of complex datasets. glbase is freely available at http://bitbucket.org/oaxiom/glbase/.

Figure 1: A schematic overview of the functions included in glbase. glbase accepts files in a variety of formats and brings them into a Python environment as ‘genelist’ objects, which behave like a Python list of key:value pairs. Data can be manipulated within glbase using a variety of built-in functions and subsequently output in specific formats, or graphically for the visual interpretation of (combined) datasets.

glbase is a project designed to complement the above tools for the analysis of genomic data. Taking advantage of the Python programming language, glbase aims to translate biological questions directly into Python code. To this end, glbase addresses several problems. First, it acts as an intermediary between tools. Second, it provides a relatively compact programming syntax. Third, it incorporates many common analytical methods for integrating data. Finally, it provides tools for the graphical output of data analyses. glbase deals with incompatible file formats between tools not by proposing a top-down standardization of file formats, but by providing a simple means to describe diverse file formats and load them into a Python programming environment. Additionally, glbase facilitates downstream processing because it includes a suite of common analysis tools, such as heatmaps and sequence read pileups. glbase is also designed to interact with other Python tools, such as SciPy for statistics and matplotlib for graphical output, and data can be exported into other file formats for analysis in further tools or for import into R. In this way glbase acts as the ‘glue’ between upstream analysis (e.g. the genomic alignment of sequencing reads and ChIP-seq peak discovery) and downstream analysis (e.g. ChIP-seq peak annotation, combining ChIP-seq and RNA-seq data, and producing publication-quality figures). glbase is implemented as a Python module intended to be used non-interactively: users write short scripts to achieve specific aims, and these scripts leave a permanent record of each processing step, documenting the analysis and making it repeatable.
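To make the idea of describing a file format (rather than standardizing it) more concrete, here is a minimal plain-Python sketch. It assumes a tab-delimited input with one header line; the descriptor dictionary `BED_LIKE_FORMAT`, the `CONVERTERS` table and the function `load_with_descriptor` are hypothetical illustrations of the concept, not glbase's actual descriptor syntax or API. Each parsed row becomes a dict of key:value pairs, mirroring how a ‘genelist’ is described as behaving.

```python
import csv
import io

# Hypothetical format descriptor: maps each output key to a 0-based column index.
# It illustrates the concept only and is NOT glbase's own descriptor syntax.
BED_LIKE_FORMAT = {"chrom": 0, "left": 1, "right": 2, "name": 3}

# Optional per-key converters (e.g. coordinates to int); unlisted keys stay strings.
CONVERTERS = {"left": int, "right": int}


def load_with_descriptor(handle, fmt, converters=None, delimiter="\t", skip_lines=0):
    """Parse a delimited text stream into a list of dicts using a column descriptor."""
    converters = converters or {}
    rows = []
    reader = csv.reader(handle, delimiter=delimiter)
    for line_no, fields in enumerate(reader):
        if line_no < skip_lines or not fields:
            continue  # skip header lines and blank lines
        entry = {}
        for key, col in fmt.items():
            value = fields[col]
            entry[key] = converters[key](value) if key in converters else value
        rows.append(entry)
    return rows


if __name__ == "__main__":
    text = "chrom\tstart\tend\tname\nchr1\t100\t200\tpeak1\nchr2\t50\t80\tpeak2\n"
    peaks = load_with_descriptor(io.StringIO(text), BED_LIKE_FORMAT, CONVERTERS, skip_lines=1)
    print(peaks[0])  # {'chrom': 'chr1', 'left': 100, 'right': 200, 'name': 'peak1'}
```

Keeping the column mapping and the type conversions in separate tables means a new tool's output can be loaded by writing a new descriptor, without changing the parsing code.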
Furthermore, glbase incorporates methods to overlap and annotate genomic intervals (similar to BEDTools [2]), to map common values across two lists (similar to, but more powerful than, the UNIX command ‘join’), to map genomic coordinates to gene annotations, and to extract sequence data from FASTA files. glbase also includes a selection of analysis tools that produce graphical summaries of data, including heatmaps, scatter plots, pie charts and histograms of genomic and expression data. Finally, glbase features a flexible and efficient SQL implementation for storing genomic-scale data, such as high-throughput sequence reads or phastCons evolutionary scores [12], which allows efficient random-access retrieval of numerical scores or sequence reads from among millions of sequencing tags. Figure 1 gives a schematic overview of the functions available in glbase. glbase is especially suited to the analysis of next-generation sequencing and genome-wide data, particularly ChIP-seq, RNA-seq and microarray expression data.
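The two list operations described here, overlapping genomic intervals and mapping common values across two lists, can be sketched in plain Python on lists of dicts. This is a minimal illustration under stated assumptions, not glbase's implementation: the function names `overlaps`, `overlap_lists` and `map_on_key` are hypothetical, and coordinates are assumed to be half-open `[left, right)` as in BED files.

```python
from collections import defaultdict


def overlaps(a, b):
    """True if two intervals on the same chromosome overlap.

    Coordinates are assumed to be half-open [left, right), as in BED files.
    """
    return a["chrom"] == b["chrom"] and a["left"] < b["right"] and b["left"] < a["right"]


def overlap_lists(list_a, list_b):
    """Return the entries of list_a that overlap at least one entry of list_b."""
    # Group list_b by chromosome so each query only scans same-chromosome intervals.
    by_chrom = defaultdict(list)
    for b in list_b:
        by_chrom[b["chrom"]].append(b)
    return [a for a in list_a if any(overlaps(a, b) for b in by_chrom.get(a["chrom"], []))]


def map_on_key(list_a, list_b, key):
    """Join two lists of dicts on a shared key, similar in spirit to UNIX `join`.

    Unlike `join`, the inputs need not be sorted. Each entry of list_a is merged
    with every entry of list_b that has the same value for `key`; if both entries
    define another key, the value from list_b is kept (an arbitrary choice here).
    """
    index = defaultdict(list)
    for b in list_b:
        index[b[key]].append(b)
    return [{**a, **b} for a in list_a for b in index.get(a[key], [])]


if __name__ == "__main__":
    peaks = [
        {"chrom": "chr1", "left": 100, "right": 200, "name": "peak1"},
        {"chrom": "chr2", "left": 50, "right": 80, "name": "peak2"},
    ]
    genes = [{"chrom": "chr1", "left": 150, "right": 400, "gene": "GeneA"}]
    print([p["name"] for p in overlap_lists(peaks, genes)])  # ['peak1']

    expression = [{"gene": "GeneA", "tpm": 12.5}]
    annotation = [{"gene": "GeneA", "chrom": "chr1"}, {"gene": "GeneB", "chrom": "chr3"}]
    print(map_on_key(expression, annotation, "gene"))
    # [{'gene': 'GeneA', 'tpm': 12.5, 'chrom': 'chr1'}]
```

Grouping by chromosome is the simplest speed-up; tools built for genome-scale inputs typically go further, for example by sorting intervals or binning them by position, so that each query touches only nearby intervals.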