openBIS: a flexible framework for managing and analyzing complex data in biology research.

Bauch A, Adamczyk I, Buczek P, Elmer FJ, Enimanev K, Glyzewski P, Kohler M, Pylak T, Quandt A, Ramakrishnan C, Beisel C, Malmström L, Aebersold R, Rinn B - BMC Bioinformatics (2011)

Bottom Line: Ease of integration with data analysis pipelines and other computational tools is a key requirement for such an information system. This framework can be extended and has been customized for different data types acquired by a range of technologies. openBIS is currently being used by several SystemsX.ch and EU projects applying mass spectrometric measurements of metabolites and proteins, High Content Screening, or Next Generation Sequencing technologies. The attributes that make it interesting to a large research community involved in systems biology projects include versatility, simplicity in deployment, scalability to very large data, flexibility to handle any biological data type and extensibility to the needs of any research domain.


Affiliation: Department of Biosystems Science and Engineering, Center for Information Sciences and Databases, Swiss Federal Institute of Technology (ETH) Zurich, Switzerland.

ABSTRACT

Background: Modern data generation techniques used in distributed systems biology research projects often create datasets of enormous size and diversity. We argue that in order to overcome the challenge of managing those large quantitative datasets and maximise the biological information extracted from them, a sound information system is required. Ease of integration with data analysis pipelines and other computational tools is a key requirement for it.

Results: We have developed openBIS, an open source software framework for constructing user-friendly, scalable and powerful information systems for data and metadata acquired in biological experiments. openBIS enables users to collect, integrate, share and publish data, and to connect to data processing pipelines. This framework can be extended and has been customized for different data types acquired by a range of technologies.

Conclusions: openBIS is currently being used by several SystemsX.ch and EU projects applying mass spectrometric measurements of metabolites and proteins, High Content Screening, or Next Generation Sequencing technologies. The attributes that make it interesting to a large research community involved in systems biology projects include versatility, simplicity in deployment, scalability to very large data, flexibility to handle any biological data type and extensibility to the needs of any research domain.

Figure 5: Data Browsing and Visualization. (A) Data set browsing. Different datasets generated from the original sequencing data can be browsed. Here five different dataset types are attached to one sequencing sample. (B) Nucleotide distribution plot. Barplot showing the distribution of nucleotides per sequencing cycle. (C) Quality plot. Histogram showing the distribution of Phred quality scores per sequencing cycle. (D) Data. Aligned data stored in SAM/BAM file format. (E) For visualization of the ChIP-seq data, the Wig file can be exported to the UCSC Genome Browser [8]. The screenshot shows a typical binding profile of the PcG protein Polyhomeotic (Ph) in the homeotic cluster BX-C.

Mentions: The sequencing data of one flow lane sample can consist of several datasets (Figure 5). The dataset types are not fixed and can be defined by the admin of the openBIS instance. Figure 5A shows five different datasets: SRF, FASTQ_GZ, QUALITY_CHECK, ALIGNMENT and WIGGLE. Each of these datasets holds files and folders. By clicking on a dataset one can download the data or, in the case of PDFs or images such as those in Figures 5B and 5C, view the files within a frame, depending on the browser used. Both of these images describe the sequencing data produced and can give first hints of problems with the run. Figure 5E is not viewable in openBIS directly. The created Wig file needs to be uploaded to the UCSC Genome Browser [36] and viewed there. Although it would be extremely efficient to use a bigWig file stored in openBIS and simply reference it from the UCSC Genome Browser, secure access via HTTPS with username and password is currently not supported by the UCSC Genome Browser.
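
To make the two options discussed above concrete, the following is a minimal sketch, not the openBIS export code. It builds the text of a small wiggle custom track of the kind that would be uploaded to the UCSC Genome Browser, and the one-line bigWig track definition that would instead reference a remote file. The track and file names, the coordinates, the values and the URL are all hypothetical; only the `track type=wiggle_0`, `fixedStep` and `bigDataUrl` keywords come from the UCSC track formats.

```python
from typing import Iterable


def build_fixed_step_wig(name: str, description: str, chrom: str,
                         start: int, step: int,
                         values: Iterable[float]) -> str:
    """Return the text of a minimal fixedStep wiggle custom track.

    `start` is the 1-based position of the first value, as the
    wiggle format expects.
    """
    lines = [
        # Track line: tells the browser how to label and draw the data.
        f'track type=wiggle_0 name="{name}" description="{description}"',
        # Declaration line: one value every `step` bases from `start`.
        f"fixedStep chrom={chrom} start={start} step={step}",
    ]
    lines.extend(f"{value:g}" for value in values)
    return "\n".join(lines) + "\n"


def bigwig_track_line(name: str, url: str) -> str:
    """Return a custom track line that references a remote bigWig file.

    The URL has to be reachable by the genome browser itself, which is
    why a file behind a username and password cannot be used this way.
    """
    return f'track type=bigWig name="{name}" bigDataUrl={url}'


if __name__ == "__main__":
    # Hypothetical signal values and coordinates, for illustration only.
    wig_text = build_fixed_step_wig(
        name="Ph_ChIP", description="example", chrom="chr3R",
        start=12480001, step=10, values=[1.5, 2.0, 0.25])
    print(wig_text, end="")
    # Hypothetical, publicly reachable URL.
    print(bigwig_track_line("Ph_ChIP", "https://example.org/ph_chip.bw"))
```

The wiggle text is uploaded in full, whereas the bigWig line only points at the file, so the browser fetches just the regions it displays. That is the efficiency gain the text refers to.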

