BigQ: a NoSQL based framework to handle genomic variants in i2b2.

Gabetta M, Limongelli I, Rizzo E, Riva A, Segagni D, Bellazzi R - BMC Bioinformatics (2015)

Bottom Line: A visual programming i2b2 plugin allows retrieving variants belonging to the patients in a cohort by applying filters on genomic variant annotations. In this paper we describe a new i2b2 web service composed of an efficient and scalable document-based database that manages annotations of genomic variants and of a visual programming plug-in designed to dynamically perform queries on clinical and genetic data. The system therefore allows managing the fast growing volume of genomic variants and can be used to integrate heterogeneous genomic annotations.

Affiliation: Dipartimento di Ingegneria Industriale e dell'Informazione and Center for Health Technologies, Università di Pavia, Pavia, Italy. matteo.gabetta@unipv.it.

ABSTRACT

Background: Precision medicine requires the tight integration of clinical and molecular data. To this end, it is mandatory to define proper technological solutions able to manage the overwhelming amount of high-throughput genomic data needed to test associations between genomic signatures and human phenotypes. The i2b2 Center (Informatics for Integrating Biology and the Bedside) has developed an internationally adopted framework for using existing clinical data in discovery research, which can help define precision medicine interventions when coupled with genetic data. i2b2 can be significantly advanced by designing efficient management solutions for Next Generation Sequencing data.

Results: We developed BigQ, an extension of the i2b2 framework, which integrates patient clinical phenotypes with genomic variant profiles generated by Next Generation Sequencing. A visual programming i2b2 plugin allows retrieving variants belonging to the patients in a cohort by applying filters on genomic variant annotations. We report an evaluation of the query performance of our system on more than 11 million variants, showing that the implemented solution scales linearly in terms of query time and disk space with the number of variants.

Conclusions: In this paper we describe a new i2b2 web service composed of an efficient and scalable document-based database that manages annotations of genomic variants and of a visual programming plug-in designed to dynamically perform queries on clinical and genetic data. The system therefore allows managing the fast growing volume of genomic variants and can be used to integrate heterogeneous genomic annotations.

Fig. 5: Importing time performances and disk space occupancy on a single machine. (a) Time performances for annotation, JSON conversion and importing of genomic variants belonging to 10, 20, 50, 100, 200 and 500 whole-exome samples into CouchDB, installed on a single Amazon AWS machine. (b) Disk space occupancy in relation to the whole-exome data growth.

Mentions: For the importing phase, BigQ-ETL was tested on 8 Amazon EC2 [41] virtual machines, specifically c3.2xlarge instances [42], a medium-to-high-level server type with 8 virtual CPUs and 15 GB of RAM. CouchDB was initially installed on a single c3.2xlarge instance. The total time to run ANNOVAR, generate the JSON documents and upload them for 500 exomes was approximately 1 hour and 20 minutes. The most computationally demanding operation in the data import process was indexing all the views in the database, which took an average of 1 min 10 s per exome. Figure 5 reports disk space occupancy and importing time performances for the annotation and indexing phases at the different dataset sizes.
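
The JSON conversion step of such an import can be sketched as follows. This is a minimal illustration, not the BigQ-ETL code: the column names (`Chr`, `Start`, `End`, `Ref`, `Alt`, `Func.refGene`, `Gene.refGene`) are assumed to follow an ANNOVAR-style tab-separated table, and the document layout, the `_id` scheme, the sample identifier `S001` and the helper names are all hypothetical. The batches are shaped as `{"docs": [...]}`, the request body accepted by CouchDB's `_bulk_docs` endpoint; the HTTP upload itself is left out.

```python
import csv
import io
import json

# Core positional columns; every other column is treated as an annotation.
# These header names are assumptions for illustration.
CORE_COLUMNS = ("Chr", "Start", "End", "Ref", "Alt")


def row_to_document(row, sample_id):
    """Turn one annotated variant row (a dict) into a JSON-ready document."""
    return {
        # Hypothetical ID scheme: sample, position and alleles.
        "_id": f"{sample_id}:{row['Chr']}:{row['Start']}:{row['Ref']}>{row['Alt']}",
        "sample_id": sample_id,
        "chrom": row["Chr"],
        "start": int(row["Start"]),
        "end": int(row["End"]),
        "ref": row["Ref"],
        "alt": row["Alt"],
        "annotations": {k: v for k, v in row.items() if k not in CORE_COLUMNS},
    }


def parse_table(text, sample_id):
    """Parse a tab-separated annotation table into a list of documents."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [row_to_document(row, sample_id) for row in reader]


def bulk_payloads(docs, batch_size=1000):
    """Yield JSON strings shaped for a bulk upload: {"docs": [...]}."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield json.dumps({"docs": batch})
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield json.dumps({"docs": batch})


# Two made-up rows, used only to exercise the functions above.
EXAMPLE_TABLE = (
    "Chr\tStart\tEnd\tRef\tAlt\tFunc.refGene\tGene.refGene\n"
    "1\t69511\t69511\tA\tG\texonic\tOR4F5\n"
    "1\t69897\t69897\tT\tC\texonic\tOR4F5\n"
)

EXAMPLE_DOCS = parse_table(EXAMPLE_TABLE, "S001")
```

Batching keeps each request body bounded as the number of exomes grows; the batch size of 1000 is an arbitrary default, not a value taken from the paper.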

