SNPpy--database management for SNP data from genome wide association studies.

Mitha F, Herodotou H, Borisov N, Jiang C, Yoder J, Owzar K - PLoS ONE (2011)

Bottom Line: To this end, SNPpy enables the user to filter the data and output the results as standardized GWAS file formats. It does low level and flexible data validation, including validation of patient data. SNPpy is a practical and extensible solution for investigators who seek to deploy central management of their GWAS data.


Affiliation: Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, United States of America. faheem@faheem.info

ABSTRACT

Background: We describe SNPpy, a hybrid script database system using the Python SQLAlchemy library coupled with the PostgreSQL database to manage genotype data from Genome-Wide Association Studies (GWAS). This system makes it possible to merge study data with HapMap data and merge across studies for meta-analyses, including data filtering based on the values of phenotype and Single-Nucleotide Polymorphism (SNP) data. SNPpy and its dependencies are open source software.

Results: The current version of SNPpy offers utility functions to import genotype and annotation data from two commercial platforms. We use these to import data from two GWAS studies and the HapMap Project. We then export these individual datasets to standard data format files that can be imported into statistical software for downstream analyses.

Conclusions: By leveraging the power of relational databases, SNPpy offers integrated management and manipulation of genotype and phenotype data from GWAS. The analysis of these studies requires merging across GWAS datasets as well as patient and marker selection. To this end, SNPpy enables the user to filter the data and output the results in standardized GWAS file formats. It performs low-level, flexible data validation, including validation of patient data. SNPpy is a practical and extensible solution for investigators who seek to deploy central management of their GWAS data.

Figure 2: Database Schema. Geno Single database schema for the Affymetrix platform. In this diagram, the rectangles correspond to database tables, and the rows in each rectangle correspond to database table columns. The four columns in a row correspond to, from left to right, database name (column 1), data type (column 2), primary key indicator (column 3), and foreign key indicator (column 4). The arrows correspond to foreign keys. Observe the number of arrows leaving a table is equal to the number of columns that are foreign keys in that table.
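The relationship between foreign-key columns and arrows described in the caption can be illustrated with a small sketch. This is not the Geno Single schema itself: the table and column names below (snp, patient, genotype, allele_call) are hypothetical, and Python's built-in sqlite3 module stands in for PostgreSQL and SQLAlchemy so that the example runs without a database server.

```python
import sqlite3

# Hypothetical three-table schema; the real Geno Single schema is the one in Figure 2.
SCHEMA = """
CREATE TABLE snp (
    id INTEGER PRIMARY KEY,
    rsid TEXT NOT NULL UNIQUE,
    chromosome TEXT NOT NULL,
    position INTEGER NOT NULL
);
CREATE TABLE patient (
    id INTEGER PRIMARY KEY,
    sex INTEGER NOT NULL
);
CREATE TABLE genotype (
    patient_id INTEGER NOT NULL REFERENCES patient(id),
    snp_id INTEGER NOT NULL REFERENCES snp(id),
    allele_call TEXT NOT NULL,
    PRIMARY KEY (patient_id, snp_id)
);
"""


def foreign_key_counts(conn):
    """Return {table name: number of foreign-key columns}, i.e. arrows leaving each table."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {
        table: len(conn.execute(f"PRAGMA foreign_key_list({table})").fetchall())
        for table in tables
    }


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript(SCHEMA)

fk_counts = foreign_key_counts(conn)
print(fk_counts)

# A genotype row that points at a non-existent patient and SNP is rejected.
try:
    conn.execute("INSERT INTO genotype VALUES (99, 99, 'AG')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

In this sketch only the genotype table has outgoing arrows, one per foreign-key column. The rejected insert shows the kind of low-level validation a relational schema provides without extra code.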

Mentions: The heart of SNPpy is the database schema illustrated in Figure 2. In addition to the schema, we have developed two classes of Python scripts: (i) input scripts for parsing and loading the database tables, and (ii) output scripts for processing and exporting the data into different downstream formats, using SQL queries. The input scripts are written using object-oriented Python, with classes corresponding to the different platforms. Currently, the system can produce PED/MAP and TPED/TFAM data formats for individual datasets as well as the merger of multiple datasets. The latter is useful, for example, for doing quality control with HapMap data. A diagrammatic representation of the overall workflow is shown in Figure 1.
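As a rough illustration of what an output script of this kind does, the sketch below queries a small relational store and writes PED and MAP lines. It is not SNPpy's code: the table layout, column names, and sample rows are invented for the example, and sqlite3 is used in place of PostgreSQL and SQLAlchemy so that it runs on its own. The PED line layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype, then two alleles per marker) and the MAP line layout (chromosome, marker ID, genetic distance, base-pair position) follow the PLINK conventions, with 0 marking missing values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and toy data, for illustration only.
conn.executescript("""
CREATE TABLE snp (id INTEGER PRIMARY KEY, rsid TEXT, chromosome TEXT, position INTEGER);
CREATE TABLE patient (id INTEGER PRIMARY KEY, family_id TEXT, individual_id TEXT,
                      sex INTEGER, phenotype INTEGER);
CREATE TABLE genotype (patient_id INTEGER REFERENCES patient(id),
                       snp_id INTEGER REFERENCES snp(id),
                       allele1 TEXT, allele2 TEXT,
                       PRIMARY KEY (patient_id, snp_id));
INSERT INTO snp VALUES (1, 'rs0001', '1', 1000), (2, 'rs0002', '1', 2000);
INSERT INTO patient VALUES (1, 'F1', 'P1', 1, 2), (2, 'F2', 'P2', 2, 1);
INSERT INTO genotype VALUES (1, 1, 'A', 'G'), (1, 2, 'C', 'C'), (2, 1, 'A', 'A');
""")


def ordered_markers(conn):
    """Markers in one fixed order, shared by the MAP file and the PED allele columns."""
    return conn.execute(
        "SELECT id, chromosome, rsid, position FROM snp ORDER BY chromosome, position"
    ).fetchall()


def map_lines(markers):
    """One MAP line per marker; genetic distance is written as 0 (unknown)."""
    return [f"{chrom}\t{rsid}\t0\t{pos}" for _id, chrom, rsid, pos in markers]


def ped_lines(conn, markers):
    """One PED line per patient; parents are unknown (0) and missing calls are '0 0'."""
    lines = []
    patients = conn.execute(
        "SELECT id, family_id, individual_id, sex, phenotype FROM patient ORDER BY id")
    for pid, family, individual, sex, phenotype in patients.fetchall():
        calls = {
            snp_id: (a1, a2)
            for snp_id, a1, a2 in conn.execute(
                "SELECT snp_id, allele1, allele2 FROM genotype WHERE patient_id = ?",
                (pid,))
        }
        fields = [family, individual, "0", "0", str(sex), str(phenotype)]
        for snp_id, *_rest in markers:
            fields.extend(calls.get(snp_id, ("0", "0")))
        lines.append(" ".join(fields))
    return lines


markers = ordered_markers(conn)
map_out = map_lines(markers)
ped_out = ped_lines(conn, markers)
print("\n".join(map_out))
print("\n".join(ped_out))
```

Using a single marker ordering for both files keeps the allele columns of each PED line aligned with the rows of the MAP file. A merged export across datasets would need the same guarantee over the combined set of markers.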

