UPDG: utilities package for data analysis of pooled DNA GWAS.

Ho DW, Yap MK, Yip SP - BMC Genet. (2012)

Bottom Line: With the input of raw intensity data from GWAS, UPDG performs the following tasks in a stepwise manner: raw data manipulation, correction for allelic preferential amplification, normalization, nested analysis of variance for genetic association testing, and summarization of analysis results. It is our vision and mission to reduce the hindrance for performing data analysis of pooled DNA GWAS through our contribution of UPDG. More importantly, we hope to promote the popularity of pooled DNA GWAS, which is a very useful research strategy.


Affiliation: Centre for Myopia Research, School of Optometry, The Hong Kong Polytechnic University, Hong Kong SAR, China.

ABSTRACT

Background: Despite being a well-established strategy for cost reduction in disease gene mapping, the pooled DNA association study is much less popular than the individual DNA approach. This is especially true for the pooled DNA genomewide association study (GWAS), for whose data analysis very few computer resources have been developed. This motivates the development of UPDG (Utilities package for data analysis of Pooled DNA GWAS).

Results: UPDG represents a generalized framework for data analysis of pooled DNA GWAS with the integration of Unix/Linux shell operations, Perl programs and R scripts. With the input of raw intensity data from GWAS, UPDG performs the following tasks in a stepwise manner: raw data manipulation, correction for allelic preferential amplification, normalization, nested analysis of variance for genetic association testing, and summarization of analysis results. Detailed instructions, procedures and commands are provided in the comprehensive user manual describing the whole process from preliminary preparation of software installation to final outcome acquisition. An example dataset (input files and sample output files) is also included in the package so that users can easily familiarize themselves with the data file formats, working procedures and expected output. Therefore, UPDG is especially useful for users with some computer knowledge, but without a sophisticated programming background.

Conclusions: UPDG provides a free, simple and platform-independent one-stop service to scientists who work on pooled DNA GWAS data analysis but have less advanced programming knowledge. It is our vision and mission to lower the barrier to data analysis of pooled DNA GWAS through our contribution of UPDG. More importantly, we hope to promote the popularity of pooled DNA GWAS, which is a very useful research strategy.


Figure 1: Workflow of UPDG.

Mentions: We assume here that the data files are generated from genotyping experiments performed on the Illumina platform (e.g., Human610-Quad BeadChips). Raw fluorescence signal intensities for the two alleles of each marker are extracted for both individual and pooled DNA samples; for individual DNA samples, genotype calls are extracted as well. The extraction is carried out using Unix/Linux shell operations, as illustrated in the UPDG user manual. The raw intensity and genotype data for heterozygous individuals are then combined by a Perl program (merge.pl). Two further Perl programs (adjustment.pl and QC.pl) adjust for allelic preferential amplification by various methods, normalize the data, and filter out data with a minor allele frequency or completion rate below user-specified thresholds. The adjusted and filtered pooled allele frequency estimates can then be summarized by another Perl program (mean_Rx_statistics.pl) and tested by nested ANOVA using R scripts (nested_ANOVA_[H/M/N/U].r). Nested ANOVA assesses the difference in mean pooled allele frequency between the case group and the control group, and hence detects association between mean pooled allele frequency estimates and a dichotomous disease phenotype. Finally, the Perl program output_format.pl summarizes, in a user-friendly format, the unformatted nested ANOVA results obtained with the different adjustment methods for allelic preferential amplification, and extracts the significant markers that satisfy a user-defined threshold. The overall workflow of UPDG is illustrated in Figure 1. Besides running the individual UPDG components manually, one command at a time, users can execute a series of commands with the shell and batch scripts provided.
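To make the two statistical steps of this workflow more concrete, the following is a minimal Python sketch, not UPDG code. It assumes one common form of correction for allelic preferential amplification, in which a factor k is the mean ratio of the two raw allele intensities across heterozygous individuals and the pooled allele frequency is estimated as X / (X + kY); the methods actually implemented in adjustment.pl may differ. It also assumes a balanced design (equal numbers of pools per group and of replicate measurements per pool) for the nested ANOVA. All function names and the data layout are hypothetical.

```python
from statistics import mean


def correction_factor(het_x, het_y):
    """Hypothetical helper: k = mean X/Y raw intensity ratio over heterozygotes.

    A heterozygote carries one copy of each allele, so a ratio different
    from 1 reflects unequal amplification of the two alleles.
    """
    return mean(x / y for x, y in zip(het_x, het_y))


def pooled_allele_frequency(x, y, k):
    """Estimate the frequency of allele X in a pool from raw intensities,
    corrected by k (assumed form: X / (X + k * Y))."""
    return x / (x + k * y)


def nested_anova_f(groups):
    """F statistic for the group effect in a balanced nested ANOVA.

    groups maps a group name (e.g. "case", "control") to a list of pools;
    each pool is a list of replicate allele frequency estimates.
    The group mean square is tested against the mean square of pools
    nested within groups. Returns (F, df_group, df_pool).
    """
    all_values = [v for pools in groups.values() for pool in pools for v in pool]
    grand_mean = mean(all_values)

    ss_group = 0.0  # between-group sum of squares
    ss_pool = 0.0   # pools-within-group sum of squares
    n_pools = 0
    for pools in groups.values():
        group_values = [v for pool in pools for v in pool]
        group_mean = mean(group_values)
        ss_group += len(group_values) * (group_mean - grand_mean) ** 2
        for pool in pools:
            ss_pool += len(pool) * (mean(pool) - group_mean) ** 2
            n_pools += 1

    df_group = len(groups) - 1
    df_pool = n_pools - len(groups)
    f_stat = (ss_group / df_group) / (ss_pool / df_pool)
    return f_stat, df_group, df_pool
```

In this sketch a marker's p-value would be obtained by referring F to an F distribution with (df_group, df_pool) degrees of freedom; that step is left out because it needs a statistics library outside the Python standard library.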

