Most parsimonious haplotype allele sharing determination.

Cai Z, Sabaa H, Wang Y, Goebel R, Wang Z, Xu J, Stothard P, Lin G - BMC Bioinformatics (2009)

Bottom Line: The association studies based on haplotype sharing status would have significantly reduced degrees of freedom and be able to capture the combined effects of tightly linked causal variants. For pedigree genotype datasets, the haplotype allele sharing status among the members can be deterministically, efficiently, and accurately determined, even for very small pedigrees. Given their excellent performance, the presented haplotype allele sharing status determination programs can be useful in many downstream applications, including haplotype-based association studies.


Affiliation: Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada. zhipeng@cs.ualberta.ca

ABSTRACT

Background: The "common disease--common variant" hypothesis and genome-wide association studies have achieved numerous successes in the last three years, particularly in the genetic mapping of human diseases. Nevertheless, the power of association study methods is still low, in particular for quantitative traits, and a description of the full allelic spectrum is deemed still far from reach. Given the increasing density of available single nucleotide polymorphisms (SNPs), and as suggested by the block-like structure of the human genome, a popular and promising strategy is to use haplotypes to capture the correlation structure of SNPs in regions of little recombination. The key to the success of this strategy is thus the ability to unambiguously determine the haplotype allele sharing status among the pedigree members. Association studies based on haplotype sharing status would have significantly reduced degrees of freedom and be able to capture the combined effects of tightly linked causal variants.

Results: For pedigree genotype datasets with a medium density of SNPs, we present two methods for determining the haplotype allele sharing status among the pedigree members. An extensive simulation study showed that both methods performed nearly perfectly on breakpoint discovery, mutation haplotype allele discovery, and shared chromosomal region discovery.

Conclusion: For pedigree genotype datasets, the haplotype allele sharing status among the members can be deterministically, efficiently, and accurately determined, even for very small pedigrees. Given their excellent performance, the presented haplotype allele sharing status determination programs can be useful in many downstream applications, including haplotype-based association studies.


Figure 3: Scatter plot of the ending SNP sites of shared regions: simulated vs. discovered by iLinker on 500 simulated 10 K genotype datasets.

Mentions: For each simulated dataset, we compared the simulated shared regions with the shared regions discovered by iLinker and by xPedPhase separately, and counted a simulated shared region as recovered if it overlapped a discovered one. If a simulated shared region was not recovered, the corresponding discovered region was set to [-1, -1]. The 500 simulated 10 K genotype datasets contained 725 simulated shared regions in total. Seven of them were not recovered by either xPedPhase or iLinker; 2 more were not recovered by xPedPhase and 5 more were not recovered by iLinker. We collected the starting and ending SNP sites of each simulated shared region (x-axis) and of the corresponding discovered shared region (y-axis), for xPedPhase and iLinker respectively, and plotted them in Figures 1, 2, 3 and 4. Essentially, these plots show how far the discovered shared regions deviate from the simulated ones. The correlation coefficients between the simulated and discovered starting and ending sites of recovered shared regions were 0.99981 and 0.99989, respectively, for xPedPhase, and 0.99980 and 0.99981 for iLinker. On the 400 50 K datasets that both iLinker and xPedPhase finished, xPedPhase recovered every shared region and iLinker missed only two. The correlation coefficients were 0.999993 and 0.999928 for xPedPhase, and 0.999988 and 0.999983 for iLinker, respectively (the scatter plots are not included since they are basically straight lines).
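The evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the inclusive-interval representation of a region, the choice of the first overlapping discovered region when several overlap, and the use of the Pearson correlation coefficient are all assumptions made for illustration, and the toy regions are invented rather than taken from the paper.

```python
from math import sqrt
from typing import List, Tuple

# A shared region as (start SNP site, end SNP site), both inclusive (assumed).
Region = Tuple[int, int]

# Placeholder used when a simulated region is not recovered, as in the text.
NOT_RECOVERED: Region = (-1, -1)


def overlaps(a: Region, b: Region) -> bool:
    """True if two inclusive intervals share at least one SNP site."""
    return a[0] <= b[1] and b[0] <= a[1]


def match_regions(simulated: List[Region], discovered: List[Region]) -> List[Region]:
    """Pair each simulated region with a discovered region that overlaps it.

    Assumption: if several discovered regions overlap, the first one is used.
    Unrecovered simulated regions are paired with (-1, -1).
    """
    return [
        next((d for d in discovered if overlaps(sim, d)), NOT_RECOVERED)
        for sim in simulated
    ]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def endpoint_correlations(
    simulated: List[Region], matched: List[Region]
) -> Tuple[float, float]:
    """Correlate start sites and end sites over recovered regions only."""
    pairs = [(s, d) for s, d in zip(simulated, matched) if d != NOT_RECOVERED]
    start_r = pearson([s[0] for s, _ in pairs], [d[0] for _, d in pairs])
    end_r = pearson([s[1] for s, _ in pairs], [d[1] for _, d in pairs])
    return start_r, end_r


# Toy data (hypothetical, not from the paper): the last region is missed.
simulated = [(10, 50), (200, 340), (600, 720), (900, 950)]
discovered = [(11, 50), (198, 341), (600, 718)]

matched = match_regions(simulated, discovered)
n_missed = matched.count(NOT_RECOVERED)
start_r, end_r = endpoint_correlations(simulated, matched)
print(matched, n_missed, start_r, end_r)
```

Excluding the [-1, -1] placeholders before computing the correlations mirrors the text's restriction to recovered shared regions; in a scatter plot such as Figure 3, the unrecovered regions would instead appear as points at -1 on the y-axis.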

