A comparison of gene region simulation methods.

Hendricks AE, Dupuis J, Gupta M, Logue MW, Lunetta KL - PLoS ONE (2012)

Bottom Line: When possible, we also evaluate the effects of changing parameters. We recommend using Hapgen to simulate replicate haplotypes from a gene region. Hapgen produces moderate sampling variation between the replicates while retaining the overall unique LD structure of the gene region.


Affiliation: Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, United States of America. baera@bu.edu

ABSTRACT

Background: Accurately modeling linkage disequilibrium (LD) in simulations is essential for correctly evaluating new and existing association methods. At present, there has been little research comparing how well existing gene region simulation methods reproduce the LD structure of an existing gene region. Here we compare the ability of three approaches to accurately simulate the LD within a gene region: HapSim (2005), Hapgen (2009), and a minor extension to simple haplotype resampling.
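
Simple haplotype resampling draws whole haplotypes, with replacement, from an observed haplotype set such as HapMap. The following is a minimal sketch of that plain version only, assuming phased haplotypes coded as lists of 0/1 alleles; the function name `resample_haplotypes` and the toy data are hypothetical, and the authors' "minor extension" is not described here and is not reproduced.

```python
import random

def resample_haplotypes(haplotypes, n, seed=None):
    """Draw n haplotypes with replacement from an observed haplotype set.

    Plain bootstrap resampling only; this does NOT implement the authors'
    extension, whose details are not given in this abstract.
    """
    rng = random.Random(seed)  # local generator so replicates are reproducible
    # Each draw copies one whole observed haplotype, so allele combinations
    # along a haplotype are never broken up or newly created.
    return [list(rng.choice(haplotypes)) for _ in range(n)]

# Toy phased haplotypes (0/1 alleles at three SNPs); illustrative only
observed = [
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
]

# Three replicate samples of six haplotypes each, one seed per replicate
replicates = [resample_haplotypes(observed, n=6, seed=s) for s in range(3)]
```

Because every simulated haplotype is a copy of an observed one, plain resampling of this kind can only reweight existing haplotypes; it never creates new allele combinations.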

Methodology/principal findings: To assess the variation and bias of each method, we compare the simulated pairwise LD measures and minor allele frequencies (MAFs) with the original HapMap data in an extensive simulation study. When possible, we also evaluate the effects of changing parameters. HapSim produces samples of haplotypes with lower LD, on average, than the original haplotype set, whereas neither our resampling method nor Hapgen introduces this bias. The variation introduced across the replicates by our resampling method is quite small and may not provide enough sampling variability for a generalizable simulation study.
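
The comparison rests on pairwise LD (D′ and r²) and minor allele frequencies computed from phased haplotypes. The sketch below uses the standard definitions D = p_AB − p_A·p_B, r² = D² / (p_A(1 − p_A)p_B(1 − p_B)) and D′ = |D| / D_max; it assumes 0/1 allele coding, and the choice to report 0 for a pair involving a monomorphic SNP is an assumption of this sketch, not something stated in the study. All function names are hypothetical.

```python
from itertools import combinations

def allele_freq(haplotypes, j):
    """Frequency of the allele coded 1 at SNP j."""
    return sum(h[j] for h in haplotypes) / len(haplotypes)

def minor_allele_freq(haplotypes, j):
    """Minor allele frequency (MAF) at SNP j."""
    p = allele_freq(haplotypes, j)
    return min(p, 1 - p)

def pairwise_ld(haplotypes, i, j):
    """Return (D', r^2) for SNPs i and j from phased 0/1 haplotypes."""
    n = len(haplotypes)
    p_a = allele_freq(haplotypes, i)
    p_b = allele_freq(haplotypes, j)
    p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # A monomorphic SNP makes LD undefined; this sketch reports 0
        return 0.0, 0.0
    # D_max depends on the sign of D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max, d * d / denom

def ld_matrix(haplotypes):
    """Map each SNP pair (i, j) with i < j to its (D', r^2)."""
    m = len(haplotypes[0])
    return {(i, j): pairwise_ld(haplotypes, i, j)
            for i, j in combinations(range(m), 2)}

# Toy haplotypes, illustrative only
haps = [[0, 0, 1], [1, 1, 0], [0, 0, 1], [1, 1, 1]]
ld = ld_matrix(haps)
maf = [minor_allele_freq(haps, j) for j in range(3)]
```

Applying the same functions to each simulated replicate and to the original haplotypes gives the paired values that the bias comparison needs.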

Conclusion: We recommend using Hapgen to simulate replicate haplotypes from a gene region. Hapgen produces moderate sampling variation between the replicates while retaining the overall unique LD structure of the gene region.


Heat maps of change in median LD for Gene Region 1. Heat maps of the change in median simulated LD from the original LD in Gene Region 1 (median[LD_simulated] − LD_HapMap). Upper left: D′; lower right: r². Blue indicates a gain in LD; red indicates a loss in LD.

Mentions: More striking still, resampling and Hapgen produced little to no bias, whereas HapSim appeared to produce a loss in LD across both gene regions, as shown in Figures 2, 3, 4, and 5. For Gene Region 1, HapSim had bias values below 0 (median D′ = −0.106 and median r² = −0.006; mean D′ = −0.161 and mean r² = −0.046), indicating an average loss in LD (Table 4). HapSim's loss in LD was less extreme for Gene Region 2 (median D′ < 0.001 and median r² < 0.001; mean D′ = −0.050 and mean r² = −0.005), where the starting LD was lower and there was thus less to lose (Table 5). Both Hapgen and resampling had bias values close to or at 0, indicating little to no bias. HapSim's loss of LD was seen across all MAFs but was limited mostly to pairs with moderate to high LD (D′ > 0.2 or r² > 0.2) (Figures S6, S7, S8, and S9). This is even more apparent in Gene Region 2, where the loss in LD produced by HapSim was limited exclusively to moderate to high LD pairs. Nonetheless, within the moderate to high LD groups in Gene Region 2, HapSim's bias towards a loss in LD (especially in r²) was quite extreme.
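
The bias summaries above can be sketched as follows, assuming (as an illustration, not as the study's exact procedure) that the bias for a SNP pair is the figure caption's median[LD_simulated] − LD_HapMap, and that the median and mean are then taken across pairs, optionally restricted to a subset such as the moderate to high LD pairs. The function names and all numbers are hypothetical toy values, not results from the study; for brevity the subset here uses only D′ > 0.2, while the text uses D′ > 0.2 or r² > 0.2.

```python
from statistics import mean, median

def pair_bias(original, replicates):
    """Per-pair change in median LD: median over replicates minus original LD.

    original:   dict mapping SNP pair -> LD value in the source haplotypes
    replicates: list of dicts with the same keys, one per simulated replicate
    """
    return {pair: median([rep[pair] for rep in replicates]) - ld0
            for pair, ld0 in original.items()}

def summarize_bias(bias, pairs=None):
    """Median and mean of per-pair bias, over all pairs or a given subset."""
    keys = list(bias) if pairs is None else list(pairs)
    values = [bias[k] for k in keys]
    return median(values), mean(values)

# Toy D' values for three SNP pairs (made-up numbers, illustrative only)
original_dprime = {(0, 1): 0.9, (0, 2): 0.1, (1, 2): 0.5}
reps = [
    {(0, 1): 0.7, (0, 2): 0.1, (1, 2): 0.5},
    {(0, 1): 0.8, (0, 2): 0.2, (1, 2): 0.4},
    {(0, 1): 0.6, (0, 2): 0.1, (1, 2): 0.5},
]

bias_dprime = pair_bias(original_dprime, reps)
overall = summarize_bias(bias_dprime)

# Restrict to pairs whose original LD is moderate to high
strong_pairs = [p for p, v in original_dprime.items() if v > 0.2]
strong = summarize_bias(bias_dprime, strong_pairs)
```

In this toy example the overall median bias is 0 while the subset of higher-LD pairs shows a negative bias, which mirrors how a loss concentrated in moderate to high LD pairs can be diluted in an all-pairs summary.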
