Fast set-based association analysis using summary data from GWAS identifies novel gene loci for human complex traits

View Article: PubMed Central - PubMed

ABSTRACT

We propose a method (fastBAT) that performs a fast set-based association analysis for human complex traits using summary-level data from genome-wide association studies (GWAS) and linkage disequilibrium (LD) data from a reference sample with individual-level genotypes. We demonstrate using simulations and analyses of real datasets that fastBAT is more accurate and orders of magnitude faster than the prevailing methods. Using fastBAT, we analyze summary data from the latest meta-analyses of GWAS on 150,064–339,224 individuals for height, body mass index (BMI), and schizophrenia. We identify 6 novel gene loci for height, 2 for BMI, and 3 for schizophrenia at P_fastBAT < 5 × 10^−8. The gain of power is due to multiple small independent association signals at these loci (e.g. the THRB and FOXP1 loci for schizophrenia). The method is general and can be applied to GWAS data for all complex traits and diseases in humans and to such data in other species.



Figure 2: Power of the LD-pruned fastBAT as a function of the r² threshold used for LD pruning. Shown are the results from simulations based on WGS data from the UK10K project (n = 3,781) under two scenarios, i.e. causal variants clustered or randomly distributed (Methods). The LD-pruned fastBAT analysis is performed at a range of thresholds for LD pruning (shown on the x-axis) using common SNPs (MAF ≥ 0.01) on three different panels, i.e. all sequence variants, SNPs on HapMap2 and SNPs on HapMap3. The power is measured by the mean χ²₁ value of all genes (panel a) or of genes harboring at least one of the simulated causal variants (panel b), where χ²₁ (a chi-square statistic with 1 degree of freedom) is calculated from P_fastBAT. Each plotted value is an average from 500 simulations.
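The caption's power measure converts each gene's P_fastBAT into a χ²₁ value and averages over genes. A minimal sketch of that conversion follows, assuming the usual definition (the χ²₁ value whose upper-tail probability equals the p-value). The function names `p_to_chi2_1df` and `mean_chi2` are illustrative and do not come from the paper or its software.

```python
from statistics import NormalDist


def p_to_chi2_1df(p: float) -> float:
    """Return the chi-square (1 df) value whose upper-tail probability is p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # A 1-df chi-square variable is a squared standard normal, so
    # P(chi2 > x) = 2 * P(Z < -sqrt(x)). Working in the lower tail (p / 2)
    # avoids the precision loss of forming 1 - p / 2 for very small p.
    z = NormalDist().inv_cdf(p / 2.0)
    return z * z


def mean_chi2(p_values):
    """Average the 1-df chi-square equivalents of a list of p-values."""
    values = [p_to_chi2_1df(p) for p in p_values]
    return sum(values) / len(values)
```

For example, `p_to_chi2_1df(0.05)` is roughly 3.84, the familiar 5% critical value for one degree of freedom. A larger mean χ²₁ across genes corresponds to smaller p-values on average, which is why the figure can use it as a proxy for power.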

A previous study suggests that set-based association analysis approaches, such as the one implemented in PLINK, lose power if the set contains SNPs in extremely high LD [14]. We found in simulations (Methods) that a set-based approach gained power if there were SNPs in perfect LD with the causal variants, and lost power if there were SNPs in perfect LD with markers (Supplementary Fig. 3), where markers are defined as SNPs that are independent of the causal variants. These results suggest that power can be gained by pruning SNPs that are in extremely high LD (e.g. LD r² > 0.9), in particular if the causal variants tend to be enriched in genomic regions with lower LD [15]. We therefore developed an LD-pruned fastBAT method (Methods). We demonstrate using simulations (Methods) that the LD-pruned fastBAT method (e.g. using an LD r² threshold of 0.9 or 0.99) is slightly more powerful than the original fastBAT method in two different simulation scenarios, in which causal variants were either randomly distributed or clustered in small regions (Supplementary Table 1). We re-ran the LD-pruned fastBAT analysis with a range of r² thresholds and found that it achieved the largest power gain at an r² threshold of approximately 0.9 to 0.99, depending on the SNP panel (HapMap2, HapMap3 or whole-genome sequencing) and the genetic architecture of the trait (Fig. 2). In practice, we recommend an r² threshold of 0.9 regardless of SNP panel, and do not recommend a threshold of r² < 0.7. In addition, we did not observe any inflation in −log10(p-value) for the LD-pruned fastBAT method under the null hypothesis that there was no genetic effect (Supplementary Fig. 4).
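To make the pruning step concrete, here is a minimal sketch of LD pruning at an r² threshold. It is a generic greedy pruner and is not the procedure described in the paper's Methods, which is not reproduced here. It assumes genotypes are coded as 0/1/2 allele counts with one list per SNP, and takes r² to be the squared Pearson correlation between two SNPs in the reference sample. The function names and the choice to keep the earlier SNP of each high-LD pair are assumptions made for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Monomorphic SNP: correlation is undefined, treat it as no LD.
        return 0.0
    return sxy * sxy / (sxx * syy)


def ld_prune(genotypes, threshold=0.9):
    """Greedily keep SNPs whose r^2 with every kept SNP is <= threshold.

    genotypes: list of per-SNP genotype lists (same individuals, same order).
    Returns the indices of the SNPs that are kept, in input order.
    """
    kept = []
    for j, snp in enumerate(genotypes):
        if all(r_squared(snp, genotypes[k]) <= threshold for k in kept):
            kept.append(j)
    return kept


# Toy reference sample: SNPs 1 and 2 are in perfect LD with SNP 0,
# while SNP 3 has r^2 = 0.25 with SNP 0.
toy = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [2, 1, 0, 2, 1, 0],
    [0, 0, 1, 1, 2, 2],
]
kept_at_0_9 = ld_prune(toy, threshold=0.9)
```

With the recommended threshold of 0.9, only the redundant copies of SNP 0 are dropped in this toy example. A much lower threshold would also discard SNP 3, which carries partly independent information. That is consistent with the advice above against thresholds below 0.7.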

