SNP set association analysis for genome-wide association studies.

Cai M, Dai H, Qiu Y, Zhao Y, Zhang R, Chu M, Dai J, Hu Z, Shen H, Chen F - PLoS ONE (2013)

Bottom Line: Genome-wide association studies (GWAS) are a promising approach for identifying common genetic variants associated with disease on the basis of millions of single nucleotide polymorphisms (SNPs). SNP sets are simulated under models with 0, 1, and ≥ 2 causal SNPs. We also apply these four methods to a real GWAS of non-small cell lung cancer (NSCLC) in a Han Chinese population.


Affiliation: Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China.

ABSTRACT
Genome-wide association studies (GWAS) are a promising approach for identifying common genetic variants associated with disease on the basis of millions of single nucleotide polymorphisms (SNPs). To avoid the low power caused by excessive correction for multiple comparisons in single-locus association analysis, several methods group SNPs into a SNP set based on genomic features and then test the joint effect of the set. We compare the performance of principal component analysis (PCA), supervised principal component analysis (SPCA), kernel principal component analysis (KPCA), and sliced inverse regression (SIR). SNP sets are simulated under models with 0, 1, and ≥ 2 causal SNPs. Our simulation results show that all of these methods control the type I error at the nominal significance level. SPCA is always more powerful than the other methods across the linkage disequilibrium structures and minor allele frequencies of the simulated datasets. We also apply these four methods to a real GWAS of non-small cell lung cancer (NSCLC) in a Han Chinese population.
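
In simulation studies of this kind, type I error and power are usually estimated as the fraction of simulated replicates whose p-value falls below the nominal level. The sketch below illustrates that bookkeeping only. The function name and the p-values are hypothetical and are not taken from the paper, and it assumes one p-value per replicate has already been computed by whichever SNP set test is under evaluation.

```python
def rejection_rate(p_values, alpha=0.05):
    """Fraction of simulation replicates whose p-value is below alpha."""
    if not p_values:
        raise ValueError("need at least one p-value")
    return sum(p < alpha for p in p_values) / len(p_values)


# Hypothetical p-values, for illustration only (not results from the paper).
null_p = [0.62, 0.03, 0.48, 0.91, 0.27, 0.74, 0.15, 0.55, 0.36, 0.88]
alt_p = [0.001, 0.04, 0.20, 0.008, 0.03, 0.012, 0.07, 0.002, 0.049, 0.30]

# With 0 causal SNPs the rejection rate estimates the type I error;
# with 1 or more causal SNPs it estimates the power.
type1_error = rejection_rate(null_p)
power = rejection_rate(alt_p)
```

With only ten replicates these estimates are very noisy; a real study would use many more replicates per scenario.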


Figure 4 (pone-0062495-g004): Power for each causal SNP in Scenario C2, based on the XRCC1 gene. The top plot shows the power (y-axis) of each method against the location (x-axis) of the causal SNP. The bar plot shows the MAFs of all SNPs. The LD structure of the 24 SNPs is shown by the heat plot at the bottom of the figure, in which the red scale indicates the value of R² (1 = red, 0 = white).

Mentions: Results of scenario C2 are shown in Figure 4, which also lets us evaluate how the power of each method varies with the MAF and LD of the causal SNP. The trends in power are similar to those for the CLPTM1L gene. All four methods have statistical power when the causal SNP is in strong LD with the other SNPs. Where the causal SNP has neither high MAF nor strong LD with the other SNPs, as at the 2nd, 6th, and 7th loci, all four methods have low power. Again, SPCA has the best power in most situations. We also find that SPCA performs much better than the other methods, especially when the causal SNP has high MAF. Simulations from scenarios C3–C4 give similar results (Table S6 in File S1) and also show that tests combining multiple SNPs tend to have higher power.
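
The two quantities that power is compared against here, MAF and pairwise LD, can be computed from a genotype matrix as sketched below. This is a minimal illustration, not the authors' code. It assumes genotypes are coded as allele counts 0/1/2, and it approximates LD by the squared Pearson correlation of genotype counts, a common stand-in for r² when phased haplotypes are unavailable. The function names and example genotypes are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF of one SNP from 0/1/2 allele counts (one entry per subject)."""
    p = sum(genotypes) / (2 * len(genotypes))  # frequency of the counted allele
    return min(p, 1 - p)


def genotype_r2(x, y):
    """Squared Pearson correlation between two SNPs' genotype counts."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)


# Hypothetical genotypes for 8 subjects (illustrative only).
snp_a = [0, 1, 2, 1, 0, 2, 1, 0]
snp_b = [2 - g for g in snp_a]       # same SNP with the other allele counted
snp_c = [0, 0, 0, 0, 0, 0, 0, 1]     # rare variant

maf_a = minor_allele_frequency(snp_a)
maf_c = minor_allele_frequency(snp_c)
r2_ab = genotype_r2(snp_a, snp_b)    # perfect LD regardless of allele coding
r2_ac = genotype_r2(snp_a, snp_c)
```

Applying `genotype_r2` to every pair of SNPs in a set yields the matrix that a heat plot like the one in Figure 4 displays.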

