SNP set association analysis for genome-wide association studies.

Cai M, Dai H, Qiu Y, Zhao Y, Zhang R, Chu M, Dai J, Hu Z, Shen H, Chen F - PLoS ONE (2013)

Bottom Line: Genome-wide association studies (GWAS) are a promising approach for identifying common genetic variants associated with disease, based on millions of single nucleotide polymorphisms (SNPs). Simulated SNP sets are generated under models with 0, 1, and ≥2 causal SNPs. We also apply these four methods to a real GWAS of non-small cell lung cancer (NSCLC) in a Han Chinese population.

Affiliation: Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China.

ABSTRACT
Genome-wide association studies (GWAS) are a promising approach for identifying common genetic variants associated with disease, based on millions of single nucleotide polymorphisms (SNPs). Single-locus association analysis loses power because of the heavy correction required for so many multiple comparisons. To avoid this, several methods have been proposed that group SNPs into a SNP set according to genomic features and then test the joint effect of the set. We compare the performance of principal component analysis (PCA), supervised principal component analysis (SPCA), kernel principal component analysis (KPCA), and sliced inverse regression (SIR). Simulated SNP sets are generated under models with 0, 1, and ≥2 causal SNPs. Our simulation results show that all of these methods control the type I error at the nominal significance level. SPCA is consistently more powerful than the other methods across the simulated settings of linkage disequilibrium (LD) structure and minor allele frequency (MAF). We also apply these four methods to a real GWAS of non-small cell lung cancer (NSCLC) in a Han Chinese population.
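The joint-testing idea above (reduce a SNP set to a few derived variables, then test those variables together) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: genotypes are assumed to be coded 0/1/2 allele counts, the phenotype is a 0/1 case-control indicator, PC scores enter a logistic model that is compared with an intercept-only model by a likelihood-ratio test, and the SPCA variant uses a hypothetical screening rule (absolute correlation with the phenotype above a fixed threshold). The function names, `k`, `threshold`, and `n_perm` are illustrative choices; KPCA and SIR are not shown.

```python
import math

import numpy as np


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: start from df = 1 and add the terms of the df -> df + 2 recurrence.
    total = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for k in range(3, df + 1, 2):
        total += term
        term *= x / k
    return total


def logistic_loglik(X, y, iters=100, tol=1e-10):
    """Maximised log-likelihood of a logistic regression of y on X (X includes an intercept)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        step = np.linalg.solve(hess, grad)            # Newton-Raphson update
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    # log L = sum(y * eta - log(1 + exp(eta))), computed stably with logaddexp
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def pca_set_test(G, y, k=2):
    """PCA-based SNP-set test: LRT of the top-k PC scores of G (k df)."""
    Gc = G - G.mean(axis=0)                              # centre each SNP column
    U, s, _ = np.linalg.svd(Gc, full_matrices=False)
    k = min(k, int(np.sum(s > 1e-8 * max(s.max(), 1.0))))  # drop null components
    scores = U[:, :k] * s[:k]                            # n x k principal-component scores
    ones = np.ones((G.shape[0], 1))
    ll0 = logistic_loglik(ones, y)
    ll1 = logistic_loglik(np.hstack([ones, scores]), y)
    stat = max(2.0 * (ll1 - ll0), 0.0)
    return stat, chi2_sf(stat, k)


def spca_set_test(G, y, k=1, threshold=0.1, n_perm=200, seed=0):
    """Simplified supervised PCA: keep SNPs with |corr(SNP, y)| >= threshold, then run the
    PCA test on them. Screening uses y, so the p-value comes from permuting y."""
    rng = np.random.default_rng(seed)
    Gc = G - G.mean(axis=0)

    def statistic(yy):
        yc = yy - yy.mean()
        denom = np.sqrt((Gc ** 2).sum(axis=0) * (yc ** 2).sum())
        r = np.abs(Gc.T @ yc) / np.where(denom > 0, denom, np.inf)
        keep = r >= threshold
        if not keep.any():
            keep = r == r.max()                          # fall back to the top SNP
        return pca_set_test(G[:, keep], yy, k)[0]

    observed = statistic(y)
    perms = [statistic(rng.permutation(y)) for _ in range(n_perm)]
    p_value = (1 + sum(s >= observed for s in perms)) / (n_perm + 1)
    return observed, p_value
```

Because `spca_set_test` selects SNPs using the phenotype, its statistic no longer follows the chi-square reference distribution, so the sketch calibrates it by permutation; how the paper calibrates SPCA is not stated in this excerpt.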

Figure 3. Power to detect the causal SNP in Scenario B2, based on the CLPTM1L gene. The top plot shows the power (y-axis) of each method at each location (x-axis) of the causal SNP. The bar plot shows the minor allele frequencies (MAFs) of all SNPs. The heat map at the bottom of the figure shows the LD structure of the 29 SNPs; the red scale indicates the value of R² (1 = red, 0 = white).
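The two quantities displayed in the figure, per-SNP MAF and pairwise LD, can be computed from a genotype matrix roughly as follows. This sketch assumes unphased genotypes coded 0/1/2 and approximates R² by the squared Pearson correlation of genotype counts; estimators based on haplotype frequencies (for example, fitted by EM) can give somewhat different values, and the excerpt does not say which estimator produced the heat map. The function names and toy data are hypothetical.

```python
import numpy as np


def minor_allele_freq(G):
    """Per-SNP minor allele frequency from an n x m matrix of allele counts (0/1/2)."""
    f = G.mean(axis=0) / 2.0              # frequency of the counted allele
    return np.minimum(f, 1.0 - f)         # fold so the result is always <= 0.5


def ld_r2(G):
    """Pairwise LD as the squared Pearson correlation of genotype counts (m x m matrix).
    Monomorphic SNPs have zero variance and yield NaN entries."""
    r = np.corrcoef(G, rowvar=False)      # columns are SNPs
    return r ** 2


# Toy example: 4 individuals x 3 SNPs (hypothetical data).
G = np.array([[0, 0, 0],
              [1, 1, 0],
              [2, 2, 1],
              [0, 0, 1]], dtype=float)
maf = minor_allele_freq(G)   # MAFs 0.375, 0.375, 0.25
R = ld_r2(G)                 # SNPs 1 and 2 are identical, so R[0, 1] is 1
```

Plotting an m × m matrix like `R` as a heat map gives a display of the kind shown at the bottom of the figure.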

Mentions: Results of Scenario B2 are presented in Figure 3, which shows how the power of each method varies with the MAF and LD of the causal SNP. In general, all methods have higher power when the causal SNP is in high LD with the other SNPs. In most cases SPCA again has the greatest power, followed by KPCA. When the MAF of the causal SNP is low, the power of all four methods is weak, only about 10%. Notably, although PCA performs poorly in general, in this situation it has greater power than the other methods, for example when the causal SNP is at the 6th, 7th, or 13th locus.
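In a simulation study, power figures such as the "about 10%" quoted above are rejection proportions over replicate datasets. A minimal sketch, assuming each replicate yields one p-value and that a replicate counts as rejected when p < α: under the 0-causal-SNP scenario the same proportion estimates the type I error. The Monte Carlo standard error is included because it limits how finely two methods' powers can be distinguished. The function name and the replicate count are hypothetical; the number of replicates used in the paper is not given in this excerpt.

```python
import math


def rejection_rate(p_values, alpha=0.05):
    """Proportion of simulated replicates with p < alpha, plus its Monte Carlo standard error.
    Null scenario (0 causal SNPs): estimates the type I error. Alternative: estimates power."""
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one replicate")
    rate = sum(p < alpha for p in p_values) / n
    se = math.sqrt(rate * (1.0 - rate) / n)   # binomial standard error of a proportion
    return rate, se


# Hypothetical p-values: 100 of 1,000 replicates rejected at alpha = 0.05.
rate, se = rejection_rate([0.01] * 100 + [0.5] * 900)
print(f"power = {rate:.2f} +/- {se:.4f}")    # power = 0.10 +/- 0.0095
```

With an assumed 1,000 replicates, a power near 10% is estimated only to within about one percentage point, which matters when methods differ by a few points at low MAF.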