Assessing statistical significance in multivariable genome wide association analysis.

Buzdugan L, Kalisch M, Navarro A, Schunk D, Fehr E, Bühlmann P - Bioinformatics (2016)

Bottom Line: The low predictive power of single SNPs, coupled with the high significance threshold needed to correct for multiple testing, greatly decreases the power of GWAS. Thus, our method tests whether or not a SNP carries any additional information about the phenotype beyond that provided by all the other SNPs. Reproducibility of our research is supported by the open-source Bioconductor package hierGWAS. Contact: peter.buehlmann@stat.math.ethz.ch. Supplementary data are available at Bioinformatics online.


Affiliation: Seminar for Statistics, Department of Mathematics, ETH Zürich, Zürich 8092, Switzerland; Department of Economics, University of Zürich, Zürich 8006, Switzerland.


Figure 1: Schematic overview of the method. ‘Clustering’ refers to the step of hierarchically clustering the SNPs. SNPs on different chromosomes are clustered separately, after which the 22 clusters are joined into one final cluster containing all SNPs. ‘Multi-Sample Splitting and SNP Screening’ stands for the SNP selection in steps 1 and 2 of the method described in Section 2.4.2. These selected SNPs are used to compute the P-values. Finally, the last step of the method, ‘Hierarchical Testing’, uses the selected SNPs to test groups of SNPs and eventually single SNPs. This testing is done hierarchically, on the cluster tree previously constructed. The output of the method consists of significant groups, or single SNPs, along with their P-values, which are adjusted for multiple testing.
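The ‘Hierarchical Testing’ step in the caption can be illustrated with a minimal sketch. It assumes that a P-value is already available for every cluster in the tree; the P-values below are made up, and the names `Cluster`, `hierarchical_test` and `pvalue_of` are hypothetical and not part of hierGWAS. The rule that a child's P-value is never smaller than its parent's is an assumption about how the hierarchical adjustment works, not a statement of the authors' exact procedure.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A node of the cluster tree: a group of SNPs and its sub-clusters."""
    snps: tuple
    children: list = field(default_factory=list)


def hierarchical_test(root, pvalue_of, alpha=0.05):
    """Test clusters top-down and return the smallest significant ones.

    pvalue_of maps a tuple of SNP names to a P-value (assumed given).
    A cluster is only tested if its parent was significant, and its
    P-value is forced to be at least the parent's (assumed rule).
    """
    findings = []

    def visit(node, parent_p):
        p = max(pvalue_of(node.snps), parent_p)
        if p > alpha:
            return False  # not significant: do not descend further
        # Test every child, even if an earlier sibling was significant.
        child_hits = [visit(child, p) for child in node.children]
        if not any(child_hits):
            # No finer significant cluster exists, so report this one.
            findings.append((node.snps, p))
        return True

    visit(root, 0.0)
    return findings


# Toy tree with four SNPs in two groups; all P-values are invented.
tree = Cluster(("rs1", "rs2", "rs3", "rs4"), [
    Cluster(("rs1", "rs2"), [Cluster(("rs1",)), Cluster(("rs2",))]),
    Cluster(("rs3", "rs4"), [Cluster(("rs3",)), Cluster(("rs4",))]),
])
toy_pvalues = {
    ("rs1", "rs2", "rs3", "rs4"): 0.001,
    ("rs1", "rs2"): 0.01,
    ("rs3", "rs4"): 0.2,
    ("rs1",): 0.03,
    ("rs2",): 0.6,
}
print(hierarchical_test(tree, lambda snps: toy_pvalues.get(snps, 1.0)))
```

In this toy example the procedure descends to the single SNP rs1. If rs1 were not significant on its own, the group (rs1, rs2) would be reported instead, which matches the caption's statement that the output consists of significant groups or single SNPs.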


The difficulty with a regression-type analysis is the sheer high dimensionality of the problem. The number of SNPs exceeds the sample size n by at least one order of magnitude. In such scenarios, standard statistical inference methods fail. Recent progress based on new methods such as multiple sample splitting has made it possible to obtain statistical significance measures for regression parameters βj (cf. Bühlmann, 2013; Meinshausen et al., 2009; Zhang and Zhang, 2014) or groups thereof (Mandozzi and Bühlmann, 2015). We rely here on the latter method (Mandozzi and Bühlmann, 2015), which shows reliable performance over a wide range of simulation settings (Dezeure et al., 2015) and is computationally vastly more efficient than procedures which operate on the entire dataset. We extend the procedure of Mandozzi and Bühlmann (2015) from linear to logistic regression, and we show here for the first time how it performs for extremely high-dimensional GWAS data. The entire statistical procedure is schematically summarized in Figure 1.
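To make the idea of multiple sample splitting more concrete, the following minimal sketch shows how the P-values of one SNP from several random splits might be combined into a single P-value. It assumes the quantile aggregation rule of Meinshausen et al. (2009) with a fixed quantile γ and a Bonferroni adjustment within each split; whether the paper uses exactly this variant is not stated in the text above. All function names and numbers are hypothetical.

```python
import math


def adjust_split_pvalue(raw_p, n_selected, was_selected=True):
    """Bonferroni-adjust a raw P-value from one split.

    n_selected is the number of SNPs that survived screening in that
    split. A SNP that was not selected gets P-value 1 (assumed rule).
    """
    if not was_selected:
        return 1.0
    return min(1.0, raw_p * n_selected)


def empirical_quantile(values, gamma):
    """Smallest value v such that at least a fraction gamma of values are <= v."""
    ordered = sorted(values)
    k = max(1, math.ceil(gamma * len(ordered)))
    return ordered[k - 1]


def aggregate_pvalues(split_pvalues, gamma=0.5):
    """Combine per-split P-values: gamma-quantile of p / gamma, capped at 1."""
    scaled = [p / gamma for p in split_pvalues]
    return min(1.0, empirical_quantile(scaled, gamma))


# Invented results for one SNP over five splits:
# (raw P-value, number of selected SNPs, was this SNP selected?)
splits = [
    (0.0001, 10, True),
    (0.0002, 10, True),
    (0.0, 8, False),
    (0.0004, 10, True),
    (0.0003, 12, True),
]
adjusted = [adjust_split_pvalue(*s) for s in splits]
print(aggregate_pvalues(adjusted))
```

With γ = 0.5 the rule amounts to twice the median of the per-split P-values, so a SNP has to be selected and significant in at least half of the splits to obtain a small aggregated P-value.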

