A Bayesian outlier criterion to detect SNPs under selection in large data sets.
Bottom Line:
Robustness and power of the two resulting Bayesian model-based approaches to detect SNPs under selection are further evaluated through extensive simulations. An application to a cattle data set is also provided. The procedure described turns out to be much faster than previous Bayesian approaches and also reasonably efficient, especially for detecting loci under positive selection.
Affiliation: INRA, UMR1313 GABI, Jouy-en-Josas, France. mathieu.gautier@jouy.inra.fr
ABSTRACT
Background: The recent advent of high-throughput SNP genotyping technologies has opened new avenues of research for population genetics. In particular, a growing interest in the identification of footprints of selection, based on genome scans for adaptive differentiation, has emerged.
Methodology/principal findings: The purpose of this study is to develop an efficient model-based approach to perform Bayesian exploratory analyses for adaptive differentiation in very large SNP data sets. The basic idea is to start with a very simple model for neutral loci that is easy to implement under a Bayesian framework and to identify selected loci as outliers via Posterior Predictive P-values (PPP-values). Applications of this strategy are considered using two different statistical models. The first one was initially interpreted in the context of populations evolving under pure genetic drift from a common ancestral population, while the second one relies on populations under migration-drift equilibrium. Robustness and power of the two resulting Bayesian model-based approaches to detect SNPs under selection are further evaluated through extensive simulations. An application to a cattle data set is also provided.
Conclusions/significance: The procedure described turns out to be much faster than previous Bayesian approaches and also reasonably efficient, especially for detecting loci under positive selection.
Mentions: Although we expected the lower (resp. upper) tail to be enriched in SNPs under positive (resp. balancing) selection, identifying outliers from the observed PPP-value distribution makes it impossible, in practice, to control the False Discovery Rate (FDR) or the False Negative Rate. We thus further investigated the power and robustness of each model on the simulated data sets by computing the FDR and recording the number of SNPs correctly identified as subject to selection for different PPP-value thresholds (Table 2). For a given threshold, both the FDR and the power decreased as the number of simulated generations increased. This was expected, since more generations also sharpened the overall PPP-value distribution, due in particular to an increase in the cj and thus in the allele frequency variance. Similarly, the power was always higher when considering SNPs subjected to stronger selection (see above). Model 2 appeared far more efficient than model 1, mainly because PPP-value estimates were more extreme for SNPs under selection (e.g. Figures 2C and 2F). For instance, when T = 75, a threshold of 0.2 to detect SNPs under positive selection resulted in an FDR equal to 0, while the power was 13.6% with model 1 and 68.4% with model 2. The associated FDR for such a threshold when T = 10 was close to 10% for both models (Table 1). Finally, performing similar analyses on simulated data sets with a lower number of populations (Tables 1 and 2) led to only a slight decrease in overall power.
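The FDR and power evaluation described in this passage can be illustrated with a minimal sketch. It is not the authors' code. It assumes that a SNP is flagged as a candidate for positive selection when its PPP-value falls below the chosen threshold (lower tail), and that the true status of each simulated SNP is known. The function name `fdr_and_power` and the toy values are hypothetical and chosen only for illustration.

```python
def fdr_and_power(ppp_values, is_selected, threshold):
    """Return (FDR, power) for a lower-tail PPP-value threshold.

    ppp_values  : one PPP-value per simulated SNP
    is_selected : True if the SNP was simulated under selection
    threshold   : SNPs with PPP-value < threshold are flagged (assumption)
    """
    flagged = [p < threshold for p in ppp_values]
    n_flagged = sum(flagged)
    n_selected = sum(is_selected)
    # True positives: flagged SNPs that were really simulated under selection.
    true_pos = sum(f and s for f, s in zip(flagged, is_selected))
    # FDR: share of flagged SNPs that are in fact neutral (0 if nothing is flagged).
    fdr = (n_flagged - true_pos) / n_flagged if n_flagged else 0.0
    # Power: share of truly selected SNPs that were flagged.
    power = true_pos / n_selected if n_selected else 0.0
    return fdr, power


# Toy data (hypothetical, not taken from the paper): 3 selected and 5 neutral SNPs.
ppp = [0.05, 0.15, 0.30, 0.10, 0.50, 0.45, 0.60, 0.18]
selected = [True, True, True, False, False, False, False, False]

for t in (0.12, 0.2):
    fdr, power = fdr_and_power(ppp, selected, t)
    print(f"threshold={t}: FDR={fdr:.2f}, power={power:.2f}")
```

In this toy example, lowering the threshold flags fewer SNPs, so power drops. A table such as Table 2 would be built by repeating this calculation for each threshold and each simulated scenario.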