Impact of prior specifications in a shrinkage-inducing Bayesian model for quantitative trait mapping and genomic prediction.

Knürr T, Läärä E, Sillanpää MJ - Genet. Sel. Evol. (2013)

Bottom Line: The impact of specific prior specifications is reduced by calculating combined estimates from multiple specifications. Combined estimates of Bayes factors were very successful in identifying quantitative trait loci, and the ranking of Bayes factors was fairly stable among markers with positive signals of association under varying prior assumptions, but their magnitudes varied considerably. Genomic estimated breeding values using the mixture of uniform priors compared well to other approaches for both data sets, and the loss of accuracy with the generalized expectation-maximization algorithm relative to MCMC was small.


Affiliation: Department of Mathematics and Statistics, P.O. Box 68, University of Helsinki, Helsinki, FIN-00014, Finland.

ABSTRACT

Background: In quantitative trait mapping and genomic prediction, Bayesian variable selection methods have gained popularity along with the growth in marker data and computational resources. Whereas shrinkage-inducing methods are common tools in genomic prediction, rigorous decision making in mapping studies using such models is not well established, and posterior results may not be robust to misspecified prior assumptions because biological prior evidence is weak.

Methods: Here, we evaluate the impact of prior specifications in a shrinkage-based Bayesian variable selection method, presented in our previous study, that applies a mixture of uniform priors to genetic marker effects. Unlike most other shrinkage approaches, the mixture of uniform priors provides a coherent framework for inference based on Bayes factors. To evaluate the robustness of genetic association under varying prior specifications, Bayes factors are compared as signals of positive marker association, whereas genomic estimated breeding values are considered for genomic selection. The impact of any specific prior specification is reduced by calculating combined estimates from multiple specifications. A Gibbs sampler is used for Markov chain Monte Carlo (MCMC) estimation, and a generalized expectation-maximization algorithm serves as a faster alternative for maximum a posteriori point estimation. The performance of the method is evaluated on two publicly available data sets: the simulated QTLMAS XII data set and a real data set from a population of pigs.
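As a rough illustration of how a Bayes factor for marker association can be obtained from a mixture prior, the sketch below uses the general identity BF = posterior odds / prior odds, where the posterior probability that a marker effect belongs to the non-null component could be estimated, for example, as the fraction of Gibbs sampler iterations spent in that component. If p0 is read as the prior probability of the null component (an assumption here), the prior probability passed in would be 1 - p0. The function names are hypothetical, and the combination rule (geometric mean across prior specifications, i.e. averaging log Bayes factors) is an illustrative assumption, not necessarily the rule used by the authors.

```python
import math


def bayes_factor(post_prob: float, prior_prob: float) -> float:
    """Bayes factor in favour of association: posterior odds / prior odds.

    post_prob:  posterior probability that the marker effect is in the
                non-null mixture component (e.g. share of MCMC samples).
    prior_prob: prior probability of the non-null component
                (assumed here to be 1 - p0).
    """
    if not (0.0 < post_prob < 1.0 and 0.0 < prior_prob < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    posterior_odds = post_prob / (1.0 - post_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


def combined_bayes_factor(bfs):
    """Hypothetical combination rule: geometric mean of the Bayes factors
    obtained for one marker under several prior specifications."""
    logs = [math.log(bf) for bf in bfs]
    return math.exp(sum(logs) / len(logs))


# One marker analysed under three hypothetical prior specifications,
# given as (posterior probability, prior probability) pairs.
runs = [(0.90, 0.10), (0.75, 0.05), (0.60, 0.20)]
bfs = [bayes_factor(post, prior) for post, prior in runs]
print([round(bf, 1) for bf in bfs])
print(round(combined_bayes_factor(bfs), 1))
```

Averaging on the log scale keeps one very large Bayes factor from dominating the combined value, which is one reason the geometric mean is a common choice for quantities that span orders of magnitude.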

Results: Combined estimates of Bayes factors were very successful in identifying quantitative trait loci, and the ranking of Bayes factors was fairly stable among markers with positive signals of association under varying prior assumptions, but their magnitudes varied considerably. Genomic estimated breeding values using the mixture of uniform priors compared well to other approaches for both data sets, and the loss of accuracy with the generalized expectation-maximization algorithm relative to MCMC was small.
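The distinction between a stable ranking and varying magnitudes can be made concrete with a rank correlation. The sketch below computes Spearman's rank correlation between Bayes factors obtained under two prior specifications; this is an assumed way to quantify ranking stability, not necessarily the measure used in the study, and the Bayes factor values are hypothetical.

```python
import math


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical Bayes factors for five markers under two prior specifications:
# the magnitudes differ a lot, but the ordering of the markers is identical.
bf_spec_a = [120.0, 35.0, 8.0, 1.2, 0.4]
bf_spec_b = [900.0, 210.0, 15.0, 2.0, 0.9]
print(spearman(bf_spec_a, bf_spec_b))  # identical ordering, so 1.0
```

A rank-based measure is insensitive to the scale of the Bayes factors, so it captures exactly the kind of stability described above while ignoring the large differences in magnitude.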

Conclusions: Since no error-free method to specify priors is available for complex biological phenomena, exploring a wide variety of prior specifications and combining the results offers a partial solution to this problem. For this purpose, the mixture of uniform priors approach is especially suitable, because it encompasses a wide and flexible family of distributions, and the computationally intensive estimation can be carried out in a reasonable amount of time.


Figure 1: Accuracy (panel I) and bias (panel II) estimates under varying specifications of the hyper-parameters p0 (subpanels a-d for both panels I and II) and b for the four SNP sets in the analysis of the real data set.

Mentions: As shown in Figure 1, the estimated accuracies were highly sensitive to the choice of b. For SIS10K and RAND10K, accuracy deteriorated as b approached 0 and when b > 0.015. In contrast, both RAND1K and SIS1K achieved their best accuracies for b > 0.015, levelling off towards a horizontal asymptote as b increased. Accuracies were quite similar for RAND10K and SIS10K, except for b between 0.015 and 0.03, where SIS10K yielded higher accuracies. In all cases, RAND1K had lower accuracies than SIS1K. As mentioned above, the higher accuracies for SIS may be due, at least partially, to overestimation induced by the SIS pre-selection step.
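The accuracy and bias plotted in Figure 1 are not defined in this excerpt. A common convention in genomic prediction, assumed here, takes accuracy as the Pearson correlation between genomic estimated breeding values (GEBVs) and reference values in a validation set, and bias as the slope of the regression of reference values on GEBVs, where a slope of 1 indicates no inflation or deflation. The function name and toy data below are hypothetical.

```python
import math


def accuracy_and_bias(gebv, reference):
    """Return (accuracy, bias) for genomic estimated breeding values.

    Assumed definitions (common in genomic prediction, not confirmed here):
      accuracy = Pearson correlation between GEBVs and reference values;
      bias     = slope of the regression of reference values on GEBVs
                 (1 means no inflation or deflation of the GEBVs).
    """
    n = len(gebv)
    if n != len(reference) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mg = sum(gebv) / n
    mr = sum(reference) / n
    sgr = sum((g - mg) * (r - mr) for g, r in zip(gebv, reference))
    sgg = sum((g - mg) ** 2 for g in gebv)
    srr = sum((r - mr) ** 2 for r in reference)
    accuracy = sgr / math.sqrt(sgg * srr)
    slope = sgr / sgg
    return accuracy, slope


# Toy example: GEBVs perfectly correlated with the reference values but
# shrunk by half, so accuracy is 1 while the regression slope is 2.
acc, bias = accuracy_and_bias([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(acc, bias)
```

The toy example shows why both quantities are reported: strong shrinkage can leave the ranking of animals (and hence the correlation) intact while still distorting the scale of the predictions.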
