A fast algorithm for BayesB type of prediction of genome-wide estimates of genetic value.

Meuwissen TH, Solberg TR, Shepherd R, Woolliams JA - Genet. Sel. Evol. (2009)

Bottom Line: For the estimation of SNP effects, BayesB type of estimators have been proposed, which assume a priori that many markers have no effects, and some have an effect coming from a gamma or exponential distribution, i.e. a fat-tailed distribution. The bias of the new method was opposite to that of the MCMC based BayesB, in that the new method underestimates the breeding values of the best selection candidates, whereas MCMC-BayesB overestimated their breeding values. The new method was computationally several orders of magnitude faster than MCMC based BayesB, which will mainly be advantageous in computer simulations of entire breeding schemes, in cross-validation testing, and practical schemes with frequent re-estimation of breeding values.


Affiliation: Institute of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Ås, Norway. theo.meuwissen@umb.no

ABSTRACT
Genomic selection uses genome-wide dense SNP marker genotyping for the prediction of genetic values, and consists of two steps: (1) estimation of SNP effects, and (2) prediction of genetic value based on SNP genotypes and estimates of their effects. For the former step, BayesB type of estimators have been proposed, which assume a priori that many markers have no effects, and some have an effect coming from a gamma or exponential distribution, i.e. a fat-tailed distribution. Whilst such estimators have been developed using Monte Carlo Markov chain (MCMC), here we derive a much faster non-MCMC based estimator by analytically performing the required integrations. The accuracy of the genome-wide breeding value estimates was 0.011 (s.e. 0.005) lower than that of the MCMC based BayesB predictor, which may be because the integrations were performed one-by-one instead of for all SNPs simultaneously. The bias of the new method was opposite to that of the MCMC based BayesB, in that the new method underestimates the breeding values of the best selection candidates, whereas MCMC-BayesB overestimated their breeding values. The new method was computationally several orders of magnitude faster than MCMC based BayesB, which will mainly be advantageous in computer simulations of entire breeding schemes, in cross-validation testing, and practical schemes with frequent re-estimation of breeding values.


Figure 1: The expectation of the genetic value given the summary statistic of the data Y, E[g|Y], as a function of Y. The parameter of the exponential distribution is λ = 1, σ² = 1, and the probability of a marker having a true effect is γ = 0.05; E[g|Y] calculated by numerical integration is represented by black dots and the analytical solution is shown as white dots.

Mentions: In Figure 1, E[g|Y] is plotted against the value of Y with σ² = 1; since Y is the sufficient statistic for g given the data y, E[g|y] = E[g|Y]. Figure 1 also shows the regression curve when the integrals in Equation (4) were numerically evaluated, as was done by Goddard [2]. The empirical curve of Goddard has similar characteristics, in that it is relatively flat at Y = 0 but approaches a derivative of 1 for extreme values of Y. However, as a result of the closed expression in Appendix 1 (B3), it is possible to explore the full solution space, and Figure 2 shows some examples from this space. The examples demonstrate several features. Firstly, E[g|Y] is an odd function (in a mathematical sense) satisfying E[g|-Y] = -E[g|Y]. Secondly, dE[g|Y]/dY is non-zero at Y = 0 but decreases towards 0 as γ tends to 0. Furthermore, dE[g|Y]/dY is not necessarily monotonic; for example, see γ = 0.05 in Figure 2. In the example with γ = 0.05 it is clear that dE[g|Y]/dY exceeds 1 for Y ≈ 3.5, i.e. an increment in Y results in a greater increment in E[g|Y]. Heuristically, this occurs because for small γ there are only a few non-zero marker effects, but those present are large; therefore E[g|Y] is close to 0, since Y is expected to have occurred by chance, until Y becomes large and statistically unusual in magnitude, but once considered unusual, E[g|Y] is large. Asymptotically, for Y of large magnitude, dE[g|Y]/dY tends to 1. The asymptotic behaviour of E[g|Y] is:
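
The qualitative features of the curve described above (odd symmetry, a small but non-zero slope at Y = 0, a slope above 1 near Y ≈ 3.5 for γ = 0.05, and a slope tending to 1 for large Y) can be explored with a short sketch. It rests on assumptions that are not spelled out in this excerpt: Y given g is taken to be normal with mean g and variance σ², and the prior is taken to put probability 1 − γ on g = 0 and probability γ on a double-exponential density (λ/2)·exp(−λ|g|). The closed form below is obtained by completing the square under that assumed model and may differ in form from expression (B3) in Appendix 1; all function names are hypothetical.

```python
import math


def norm_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def norm_cdf(z):
    """Standard normal CDF; erfc keeps accuracy in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def posterior_mean_closed(Y, lam=1.0, sigma2=1.0, gamma=0.05):
    """E[g|Y] under the ASSUMED spike-and-double-exponential model."""
    sd = math.sqrt(sigma2)
    # Completing the square shifts the normal kernel by -/+ lam*sigma2
    # on the positive / negative half-line of g.
    mu_pos = Y - lam * sigma2
    mu_neg = Y + lam * sigma2
    c_pos = math.exp(0.5 * lam * lam * sigma2 - lam * Y)
    c_neg = math.exp(0.5 * lam * lam * sigma2 + lam * Y)
    # Zeroth and first moments of the shifted normals on each half-line.
    mass = (c_pos * norm_cdf(mu_pos / sd)
            + c_neg * norm_cdf(-mu_neg / sd))
    first = (c_pos * (mu_pos * norm_cdf(mu_pos / sd) + sd * norm_pdf(mu_pos / sd))
             + c_neg * (mu_neg * norm_cdf(-mu_neg / sd) - sd * norm_pdf(mu_neg / sd)))
    numerator = gamma * 0.5 * lam * first
    denominator = (1.0 - gamma) * norm_pdf(Y, 0.0, sd) + gamma * 0.5 * lam * mass
    return numerator / denominator


def posterior_mean_numeric(Y, lam=1.0, sigma2=1.0, gamma=0.05,
                           half_width=40.0, n=40001):
    """Same quantity by trapezoidal integration over g in [-half_width, half_width]."""
    sd = math.sqrt(sigma2)
    h = 2.0 * half_width / (n - 1)
    first = 0.0
    mass = 0.0
    for i in range(n):
        g = -half_width + i * h
        w = 0.5 * h if i in (0, n - 1) else h
        dens = 0.5 * lam * math.exp(-lam * abs(g)) * norm_pdf(Y, g, sd)
        first += w * g * dens
        mass += w * dens
    denominator = (1.0 - gamma) * norm_pdf(Y, 0.0, sd) + gamma * mass
    return gamma * first / denominator
```

Evaluating both functions over a grid of Y values with the defaults (λ = 1, σ² = 1, γ = 0.05) gives the same kind of comparison as Figure 1, with the numerical and analytical values agreeing closely. Central differences of `posterior_mean_closed` can be used to inspect the slope at particular values of Y.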

