Genomic prediction based on data from three layer lines using non-linear regression models.
Bottom Line:
These models were compared to conventional linear models for genomic prediction for two lines of brown layer hens (B1 and B2) and one line of white hens (W1). In some cases, when adding a distantly related line, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction.
View Article:
PubMed Central - PubMed
Affiliation: Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, Wageningen, 6700 AH, The Netherlands. mario.calus@wur.nl.
ABSTRACT
Background: Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods.

Methods: In an attempt to alleviate potential discrepancies between assumptions of linear models and multi-population data, two types of alternative models were used: (1) a multi-trait genomic best linear unbiased prediction (GBLUP) model that modelled trait by line combinations as separate but correlated traits and (2) non-linear models based on kernel learning. These models were compared to conventional linear models for genomic prediction for two lines of brown layer hens (B1 and B2) and one line of white hens (W1). The three lines each had 1004 to 1023 training and 238 to 240 validation animals. Prediction accuracy was evaluated by estimating the correlation between observed phenotypes and predicted breeding values.

Results: When the training dataset included only data from the evaluated line, non-linear models yielded at best a similar accuracy as linear models. In some cases, when adding a distantly related line, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. When only information from a closely related line was used for training, linear models and non-linear radial basis function (RBF) kernel models performed similarly. The multi-trait GBLUP model took advantage of the estimated genetic correlations between the lines. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction.

Conclusions: Linear models and non-linear RBF models performed very similarly for genomic prediction, despite the expectation that non-linear models could deal better with the heterogeneous multi-population data. This heterogeneity of the data can be overcome by modelling trait by line combinations as separate but correlated traits, which avoids the occasional occurrence of large negative accuracies when the evaluated line was not included in the training dataset. Furthermore, when using a multi-line training dataset, non-linear models provided information on the genotype data that was complementary to the linear models, which indicates that the underlying data distributions of the three studied lines were indeed heterogeneous.
Because linear and non-linear models focus on different aspects of the genomic data, we analysed the complementarity between models in this subsection. One way to measure the complementarity between two approaches is based on the correlation between their predictions. Correlations of genomic predictions were computed between models for the training dataset that included all three lines (Table 5). In general, predictions from the Poly models had the lowest correlations with those of other models, which is in line with the observation that, in most cases, the Poly models had the poorest performance in terms of predictive correlation. Ignoring the Poly models, the correlations between predictions from the different models were generally high (>0.9) for line W1. For lines B1 and B2, the predictions from the RBF models had correlations lower than 0.9 with those of GBLUP and RRPCA and higher than 0.9 with those of MTGBLUP. The prediction from the MTGBLUP model deviated substantially from those of GBLUP, with correlations of 0.91 to 0.98.

The level of the correlations showed that combining predictions of different models could lead to more accurate predictions. The potential of such an approach was investigated by evaluating combined predictions of two models. A weighted combination of two predictions (â1, â2) can be obtained using the following equation:

$$ \widehat{a} = \beta \widehat{a}_1 + \left(1-\beta\right) \widehat{a}_2, \quad 0 \le \beta \le 1, $$

where the parameter β is the weight given to the first prediction and 1 − β the weight given to the second. When β is equal to 1 or 0, the combination reduces to â1 or â2, respectively.
Figure 1 shows the predictive correlations of this combined prediction for the linear models GBLUP and RRPCA and the non-linear model RBF. In Figure 1, each row represents the results for one combination of models and each column represents the results for one of the lines. For line B1, combining predictions from a linear and a non-linear model improved the predictive correlation, especially for the combination of GBLUP and RBF. For line B2, there was little gain by combining models, which is probably due to the superior performance of the RRPCA model. For line W1, the combined prediction was in all cases slightly more accurate. Interestingly, across all situations, the benefit of combining predictions of two models was largest when the two models had a similar predictive correlation.
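The weighted combination and its evaluation over a grid of β values can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`predictive_correlation`, `combine`, `sweep_beta`), the grid of 11 weights, and the simulated phenotypes and predictions are all assumptions made for the example, and the numbers it produces say nothing about the layer data.

```python
import numpy as np


def predictive_correlation(y, a_hat):
    """Pearson correlation between observed phenotypes and predictions."""
    return float(np.corrcoef(y, a_hat)[0, 1])


def combine(a1, a2, beta):
    """Weighted combination a_hat = beta * a1 + (1 - beta) * a2, 0 <= beta <= 1."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * np.asarray(a1, dtype=float) + (1.0 - beta) * np.asarray(a2, dtype=float)


def sweep_beta(y, a1, a2, n_steps=11):
    """Predictive correlation of the combined prediction for each beta on a grid."""
    betas = np.linspace(0.0, 1.0, n_steps)
    cors = np.array(
        [predictive_correlation(y, combine(a1, a2, b)) for b in betas]
    )
    return betas, cors


# Simulated stand-ins (assumption): a shared genetic signal g, a noisy
# phenotype y, and two predictions whose errors are independent.
rng = np.random.default_rng(0)
n = 240  # roughly the size of one validation set in the study
g = rng.normal(size=n)
y = g + rng.normal(size=n)
a1 = g + rng.normal(size=n)  # stand-in for a linear-model prediction
a2 = g + rng.normal(size=n)  # stand-in for a non-linear-model prediction

betas, cors = sweep_beta(y, a1, a2)
for b, c in zip(betas, cors):
    print(f"beta={b:.1f}  predictive correlation={c:.3f}")
```

The endpoints of the sweep reproduce the two single-model predictive correlations, so plotting `cors` against `betas` gives a curve of the kind described for each panel of Figure 1. In practice the weight would need to be chosen without reference to the validation phenotypes.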