Incorporating Genetic Heterogeneity in Whole-Genome Regressions Using Interactions.

de Los Campos G, Veturi Y, Vazquez AI, Lehermeier C, Pérez-Rodríguez P - J Agric Biol Environ Stat (2015)

Bottom Line: From this perspective, structure acts as an effect modifier rather than as a confounder. The evaluation of prediction accuracy shows a modest superiority of the interaction model relative to the other two approaches. This superiority is the result of better stability in performance of the interaction models across data sets and traits; indeed, in almost all cases, the interaction model was either the best performing model or it performed close to the best performing model.


Affiliation: Department of Epidemiology & Biostatistics, Michigan State University, 909 Fee Road, Room B601, East Lansing, MI 48824 USA; Department of Statistics & Probability, Michigan State University, 619 Red Cedar Rd., East Lansing, MI 48824 USA.

ABSTRACT

Naturally and artificially selected populations usually exhibit some degree of stratification. In Genome-Wide Association Studies and in Whole-Genome Regressions (WGR) analyses, population stratification has been either ignored or dealt with as a potential confounder. However, systematic differences in allele frequency and in patterns of linkage disequilibrium can induce sub-population-specific effects. From this perspective, structure acts as an effect modifier rather than as a confounder. In this article, we extend WGR models commonly used in plant and animal breeding to allow for sub-population-specific effects. This is achieved by decomposing marker effects into main effects and interaction components that describe group-specific deviations. The model can be used both with variable selection and shrinkage methods and can be implemented using existing software for genomic selection. Using a wheat and a pig breeding data set, we compare parameter estimates and the prediction accuracy of the interaction WGR model with WGR analysis ignoring population stratification (across-group analysis) and with a stratified (i.e., within-sub-population) WGR analysis. The interaction model renders trait-specific estimates of the average correlation of effects between sub-populations; we find that such correlation not only depends on the extent of genetic differentiation in allele frequencies between groups but also varies among traits. The evaluation of prediction accuracy shows a modest superiority of the interaction model relative to the other two approaches. This superiority is the result of better stability in performance of the interaction models across data sets and traits; indeed, in almost all cases, the interaction model was either the best performing model or it performed close to the best performing model.
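
The decomposition of marker effects into main effects plus group-specific deviations can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`interaction_design`, `ridge_fit`), the simulated genotypes, and the penalty values are all assumptions made for illustration, and a ridge regression with fixed penalties stands in for the shrinkage and variable selection methods mentioned in the abstract.

```python
import numpy as np

def interaction_design(X, groups):
    """Build W = [X | X_1 | ... | X_G] (hypothetical helper).

    X_g equals X on the rows that belong to group g and is zero elsewhere,
    so its coefficients act as group-specific deviations from the main effects.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    blocks = [X]
    for g in labels:
        mask = (groups == g).astype(float)[:, None]
        blocks.append(X * mask)
    return np.hstack(blocks), labels

def ridge_fit(W, y, penalties):
    """Solve (W'W + diag(penalties)) b = W'y; a stand-in for a shrinkage prior."""
    W = np.asarray(W, dtype=float)
    A = W.T @ W + np.diag(penalties)
    return np.linalg.solve(A, W.T @ y)

# Simulated toy data (assumption): 60 individuals, 5 markers, 2 groups.
rng = np.random.default_rng(0)
n, p = 60, 5
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
X -= X.mean(axis=0)
groups = np.repeat([1, 2], n // 2)

W, labels = interaction_design(X, groups)

# Simulate a phenotype with main effects and small group deviations.
b0 = rng.normal(size=p)
b_dev = 0.3 * rng.normal(size=(len(labels), p))
true_coef = np.concatenate([b0, b_dev.ravel()])
y = W @ true_coef + rng.normal(scale=0.5, size=n)

# Arbitrary penalties (assumption): deviations are shrunk harder than main effects.
lam_main, lam_int = 1.0, 5.0
penalties = np.concatenate([np.full(p, lam_main),
                            np.full(p * len(labels), lam_int)])
b_hat = ridge_fit(W, y, penalties)

main = b_hat[:p]
dev = b_hat[p:].reshape(len(labels), p)
group_effects = main + dev  # row g: estimated marker effects in group g
```

In this layout, dropping the interaction blocks gives the across-group analysis, while fitting each group's rows separately corresponds to the stratified analysis.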

Electronic supplementary material: Supplementary materials for this article are available at 10.1007/s13253-015-0222-5.



Fig. 3: Clustering in the pig data set. First two principal components (top-left panel) and allele frequency (top-right and lower panels) by group (1 in red, 2 in blue and 3 in black).

Mentions: For the wheat data set, the first two PCs and the estimated allele frequencies by cluster are presented in Fig. 1. In this data set, there were two clear clusters, one with 345 lines and the other with 254 lines. The first two PCs clearly separate the two clusters inferred with PSMix, and many markers have markedly different allele frequencies in the two clusters. The results from the clustering in the pig data set are presented in Fig. 3. In this data set, the first two PCs suggest the existence of three clusters; therefore, we ran the clustering algorithm in PSMix with three clusters. Overall, the results from the clustering coincide with the groups that one can form by visual inspection of the first two PCs. However, for a few points, the clustering obtained with PSMix did not coincide with the classification that one could obtain by inspecting the PCs. The degree of differentiation between clusters is clearly less marked in the pig data set than in the wheat data set: in the wheat data set, the first two PCs explained 16.3 % of the total variation (measured as the ratio of the sum of the two largest eigenvalues relative to the sum of all eigenvalues); this figure in the pig data set was 8.6 %, and unlike the wheat data set, allele frequencies were highly correlated between clusters in the pig data set.
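
The two summaries used above, the share of variation explained by the first two PCs and the allele frequencies by cluster, can be sketched as follows. This is an illustrative example on simulated genotypes, not the analysis from the article: the helper names, the 0/1/2 allele-count coding, and the group sizes are assumptions, and the cluster labels are taken as given rather than inferred with PSMix.

```python
import numpy as np

def pc_variance_share(X, k=2):
    """Sum of the k largest eigenvalues of XcXc' divided by the sum of all eigenvalues."""
    Xc = X - X.mean(axis=0)            # center each marker
    G = Xc @ Xc.T                      # n x n cross-product matrix
    eigvals = np.linalg.eigvalsh(G)[::-1]   # sorted from largest to smallest
    eigvals = np.clip(eigvals, 0.0, None)   # remove tiny negative round-off
    return eigvals[:k].sum() / eigvals.sum()

def allele_freq_by_group(X, groups):
    """Allele frequency per marker and group, assuming 0/1/2 allele-count coding."""
    labels = np.unique(groups)
    freqs = np.vstack([X[groups == g].mean(axis=0) / 2.0 for g in labels])
    return labels, freqs

# Simulated toy genotypes (assumption): two groups with independent allele frequencies.
rng = np.random.default_rng(1)
p = 200
f1 = rng.uniform(0.05, 0.95, size=p)
f2 = rng.uniform(0.05, 0.95, size=p)
X = np.vstack([rng.binomial(2, f1, size=(40, p)),
               rng.binomial(2, f2, size=(30, p))]).astype(float)
groups = np.repeat([1, 2], [40, 30])

share = pc_variance_share(X, k=2)
labels, freqs = allele_freq_by_group(X, groups)

# Correlation of allele frequencies between the two groups across markers.
freq_corr = np.corrcoef(freqs)[0, 1]
print(f"share of variation in first two PCs: {share:.3f}")
print(f"between-group allele frequency correlation: {freq_corr:.3f}")
```

A larger share for the leading PCs together with a low between-group correlation of allele frequencies indicates stronger differentiation, which is the contrast drawn between the wheat and pig data sets.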

