Incorporating Genetic Heterogeneity in Whole-Genome Regressions Using Interactions.

de Los Campos G, Veturi Y, Vazquez AI, Lehermeier C, Pérez-Rodríguez P - J Agric Biol Environ Stat (2015)

Bottom Line: From this perspective, structure acts as an effect modifier rather than as a confounder. The evaluation of prediction accuracy shows a modest superiority of the interaction model relative to the other two approaches. This superiority is the result of better stability in performance of the interaction models across data sets and traits; indeed, in almost all cases, the interaction model was either the best performing model or it performed close to the best performing model.


Affiliation: Department of Epidemiology & Biostatistics, Michigan State University, 909 Fee Road, Room B601, East Lansing, MI 48824 USA; Department of Statistics & Probability, Michigan State University, 619 Red Cedar Rd., East Lansing, MI 48824 USA.

ABSTRACT

Naturally and artificially selected populations usually exhibit some degree of stratification. In Genome-Wide Association Studies and in Whole-Genome Regression (WGR) analyses, population stratification has been either ignored or dealt with as a potential confounder. However, systematic differences in allele frequency and in patterns of linkage disequilibrium can induce sub-population-specific effects. From this perspective, structure acts as an effect modifier rather than as a confounder. In this article, we extend WGR models commonly used in plant and animal breeding to allow for sub-population-specific effects. This is achieved by decomposing marker effects into main effects and interaction components that describe group-specific deviations. The model can be used both with variable selection and shrinkage methods and can be implemented using existing software for genomic selection. Using a wheat and a pig breeding data set, we compare parameter estimates and the prediction accuracy of the interaction WGR model with WGR analysis ignoring population stratification (across-group analysis) and with a stratified (i.e., within-sub-population) WGR analysis. The interaction model renders trait-specific estimates of the average correlation of effects between sub-populations; we find that such correlation not only depends on the extent of genetic differentiation in allele frequencies between groups but also varies among traits. The evaluation of prediction accuracy shows a modest superiority of the interaction model relative to the other two approaches. This superiority is the result of better stability in performance of the interaction models across data sets and traits; indeed, in almost all cases, the interaction model was either the best performing model or it performed close to the best performing model.
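
The decomposition of marker effects into main effects and group-specific deviations can be illustrated with a small design-matrix sketch. This is only one plausible way to set it up, not necessarily the authors' implementation: the function names `interaction_design` and `implied_effect_correlation`, the toy genotypes, and the assumption that main effects and deviations are independent with their own variances are all introduced here for illustration.

```python
import numpy as np

def interaction_design(X, groups):
    """Stack a main-effect block and one deviation block per group.

    The first block is X itself (effects shared by all groups). Each
    further block copies X but zeroes the rows of individuals outside
    the group, so its coefficients act as group-specific deviations.
    Hypothetical helper, not taken from the article.
    """
    groups = np.asarray(groups)
    levels = np.unique(groups)
    blocks = [X]
    for g in levels:
        in_group = (groups == g)[:, None]
        blocks.append(np.where(in_group, X, 0.0))
    return np.hstack(blocks), levels

def implied_effect_correlation(var_main, var_dev_a, var_dev_b):
    """Correlation of total effects (main + deviation) in two groups.

    Assumes main effects and deviations are mutually independent, so
    the covariance between the two groups' effects equals var_main.
    """
    return var_main / np.sqrt((var_main + var_dev_a) * (var_main + var_dev_b))

# Toy genotypes: 4 individuals, 2 markers, 2 groups (illustrative values).
X = np.array([[0.0, 1.0], [2.0, 1.0], [1.0, 0.0], [1.0, 2.0]])
groups = np.array([0, 0, 1, 1])
W, levels = interaction_design(X, groups)
print(W.shape)
print(implied_effect_correlation(1.0, 1.0, 1.0))
```

Any shrinkage or variable selection method could then be fitted to `W`, with separate variance parameters for the main-effect block and for each deviation block. Under the independence assumption stated above, a large main-effect variance relative to the deviation variances implies highly correlated effects across groups, and the reverse implies nearly group-specific effects.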

Electronic supplementary material: Supplementary materials for this article are available at 10.1007/s13253-015-0222-5.


Fig. 2: Scatter plot of estimated effects obtained with a stratified analysis (left) and estimated sampling distribution of the correlation between estimated effects obtained with 1000 permutations (right).

Mentions: Given the important level of stratification present in this data set, it is reasonable to expect that QTL and marker effects may vary between sub-populations. To assess this hypothesis, we estimated marker effects using ridge regression within cluster; the results from this analysis are reported in the left panel of Fig. 2. The sample correlation of the estimated effects was only 0.1, suggesting that marker effects are markedly different between clusters. However, estimated effects are subject to sampling errors; consequently, the absolute value of the sample correlation between estimated effects under-estimates the absolute value of the correlation between the (unknown) true effects. Therefore, we cannot rule out the possibility that the low correlation observed between estimated effects only reflects the low precision of estimates. To assess whether or not this is the case, we conducted 1000 permutations by re-shuffling the cluster labels. This exercise yielded 1000 samples of the correlation between estimates obtained in a scenario where the group label has no relationship with the genetic architecture. A histogram of the 1000 correlations obtained is displayed in the right panel of Fig. 2. The average and median correlations from the permutation distribution were both 0.34, and the 2nd percentile was 0.28. We conclude that the correlation of estimates obtained when the groups were derived from the observed genotypes does not belong to the distribution obtained with the permutation. This suggests that indeed, as one may suspect from the inspection of principal components and the distribution of allele frequencies, marker effects vary between clusters.
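
The permutation check described above can be sketched as follows. This is a minimal illustration on simulated data, not the authors' code or data: the helper names, the ridge penalty `lam=10.0`, the sample sizes, the simulated effects, and the reduced number of permutations (200 rather than 1000) are all assumptions made here.

```python
import numpy as np

def ridge_effects(X, y, lam):
    """Ridge regression estimates of marker effects on centered data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

def effect_correlation(X, y, labels, lam):
    """Correlation between effects estimated separately in two clusters."""
    b0 = ridge_effects(X[labels == 0], y[labels == 0], lam)
    b1 = ridge_effects(X[labels == 1], y[labels == 1], lam)
    return np.corrcoef(b0, b1)[0, 1]

def permutation_correlations(X, y, labels, lam, n_perm, rng):
    """Re-shuffle cluster labels and recompute the correlation each time."""
    out = np.empty(n_perm)
    for i in range(n_perm):
        out[i] = effect_correlation(X, y, rng.permutation(labels), lam)
    return out

# Simulated example: two clusters with independent marker effects.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
labels = np.repeat([0, 1], n // 2)
b_a = rng.normal(size=p)
b_b = rng.normal(size=p)
y = np.where(labels == 0, X @ b_a, X @ b_b) + rng.normal(size=n)

observed = effect_correlation(X, y, labels, lam=10.0)
null = permutation_correlations(X, y, labels, lam=10.0, n_perm=200, rng=rng)

# Share of permuted correlations at or below the observed one.
p_lower = np.mean(null <= observed)
print(observed, np.median(null), np.percentile(null, 2), p_lower)
```

Shuffling the labels keeps the cluster sizes, and therefore the precision of the within-cluster estimates, while removing any link between cluster membership and genetic architecture. An observed correlation that falls below nearly all permuted values is then hard to attribute to sampling error alone.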

