Genomic prediction using imputed whole-genome sequence data in Holstein Friesian cattle.

van Binsbergen R, Calus MP, Bink MC, van Eeuwijk FA, Schrooten C, Veerkamp RF - Genet. Sel. Evol. (2015)

Bottom Line: In contrast to currently used single nucleotide polymorphism (SNP) panels, the use of whole-genome sequence data is expected to enable the direct estimation of the effects of causal mutations on a given trait. BSSVS performed better than GBLUP in all cases. To investigate the putative advantage of genomic prediction using (imputed) sequence data, a training set with a larger number of individuals that are distantly related to each other and genomic prediction models that incorporate biological information on the SNPs or that apply stricter SNP pre-selection should be considered.

Affiliation: Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, PO Box 338, 6700 AH, Wageningen, The Netherlands. rianne.vanbinsbergen@wur.nl.

ABSTRACT

Background: In contrast to currently used single nucleotide polymorphism (SNP) panels, the use of whole-genome sequence data is expected to enable the direct estimation of the effects of causal mutations on a given trait. This could lead to higher reliabilities of genomic predictions compared to those based on SNP genotypes. Also, at each generation of selection, recombination events between a SNP and a mutation can cause decay in reliability of genomic predictions based on markers rather than on the causal variants. Our objective was to investigate the effect of using imputed whole-genome sequence genotypes rather than high-density SNP genotypes on the reliability of genomic predictions, and on the persistency of that reliability, using real cattle data.

Methods: Highly accurate phenotypes based on daughter performance and Illumina BovineHD Beadchip genotypes were available for 5503 Holstein Friesian bulls. The BovineHD genotypes (631,428 SNPs) of each bull were used to impute whole-genome sequence genotypes (12,590,056 SNPs) using the Beagle software. Imputation was done using a multi-breed reference panel of 429 sequenced individuals. Genomic estimated breeding values for three traits were predicted using a Bayesian stochastic search variable selection (BSSVS) model and a genome-enabled best linear unbiased prediction model (GBLUP). Reliabilities of predictions were based on 2087 validation bulls, while the other 3416 bulls were used for training.
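
GBLUP, as named in the Methods, rests on a genomic relationship matrix computed from the SNP genotypes. The abstract does not say how that matrix was built, so the following is only a minimal sketch of one common construction (VanRaden's first method), assuming genotypes coded as 0/1/2 allele counts. The function name and the toy genotypes in the usage note are hypothetical and are not taken from the paper.

```python
def genomic_relationship_matrix(genotypes):
    """Return G = ZZ' / (2 * sum_j p_j * (1 - p_j)) as a list of lists.

    genotypes: one row per animal, one column per SNP, entries are
    allele counts (0, 1 or 2). Assumed coding, not stated in the source.
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])

    # Allele frequency of the counted allele at each SNP.
    freqs = [
        sum(row[j] for row in genotypes) / (2.0 * n_animals)
        for j in range(n_snps)
    ]

    # Scaling factor: expected genotype variance summed over SNPs.
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all SNPs are monomorphic; G is undefined")

    # Centre each genotype by twice the allele frequency.
    centered = [
        [row[j] - 2.0 * freqs[j] for j in range(n_snps)]
        for row in genotypes
    ]

    # G[i][k] is the scaled inner product of the centred genotype rows.
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[k])) / scale
            for k in range(n_animals)
        ]
        for i in range(n_animals)
    ]
```

Calling `genomic_relationship_matrix([[0, 1, 2], [1, 1, 0], [2, 1, 1]])` returns a symmetric 3 x 3 matrix. With 5503 bulls and hundreds of thousands of SNPs, a real analysis would use an optimized linear-algebra library rather than nested Python lists.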

Results: Prediction reliabilities ranged from 0.37 to 0.52. BSSVS performed better than GBLUP in all cases. Reliabilities of genomic predictions were slightly lower with imputed sequence data than with BovineHD chip data. Also, the reliabilities tended to be lower for both sequence data and BovineHD chip data when relationships between training animals were low. No increase in persistency of prediction reliability using imputed sequence data was observed.

Conclusions: Compared to BovineHD genotype data, using imputed sequence data for genomic prediction produced no advantage. To investigate the putative advantage of genomic prediction using (imputed) sequence data, a training set with a larger number of individuals that are distantly related to each other and genomic prediction models that incorporate biological information on the SNPs or that apply stricter SNP pre-selection should be considered.


Fig. 2: Manhattan plot with estimated SNP effects (% of σ_g²) for somatic cell score (SCS) using the BSSVS model. Estimated SNP effects (% of σ_g²) based on the BSSVS model for somatic cell score using BovineHD data (a), ImputedHD data (b), and imputed sequence data (c)

Mentions: For both genomic prediction methods, the reliabilities and their persistency were higher when BovineHD genotype data were used than when imputed sequence data were used. However, the additive genetic variances explained when imputed sequence data or BovineHD data were used were similar (Table 1). In Figs. 2, 3 and 4, the individual SNP effects are plotted (as % of σ_g²) for BSSVS using BovineHD data, ImputedHD data, and imputed sequence data. These Manhattan plots do not show genome-wide association results similar to those typically seen from single-SNP analyses. Instead, they represent the variance explained by a single SNP, conditional on fitting all other SNPs simultaneously. Therefore, SNP effects are much smaller than those obtained when only one SNP is fitted. Still, the figures show that when BovineHD data and ImputedHD data are used for SCS and PY, it is possible to detect some regions on the genome that explain greater levels of variance, e.g. on chromosomes 15 and 22 (SCS) and chromosome 14 (PY). For BovineHD data, 26 SNPs had a SNP variance greater than 0.003 %, with a maximum of 0.05 %; most of these SNPs were located in a 1.8 Mb region at the beginning of chromosome 14. With imputed sequence data, no clear region with large SNP effects on the traits could be detected, but it should be noted that the imputed sequence data contain about 20 times more SNPs. For a fair comparison with BovineHD data, SNPs in the imputed sequence data were grouped in windows of 20 neighboring SNPs and the sum of the variances of the neighboring SNPs per window was plotted. Even so, we did not detect any clear regions with an increased level of explained variance (results not shown).
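
The windowing step described above can be sketched as follows. The window size of 20 roughly matches the ratio of SNP counts given in the abstract (12,590,056 / 631,428 ≈ 19.9). This is a minimal sketch that assumes non-overlapping windows over SNPs already ordered by position within one chromosome. The source does not say whether windows overlapped or how chromosome ends were handled, and the function name is hypothetical.

```python
def window_variance_sums(snp_variances, window_size=20):
    """Sum per-SNP explained variances over consecutive windows.

    snp_variances: per-SNP variances (e.g. % of genetic variance) for one
    chromosome, ordered by position. The last window may hold fewer than
    window_size SNPs. Non-overlapping windows are an assumption here.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [
        sum(snp_variances[start:start + window_size])
        for start in range(0, len(snp_variances), window_size)
    ]
```

Applying this per chromosome keeps a window from spanning two chromosomes. Plotting the returned sums against window position gives a Manhattan-style plot with a SNP density per point comparable to the BovineHD panel.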

