Methods of tagSNP selection and other variables affecting imputation accuracy in swine.

Badke YM, Bates RO, Ernst CW, Schwab C, Fix J, Van Tassell CP, Steibel JP - BMC Genet. (2013)

Bottom Line: We compared results of imputation accuracy based on several sets of low density tagSNP of varying densities and selected using three different methods. In addition, we assessed the effect of varying size and composition of the reference panel of haplotypes used for imputation. Average imputation accuracy can be further increased by adding haplotypes to the reference panel.

Affiliation: Department of Animal Science, Michigan State University, East Lansing, MI, USA.

ABSTRACT

Background: Genotype imputation is a cost-efficient alternative to the use of high density genotypes for implementing genomic selection. The objective of this study was to investigate variables affecting imputation accuracy from low density tagSNP sets in swine (average distance between tagSNP from 100kb to 1Mb), selected using LD information, physical location, or accuracy for genotype imputation. We compared results of imputation accuracy based on several sets of low density tagSNP of varying densities and selected using three different methods. In addition, we assessed the effect of varying size and composition of the reference panel of haplotypes used for imputation.
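One of the three selection criteria named above is physical location. A minimal sketch of what location-based selection could look like is given below, assuming SNP positions on one chromosome are available as a sorted list of base-pair coordinates. The function name `select_evenly_spaced` and the nearest-SNP-to-grid rule are illustrative assumptions, not the procedure used in the study.

```python
from bisect import bisect_left


def select_evenly_spaced(positions_bp, spacing_bp):
    """Pick the SNP nearest to each point of a regular grid.

    positions_bp: sorted base-pair positions of candidate SNP (one chromosome).
    spacing_bp:   desired average distance between tagSNP.
    Returns the indices of the selected SNP, in increasing order.
    Hypothetical helper for illustration only.
    """
    if not positions_bp:
        return []
    chosen = []
    target = positions_bp[0]
    last = positions_bp[-1]
    while target <= last:
        i = bisect_left(positions_bp, target)
        if i == len(positions_bp):
            i -= 1
        elif i > 0 and target - positions_bp[i - 1] <= positions_bp[i] - target:
            # The SNP to the left is at least as close as the one to the right.
            i -= 1
        if not chosen or chosen[-1] != i:
            # Avoid selecting the same SNP for two neighbouring grid points.
            chosen.append(i)
        target += spacing_bp
    return chosen


# Toy positions (arbitrary units) with a target spacing of 100.
toy_positions = [0, 40, 90, 110, 205, 290, 300]
toy_selection = select_evenly_spaced(toy_positions, 100)
```

Selection based on LD information or on imputation accuracy would need genotype data in addition to positions and is not covered by this sketch.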

Results: TagSNP density of at least 1 tagSNP per 340kb (~7000 tagSNP) selected using pairwise LD information was necessary to achieve average imputation accuracy higher than 0.95. A commercial low density (9K) tagSNP set for swine was developed concurrently with this study, and an average imputation accuracy of 0.951 was estimated based on these tagSNP. Construction of a haplotype reference panel was most efficient when these haplotypes were obtained from randomly sampled individuals. Increasing the size of the original reference haplotype panel (128 haplotypes sampled from 32 sire/dam/offspring trios phased in a previous study) led to an overall increase in imputation accuracy (IA = 0.97 with 512 haplotypes), and was especially useful in increasing imputation accuracy of SNP with MAF below 0.1 and of SNP located in the chromosomal extremes (within 5% of chromosome end).
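The density figures quoted above can be related to each other with simple arithmetic. The sketch below assumes that tagSNP are spread over a single total length; that length (about 2.38 Gb) is only implied by "1 tagSNP per 340kb (~7000 tagSNP)" and is not stated in the text. The function names are hypothetical.

```python
def implied_genome_length_bp(n_tagsnp, spacing_bp):
    """Total length covered if n_tagsnp markers sit spacing_bp apart on average."""
    return n_tagsnp * spacing_bp


def tagsnp_count(genome_length_bp, spacing_bp):
    """Approximate number of tagSNP needed for a given average spacing."""
    return round(genome_length_bp / spacing_bp)


def average_spacing_kb(genome_length_bp, n_tagsnp):
    """Average distance between tagSNP, in kb."""
    return genome_length_bp / n_tagsnp / 1000


# Length implied by ~7000 tagSNP at one per 340 kb (an inference, not a reported value).
length_bp = implied_genome_length_bp(7000, 340_000)

# TagSNP counts at the two ends of the studied density range (100 kb and 1 Mb).
count_at_100kb = tagsnp_count(length_bp, 100_000)
count_at_1mb = tagsnp_count(length_bp, 1_000_000)

# Approximate average spacing of a 9000-marker set over the same implied length.
spacing_9k_kb = average_spacing_kb(length_bp, 9000)
```

Under this assumption the studied range of 100kb to 1Mb corresponds to roughly 23,800 down to 2,380 tagSNP, and a 9000-marker set has an average spacing of about 264 kb, denser than the 340kb threshold reported above.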

Conclusion: The new commercially available 9K tagSNP set can be used to obtain imputed genotypes with high accuracy, even when imputation is based on a comparatively small panel of reference haplotypes (128 haplotypes). Average imputation accuracy can be further increased by adding haplotypes to the reference panel. In addition, our results show that randomly sampling individuals to genotype for the construction of a reference haplotype panel is more cost-efficient than specifically sampling older animals or trios, with no observed loss in imputation accuracy. We expect that the use of imputed genotypes in swine breeding will yield highly accurate predictions of GEBV, based on the observed accuracy and reported results in dairy cattle, where genomic evaluation of some individuals is based on genotypes imputed with the same accuracy as our Yorkshire population.


Figure 5: Effect of number of reference haplotypes on imputation accuracy. Average imputation accuracy (IA) as a function of the number of haplotypes in the reference panel used for imputation. Imputation accuracy was estimated for reference panels composed of haplotypes from a trio design (blue triangle), reference panels composed of haplotypes from randomly sampled sires (red circle), and reference panels composed of both haplotypes from a trio design and haplotypes from randomly sampled sires (black circle).

Mentions: For all previous analyses in this paper we imputed genotypes of 889 individuals across all autosomes using a reference panel of 128 Yorkshire haplotypes obtained from a sire/dam/offspring genotyping design [23], phased with higher accuracy [22]. Reducing the number of imputation animals from 889 to 200 had no impact on the observed imputation accuracy. Imputation accuracy using all 128 haplotypes from the original reference panel was 0.959 on SSC14, which reduced to 0.939 when 64 haplotypes were used, and further to 0.904 when imputation was based on 32 haplotypes (Figure 5). Therefore, imputation accuracy larger than 0.90 can be obtained using the commercial 9K tagSNP set with a reference panel of only 32 haplotypes, given that these haplotypes were phased at high accuracy.
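The SSC14 values quoted above can be tabulated to show how much accuracy is lost each time the reference panel is halved. The sketch below uses only the three accuracies given in the text; the variable and function names are hypothetical, and no values are interpolated or extrapolated.

```python
# Imputation accuracy on SSC14 by number of reference haplotypes, as quoted in the text.
ia_by_haplotypes = {128: 0.959, 64: 0.939, 32: 0.904}


def drops_per_halving(ia_table):
    """Return (larger panel, smaller panel, loss in accuracy) for consecutive panel sizes."""
    sizes = sorted(ia_table, reverse=True)
    return [
        (big, small, round(ia_table[big] - ia_table[small], 3))
        for big, small in zip(sizes, sizes[1:])
    ]


halving_drops = drops_per_halving(ia_by_haplotypes)

for big, small, loss in halving_drops:
    print(f"{big} -> {small} haplotypes: accuracy falls by {loss:.3f}")
```

On these three points the loss from 64 to 32 haplotypes (0.035) is larger than the loss from 128 to 64 (0.020), which is in line with the statement that even 32 accurately phased haplotypes still give accuracy above 0.90.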

