Low-depth genotyping-by-sequencing (GBS) in a bovine population: strategies to maximize the selection of high quality genotypes and the accuracy of imputation


ABSTRACT

Background: Genotyping-by-sequencing (GBS) has emerged as a powerful and cost-effective approach for discovering and genotyping single-nucleotide polymorphisms. The GBS technique has been used largely in crop species, where its low sequence coverage is not a drawback for calling genotypes because inbred lines are almost completely homozygous. In contrast, only a few studies have used the GBS technique in animal populations (with sizeable heterozygosity rates), and many of those that have been published did not consider the quality of the genotypes produced by the bioinformatic pipelines. To improve the sequence coverage of the fragments, an alternative GBS preparation protocol that includes selective primers during the PCR amplification step has recently been proposed. In this study, we compared this modified protocol with the conventional two-enzyme GBS protocol. We also described various procedures to maximize the selection of high-quality genotypes and to increase the accuracy of imputation.

Results: The in silico digestions of the bovine genome showed that the combination of PstI and MspI is more suitable for sequencing bovine GBS libraries than the use of single digestions with PstI or ApeKI. The sequencing output of the GBS libraries generated a total of 123,666 variants with the selective-primer approach and 272,103 variants with the conventional approach. Validating our data with genotypes obtained from mass spectrometry and Illumina’s bovine SNP50 array, we found that the genotypes produced by the conventional GBS method were concordant with those produced by these alternative genotyping methods, whereas the selective-primer method failed to call heterozygotes with confidence. Our results indicate that high accuracy in genotype calling (>97%) can be obtained using low read-depth thresholds (3 to 5 reads) provided that markers are simultaneously filtered for genotype quality scores. We also show that factors such as the minimum call rate and the minor allele frequency positively influence the accuracy of imputation of missing GBS data. The highest accuracies (around 85%) of imputed GBS markers were obtained with the FIMPUTE program when GBS and SNP50 array genotypes were combined (80,190 to 100,297 markers) before imputation.
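The simultaneous read-depth and genotype-quality filtering described above, and the concordance check against an independent platform, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the `Call` record, the function names, and the genotype-quality cutoff of 20 are assumptions made for the example (the abstract reports only the read-depth range of 3 to 5 reads).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Call:
    """One genotype call (hypothetical record, not a real library type)."""
    genotype: str  # e.g. "0/0", "0/1", "1/1"
    depth: int     # number of reads covering the site
    gq: int        # genotype quality score


def filter_call(call: Call, min_depth: int = 3, min_gq: int = 20) -> Optional[str]:
    """Keep a genotype only if it passes BOTH thresholds; otherwise set it to missing.

    min_gq=20 is an assumed value chosen for illustration.
    """
    if call.depth >= min_depth and call.gq >= min_gq:
        return call.genotype
    return None


def concordance(gbs: List[Optional[str]], reference: List[Optional[str]]) -> float:
    """Fraction of sites, non-missing on both platforms, where the genotypes agree."""
    pairs = [(g, r) for g, r in zip(gbs, reference) if g is not None and r is not None]
    if not pairs:
        return float("nan")
    return sum(g == r for g, r in pairs) / len(pairs)


calls = [Call("0/1", 5, 40), Call("0/0", 2, 50), Call("1/1", 4, 10), Call("1/1", 6, 30)]
filtered = [filter_call(c) for c in calls]
array_genotypes = ["0/1", "0/0", "1/1", "0/1"]
print(filtered, concordance(filtered, array_genotypes))
```

Calls that fail either threshold become missing data, which is what the imputation step is later asked to fill in.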

Conclusions: We discovered that the conventional two-enzyme GBS protocol could produce a large number of high-quality genotypes provided that appropriate filtering criteria were used. In contrast, the selective-primer approach resulted in a substantial proportion of miscalled genotypes and should be avoided for livestock genotyping studies. Overall, our study demonstrates that carefully adjusting the different filtering parameters applied to the GBS data is critical to maximize the selection of high-quality genotypes and to increase the accuracy of imputation of missing data. The strategies and results presented here provide a framework to maximize the output of the GBS technique in animal populations and qualify the PstI/MspI GBS assay as a low-cost, high-density genotyping platform. The conclusions reported here regarding read-depth and genotype-quality filtering could benefit many GBS applications, notably genome-wide association studies, where there is a need to increase the density of markers genotyped across the target population while preserving the quality of genotypes.



Fig. 1: In silico analysis of restriction enzyme sites in the bovine genome. The percentage was calculated as the number of fragments from a given digestion that fall within each range of fragment lengths, divided by the total number of fragments obtained with that restriction enzyme digestion. The total number of fragments obtained with each restriction enzyme is indicated in the legend box. The number of fragments in the size range between 100 and 500 bp is indicated above the corresponding bar of the small histogram.
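The percentage calculation described in the caption can be sketched as follows. This is a minimal illustration: the function name and the bin edges are assumptions (the text does not list the figure's exact bins), and each range is treated as half-open, so a 500-bp fragment falls in the 500 to 999 bp bin.

```python
from bisect import bisect_right
from typing import List, Sequence


def size_distribution(lengths: Sequence[int],
                      edges: Sequence[int] = (0, 100, 500, 1000)) -> List[float]:
    """Percentage of fragments in each length range.

    Bin i covers edges[i] <= length < edges[i + 1]; the last bin is open-ended.
    The edges are illustrative, not the bins used in the figure.
    """
    counts = [0] * len(edges)
    for n in lengths:
        # Index of the last edge that is <= n
        counts[bisect_right(edges, n) - 1] += 1
    total = len(lengths)
    if total == 0:
        return [0.0] * len(edges)
    return [100.0 * c / total for c in counts]


fragment_lengths = [50, 100, 499, 500, 1500, 2000, 120, 130]
print(size_distribution(fragment_lengths))
```

Because each percentage is normalised by that enzyme's own total, digestions that yield very different numbers of fragments can be compared on the same axis.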

Mentions: Digestion of the 2.6-billion-bp bovine genome with a 6-bp restriction enzyme has the potential to generate more than 634,000 fragments, most of which are greater than 1 kbp, a size that is too large to be efficiently sequenced by Illumina’s HiSeq systems. To reduce the number of fragments produced and to maximise the proportion of fragments with optimal sizes for sequencing (100 to 500 bp), the two-enzyme version of the GBS protocol proposed by Poland et al. [4] uses a combination of a ‘medium frequency cutter’, PstI (CTGCAG), with a ‘frequent cutter’, MspI (CCGG). To make sure that these two enzymes were suitable for the bovine genome, we performed an in silico digestion of the bovine genome (Fig. 1) and compared the predicted fragment-size distribution with those of other enzymes used in recent GBS studies. As shown in Fig. 1, ApeKI and the combination of PstI and MspI produced the largest proportions of fragments in the size range between 100 and 500 bp relative to the other enzymes. Digestion of the bovine genome with PstI/MspI generated a total of 754,306 fragments, fewer than those produced by single-enzyme digestions with ApeKI (5.68 million), PstI (1.58 million), or MspI (1.97 million). We therefore concluded that the combination of PstI and MspI has the potential to significantly reduce the complexity of the bovine genome, because these enzymes target fewer sites than single digestions with PstI, MspI or ApeKI do, a situation that may increase the sequence coverage of the resulting fragments.
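The reasoning in this paragraph can be sketched in a few lines of Python. The first function reproduces the fragment estimate for a 6-bp cutter under the simplifying assumption of a random sequence with equal base frequencies; the second performs a toy PstI/MspI double digest. Both are illustrative only and not the authors' tool: the function names are hypothetical, only one strand is scanned (both recognition sites are palindromic), the sequence is assumed to be upper-case, and the two terminal fragments are ignored.

```python
import re
from typing import List, Tuple

# Recognition site and cut offset within the site: PstI cuts CTGCA^G, MspI cuts C^CGG.
ENZYMES = {"PstI": ("CTGCAG", 5), "MspI": ("CCGG", 1)}


def expected_fragments(genome_size: float, site_length: int) -> float:
    """Expected number of sites for one cutter in a random, equal-base-frequency genome."""
    return genome_size / 4 ** site_length


def cut_positions(seq: str, enzyme: str) -> List[Tuple[int, str]]:
    """Positions at which `enzyme` cuts `seq`, each tagged with the enzyme name."""
    site, offset = ENZYMES[enzyme]
    # A lookahead pattern also reports overlapping occurrences of the site.
    return [(m.start() + offset, enzyme) for m in re.finditer(f"(?={site})", seq)]


def double_digest(seq: str, a: str = "PstI", b: str = "MspI") -> List[Tuple[int, bool]]:
    """Return (length, has_one_end_from_each_enzyme) for every internal fragment."""
    cuts = sorted(cut_positions(seq, a) + cut_positions(seq, b))
    return [(p2 - p1, {e1, e2} == {a, b})
            for (p1, e1), (p2, e2) in zip(cuts, cuts[1:])]


toy = "AAACTGCAGTTTTCCGGAAAACCGGTTCTGCAGAA"
print(round(expected_fragments(2.6e9, 6)))  # about 634,766, in line with "more than 634,000"
print(double_digest(toy))
```

The boolean flag marks fragments with one PstI end and one MspI end; in the two-enzyme protocol of Poland et al. these are the fragments expected to carry both adapter types, so a size-distribution summary would normally be restricted to them.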

