Assessment of the genomic variation in a cattle population by re-sequencing of key animals at low to medium coverage.

Jansen S, Aigner B, Pausch H, Wysocki M, Eck S, Benet-Pagès A, Graf E, Wieland T, Strom TM, Meitinger T, Fries R - BMC Genomics (2013)

Bottom Line: Despite a strong decrease in the cost of next-generation sequencing in the last few years, re-sequencing of large numbers of individuals is not yet affordable. Imputation strongly improves the genotype quality of low-coverage samples and thus enables maximum-density genotyping by sequencing. The functional annotation of variants provides the basis for exhaustive genotype imputation in the population, e.g., for highest-resolution genome-wide association studies.


Affiliation: Technische Universität München, Liesel-Beckmann-Strasse 1, Freising, 85354, Germany.

ABSTRACT

Background: Genome- and population-wide re-sequencing would allow for the most efficient detection of causal trait variants. However, despite a strong decrease in the cost of next-generation sequencing in the last few years, re-sequencing of large numbers of individuals is not yet affordable. We therefore resorted to re-sequencing a limited number of bovine animals selected to explain a major proportion of the population's genomic variation, so-called key animals, in order to provide a catalogue of functional variants and a substrate for population- and genome-wide imputation of variable sites.

Results: Forty-three animals accounting for about 69 percent of the genetic diversity of the Fleckvieh population, a cattle breed of Southern Germany and Austria, were sequenced with coverages ranging from 4.17x to 24.98x and averaging 7.46x. After alignment to the reference genome (UMD3.1) and multi-sample variant calling, more than 17 million variant positions were identified, about 90 percent of them biallelic single nucleotide variants (SNVs) and 10 percent short insertions and deletions (InDels). The comparison with high-density chip data revealed a sensitivity of at least 92 percent and a specificity of 81 percent for sequencing-based genotyping, and 97 percent and 93 percent when an imputation step was included. There are 91,733 variants in coding regions of 18,444 genes, 46 percent being non-synonymous exchanges, of which 575 variants are predicted to cause premature stop codons. Three variants are listed in the OMIA database as causal for specific phenotypes.

Conclusions: Low- to medium-coverage re-sequencing of individuals explaining a major fraction of a population's genomic variation allows for the efficient and reliable detection of most variants. Imputation strongly improves the genotype quality of low-coverage samples and thus enables maximum-density genotyping by sequencing. The functional annotation of variants provides the basis for exhaustive genotype imputation in the population, e.g., for highest-resolution genome-wide association studies.


Figure 3: Comparison of re-sequencing-derived genotypes with genotypes resulting from array-based genotyping. Non-reference sensitivity (NRS), considering all genotypes (phred-scaled quality score ≥ 0) (A). NRS, phred-scaled quality score ≥ 10 (B). Non-reference discrepancy (NRD), phred-scaled quality score ≥ 0 (C). NRD, phred-scaled quality score ≥ 10 (D). The grey and the blue symbols represent the NRS and NRD before and after Beagle imputation, respectively.

Mentions: We compared array-called genotypes with sequence-derived genotypes at corresponding positions and calculated the non-reference sensitivity (NRS) and the non-reference discrepancy (NRD) rates [8] to assess the sensitivity and specificity, respectively, of the variant calling. Both measures depend on the coverage, on whether Beagle imputation is applied, and on the minimal phred-scaled likelihood requirement (Figure 3). The effects of Beagle imputation and of the minimal phred score requirement are most prominent if the coverage is below 7x. The NRD rate for the animal with the lowest coverage (4.17x) was 9.78 percent before and 5.17 percent after Beagle imputation. The highest NRS values at comparatively low NRD values are achieved after Beagle imputation and after removal of genotypes with phred scores below 10.
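
To make the two measures concrete, the following Python sketch computes NRS and NRD for one animal from paired array and sequence genotypes. It is a minimal illustration, not the authors' pipeline. The definitions used are assumptions following common usage: NRS is the fraction of array non-reference genotypes that sequencing also calls non-reference, and NRD is the fraction of discordant genotypes after excluding positions where both calls are homozygous reference. The genotype coding (0, 1, 2 alternate alleles, `None` for missing), the function names, and the toy data are all hypothetical.

```python
import math
from typing import List, Optional, Sequence

# Genotypes are coded as alternate-allele counts: 0 = homozygous reference,
# 1 = heterozygous, 2 = homozygous alternate. None marks a missing call,
# e.g. one removed by a quality filter. This coding is an assumption.
Genotype = Optional[int]


def phred_to_error_prob(q: float) -> float:
    """Convert a phred-scaled score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def filter_by_phred(genotypes: Sequence[Genotype],
                    quals: Sequence[float],
                    min_q: float) -> List[Genotype]:
    """Set genotypes whose phred-scaled quality is below min_q to missing."""
    return [g if q >= min_q else None for g, q in zip(genotypes, quals)]


def nrs(array_gt: Sequence[Genotype], seq_gt: Sequence[Genotype]) -> float:
    """Fraction of array non-reference genotypes also non-reference in sequencing."""
    nonref = [(a, s) for a, s in zip(array_gt, seq_gt) if a is not None and a > 0]
    if not nonref:
        return math.nan
    found = sum(1 for _, s in nonref if s is not None and s > 0)
    return found / len(nonref)


def nrd(array_gt: Sequence[Genotype], seq_gt: Sequence[Genotype]) -> float:
    """Discordance rate, ignoring positions where both calls are homozygous reference."""
    informative = [
        (a, s) for a, s in zip(array_gt, seq_gt)
        if a is not None and s is not None and not (a == 0 and s == 0)
    ]
    if not informative:
        return math.nan
    discordant = sum(1 for a, s in informative if a != s)
    return discordant / len(informative)


# Toy data for six positions (hypothetical values).
array_calls = [0, 1, 2, 1, 0, 2]
seq_calls = [0, 1, 1, 0, 0, 2]
seq_quals = [30, 25, 8, 12, 40, 35]

nrs_all = nrs(array_calls, seq_calls)
nrd_all = nrd(array_calls, seq_calls)

# Apply the phred >= 10 requirement before comparing again.
seq_filtered = filter_by_phred(seq_calls, seq_quals, min_q=10)
nrs_q10 = nrs(array_calls, seq_filtered)
nrd_q10 = nrd(array_calls, seq_filtered)

print(f"NRS (Q>=0):  {nrs_all:.3f}, NRD (Q>=0):  {nrd_all:.3f}")
print(f"NRS (Q>=10): {nrs_q10:.3f}, NRD (Q>=10): {nrd_q10:.3f}")
```

In this sketch a filtered genotype counts as a missed non-reference site for NRS but is left out of the NRD denominator, which is one possible convention. A phred score of 10 corresponds to an error probability of 0.1, so the threshold removes genotypes with at least a 10 percent estimated chance of being wrong.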

