Extent of genome-wide linkage disequilibrium in Australian Holstein-Friesian cattle based on a high-density SNP panel.

Khatkar MS, Nicholas FW, Collins AR, Zenger KR, Cavanagh JA, Barris W, Schnabel RD, Taylor JF, Raadsma HW - BMC Genomics (2008)

Bottom Line: The extent of linkage disequilibrium (LD) within a population determines the number of markers that will be required for successful association mapping and marker-assisted selection. For association mapping in Holstein-Friesian cattle, for a given design, at least one SNP is required for each 40 kb, giving a total requirement of at least 75,000 SNPs for a low power whole-genome scan (median r2 > 0.19) and up to 300,000 markers at 10 kb intervals for a high power genome scan (median r2 > 0.62). For estimation of LD by D' and Dvol with sufficient precision, a sample size of at least 400 is required, whereas for r2 a minimum sample of 75 is adequate.


Affiliation: Centre for Advanced Technologies in Animal Genetics and Reproduction (ReproGen), University of Sydney, Camden, NSW 2570, Australia. M.Khatkar@usyd.edu.au

ABSTRACT

Background: The extent of linkage disequilibrium (LD) within a population determines the number of markers that will be required for successful association mapping and marker-assisted selection. Most studies on LD in cattle reported to date are based on microsatellite markers or small numbers of single nucleotide polymorphisms (SNPs) covering one or only a few chromosomes. This is the first comprehensive study of the extent of LD in cattle, based on data from 1,546 Holstein-Friesian bulls genotyped for 15,036 SNP markers covering all regions of all autosomes. Furthermore, most studies in cattle have used relatively small sample sizes and, consequently, may have produced biased estimates of the measures commonly used to describe LD. We examine the minimum sample sizes required to estimate LD without bias or loss of accuracy. Finally, relatively little information is available on how LD structure in cattle compares with that in other mammalian species, so we compare LD structure in cattle with public-domain data from human and mouse.

Results: We computed three LD estimates, D', Dvol and r2, for 1,566,890 syntenic SNP pairs and a sample of 365,400 non-syntenic pairs. Mean D' is 0.189 among syntenic SNPs and 0.105 among non-syntenic SNPs; mean r2 is 0.024 among syntenic SNPs and 0.0032 among non-syntenic SNPs. All three measures of LD for syntenic pairs decline with distance; the decline is much steeper for r2 than for D' and Dvol. The values of D' and Dvol are quite similar. Significant LD in cattle extends to 40 kb (when estimated as r2) and 8.2 Mb (when estimated as D'). The mean values for LD at large physical distances are close to those for non-syntenic SNPs. The minor allele frequency threshold affects the distribution and extent of LD. For unbiased and accurate estimates of LD across marker intervals spanning < 1 kb to > 50 Mb, minimum sample sizes of 400 (for D') and 75 (for r2) are required. The bias due to small sample sizes increases with inter-marker interval. LD in cattle is much less extensive than in a mouse population created from crossing inbred lines, and more extensive than in humans.
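
As an illustration of why D' and r2 can give such different pictures of LD, the following is a minimal sketch of the standard textbook definitions of the two measures for a pair of biallelic loci. It assumes haplotype frequencies are already known; the excerpt here does not describe how the authors obtained them, and the function and parameter names (ld_measures, p_ab, p_a, p_b) are illustrative only. Dvol is not covered.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1
           and allele B at locus 2 (assumed known, not estimated here)
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    # Raw disequilibrium: observed minus expected haplotype frequency.
    d = p_ab - p_a * p_b

    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r2 is the squared correlation between alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Equal allele frequencies, complete association: both measures equal 1.
print(ld_measures(p_ab=0.5, p_a=0.5, p_b=0.5))

# Unequal allele frequencies with only three haplotypes present:
# D' stays at 1 while r2 is much lower.
print(ld_measures(p_ab=0.1, p_a=0.5, p_b=0.1))
```

In the second call the rarer allele always occurs on the same background, so D' is at its maximum, yet one locus predicts the other poorly and r2 is small. This is consistent with D' remaining elevated over much longer distances than r2.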

Conclusion: For association mapping in Holstein-Friesian cattle, for a given design, at least one SNP is required for each 40 kb, giving a total requirement of at least 75,000 SNPs for a low power whole-genome scan (median r2 > 0.19) and up to 300,000 markers at 10 kb intervals for a high power genome scan (median r2 > 0.62). For estimation of LD by D' and Dvol with sufficient precision, a sample size of at least 400 is required, whereas for r2 a minimum sample of 75 is adequate.
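
The marker totals in the conclusion follow from dividing a genome length by the marker spacing. The sketch below reproduces that arithmetic. The genome length of 3,000,000 kb is an assumption: it is not stated in this excerpt, only implied by the quoted figures (75,000 SNPs x 40 kb). The function name is illustrative.

```python
# Assumption: genome length implied by the quoted totals
# (75,000 SNPs x 40 kb = 3,000,000 kb); not stated in the excerpt.
GENOME_LENGTH_KB = 3_000_000


def markers_required(genome_length_kb, spacing_kb):
    """Number of evenly spaced markers needed for one marker per spacing_kb."""
    # Ceiling division so that a partial final interval still gets a marker.
    return -(-genome_length_kb // spacing_kb)


low_power = markers_required(GENOME_LENGTH_KB, 40)   # one SNP per 40 kb
high_power = markers_required(GENOME_LENGTH_KB, 10)  # one SNP per 10 kb
print(low_power, high_power)
```

Under that assumed genome length, the two calls give the 75,000 and 300,000 markers quoted for the low power and high power scans.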



Figure 1: Distribution of LD between SNP pairs in relation to the physical distance between loci (Mb), pooled over all autosomes. The red line shows average LD in each 500 kb sliding window. Grey dots are individual LD estimates plotted against inter-marker distances. Figure 1A shows D' and Figure 1B shows r2. The blue line in Figure 1A shows the theoretical distribution from the Malécot model.

Mentions: Two of the most commonly used measures of LD, D' and r2, were estimated for each pair-wise combination of SNPs on each chromosome (syntenic SNPs): a total of 1,566,890 pairs were analyzed for all autosomes. The mean values of D' and r2 for individual autosomes are presented in Table 1. The mean D' and r2, pooled over autosomes (1–29) in different categories of map distances, are summarized in Table 2. Similar tables for individual chromosomes are provided in Additional file 3. The distribution of D' and r2 with respect to the physical distance separating loci is presented in Figures 1A and 1B, respectively. As expected, there is a gradual decline in D' with increasing physical distance between SNPs: for SNPs up to 1 kb apart, the mean D' is 0.99; for SNPs separated by 200–500 kb the mean D' is 0.46, and for SNPs separated by more than 50 Mb, the mean D' is 0.11. The distribution of expected D' obtained from fitting the Malécot model [34,15] is also shown in Figure 1A. From this distribution, the estimated swept radius (the distance over which LD declines to ~37% of its initial value) is 8.2 Mb.
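
The link between the fitted Malécot curve and the swept radius can be sketched as follows. This uses the form in which the Malécot model is commonly written, rho(d) = (1 - L) * M * exp(-epsilon * d) + L, with the swept radius defined as 1/epsilon. Only the swept radius of 8.2 Mb comes from the text above; the values of M and L used here (1 and 0) are hypothetical placeholders, not the authors' fitted estimates.

```python
import math

# From the text: swept radius of 8.2 Mb, so epsilon = 1 / 8.2 per Mb.
SWEPT_RADIUS_MB = 8.2
EPSILON = 1.0 / SWEPT_RADIUS_MB


def malecot_rho(d_mb, epsilon, m=1.0, l=0.0):
    """Expected association at distance d_mb under the Malecot model.

    m : association at zero distance (hypothetical default of 1)
    l : residual association at large distance (hypothetical default of 0)
    """
    return (1.0 - l) * m * math.exp(-epsilon * d_mb) + l


def swept_radius(epsilon):
    """Distance at which the decaying component falls to 1/e (about 37%)."""
    return 1.0 / epsilon


# With m = 1 and l = 0, the curve at one swept radius equals exp(-1).
print(swept_radius(EPSILON))
print(malecot_rho(SWEPT_RADIUS_MB, EPSILON))
```

With a nonzero L, only the decaying part of the curve drops to about 37% at the swept radius; the total expected association stays above that level because it approaches L rather than zero.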

