Correcting the site frequency spectrum for divergence-based ascertainment.

Kern AD - PLoS ONE (2009)



Affiliation: Department of Biological Sciences, Dartmouth College, Hanover, NH, USA. andrew.d.kern@dartmouth.edu

ABSTRACT
Comparative genomics based on sequenced reference genomes is essential to hypothesis generation and testing within population genetics. However, selecting candidate regions for further study on the basis of elevated or depressed divergence between species introduces a divergence-based ascertainment bias into the site frequency spectrum of the selected loci. Here, a method to correct this problem is developed that obtains maximum-likelihood estimates of the unascertained allele frequency distribution using numerical optimization. I show how divergence-based ascertainment may mimic the effects of natural selection and offer correction formulae for properly estimating the strength of selection in candidate regions in a maximum-likelihood setting.


Figure 4: ML estimates of the strength of selection with and without divergence ascertainment correction for lower 1% data. The strength of selection, α, was estimated from those loci identified as belonging to the lower 1% divergence group from the 10^6 simulated regions (see Figure 1 caption for details). The left box shows MLEs from an estimation routine that does not account for divergence-based ascertainment (uncorrected); the right box shows MLEs from the same data, estimated in a way that accounts for ascertainment (corrected). Note that the uncorrected estimates show spurious evidence for negative selection even though the data were generated from a neutral model.

Mentions: Figure 4 shows the effect of divergence-based ascertainment on ML estimates from simulated loci selected for depressed levels of divergence. The mean of the uncorrected estimates (i.e., Equation 2) is α̂ = −1.34, which suggests weak negative selection even though these data were generated under a standard neutral model. Estimating α with the divergence-based ascertainment-corrected likelihood function (Equation 3) restores the mean to approximately zero (mean α̂ = 0.001). This ascertainment-corrected likelihood function is also useful in a Bayesian setting; for an example, see Katzman et al. [14], where it was used in a Bayesian hierarchical model for estimating distributions of selection coefficients from divergence-ascertained data.
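
To make the idea of an uncorrected ML estimate of selection strength concrete, here is a minimal sketch in Python. It is not the paper's Equation 2 or Equation 3, which are not reproduced on this page, and it does not implement the ascertainment correction. It assumes a standard Poisson random field style model (Sawyer and Hartl), treats alpha as a scaled selection coefficient, and fits it to a site frequency spectrum by a multinomial likelihood. The function names, the midpoint integration grid, and the golden-section search bounds are all illustrative assumptions.

```python
import math


def expected_sfs(alpha, n, grid=2000):
    """Unnormalized expected SFS (derived counts 1..n-1) for sample size n.

    Assumed model: stationary density proportional to
    (1 - exp(-2*alpha*(1-x))) / ((1 - exp(-2*alpha)) * x * (1-x)),
    integrated against binomial sampling with a midpoint rule.
    """
    binom = [math.comb(n, i) for i in range(1, n)]
    totals = [0.0] * (n - 1)
    for k in range(grid):
        x = (k + 0.5) / grid
        if abs(alpha) < 1e-8:
            w = 1.0 - x  # neutral limit of the selection weight
        else:
            w = math.expm1(-2.0 * alpha * (1.0 - x)) / math.expm1(-2.0 * alpha)
        for i in range(1, n):
            # The 1/(x(1-x)) factor is folded into the exponents.
            totals[i - 1] += w * binom[i - 1] * x ** (i - 1) * (1.0 - x) ** (n - i - 1)
    return [t / grid for t in totals]


def log_likelihood(alpha, counts):
    """Multinomial log-likelihood of observed SFS counts (constant term dropped)."""
    n = len(counts) + 1
    expected = expected_sfs(alpha, n)
    total = sum(expected)
    return sum(c * math.log(e / total) for c, e in zip(counts, expected))


def fit_alpha(counts, lo=-20.0, hi=20.0, tol=1e-4):
    """Golden-section search for the maximizing alpha (assumes a single peak)."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    fc, fd = log_likelihood(c, counts), log_likelihood(d, counts)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = log_likelihood(c, counts)
        else:
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = log_likelihood(d, counts)
    return (a + b) / 2.0


# Hypothetical data: counts proportional to 1/i, the neutral expectation.
neutral_counts = [2520 // i for i in range(1, 10)]
# Hypothetical data with an excess of rare alleles.
skewed_counts = [100, 20, 10, 5, 3, 2, 1, 1, 1]
alpha_neutral = fit_alpha(neutral_counts)
alpha_skewed = fit_alpha(skewed_counts)
```

Under these assumptions, a spectrum matching the neutral 1/i expectation gives an estimate near zero, while an excess of rare alleles pulls the estimate negative. That skew is the kind of signal that, according to the passage above, divergence-based ascertainment can produce in neutral data when the likelihood is left uncorrected.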

