Estimating demographic parameters from large-scale population genomic data using Approximate Bayesian Computation.

Li S, Jakobsson M - BMC Genet. (2012)

Bottom Line: We compared the ability of different summary statistics to infer demographic parameters, including haplotype and LD based statistics, and found that the accuracy of the parameter estimates can be improved by combining summary statistics that capture different parts of information in the data. Furthermore, our results suggest that poor choices of prior distributions can in some circumstances be detected using ABC. We conclude that the ABC approach can accommodate realistic genome-wide population genetic data, which may be difficult to analyze with full likelihood approaches, and that the ABC can provide accurate and precise inference of demographic parameters from these data, suggesting that the ABC approach will be a useful tool for analyzing large genome-wide datasets.


Affiliation: Department of Evolutionary Biology, EBC, Uppsala University, Norbyvägen 18D, Uppsala SE-75236, Sweden.

ABSTRACT

Background: The Approximate Bayesian Computation (ABC) approach has been used to infer demographic parameters for numerous species, including humans. However, most applications of ABC still use limited amounts of data, from a small number of loci, compared to the large amount of genome-wide population-genetic data which have become available in the last few years.

Results: We evaluated the performance of the ABC approach for three 'population divergence' models - similar to the 'isolation with migration' model - when the data consist of several hundred thousand SNPs typed for multiple individuals, by simulating data from known demographic models. The ABC approach was used to infer demographic parameters of interest, and we compared the inferred values to the true parameter values that were used to generate hypothetical "observed" data. For all three case models, the ABC approach inferred most demographic parameters quite well with narrow credible intervals, for example, population divergence times and past population sizes, but some parameters were more difficult to infer, such as population sizes at present and migration rates. We compared the ability of different summary statistics to infer demographic parameters, including haplotype- and LD-based statistics, and found that the accuracy of the parameter estimates can be improved by combining summary statistics that capture different parts of information in the data. Furthermore, our results suggest that poor choices of prior distributions can in some circumstances be detected using ABC. Finally, increasing the amount of data beyond some hundred loci will substantially improve the accuracy of many parameter estimates using ABC.

Conclusions: We conclude that the ABC approach can accommodate realistic genome-wide population genetic data, which may be difficult to analyze with full likelihood approaches, and that the ABC can provide accurate and precise inference of demographic parameters from these data, suggesting that the ABC approach will be a useful tool for analyzing large genome-wide datasets.
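
The simulate-and-compare idea described in the Results can be illustrated with a minimal rejection-sampling sketch. Everything in it is an assumption made for illustration: the exponential toy simulator stands in for a coalescent simulator, the sample mean stands in for the SNP, haplotype and LD summary statistics, and the flat prior, tolerance and names (simulate, summary, abc_rejection) are hypothetical. It shows plain rejection ABC, which is not necessarily the variant used by the authors.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy stand-in for a genetic simulator: n exponential draws with mean theta."""
    return [rng.expovariate(1.0 / theta) for _ in range(n)]


def summary(data):
    """A single summary statistic (here simply the sample mean)."""
    return statistics.fmean(data)


def abc_rejection(observed, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary is within `tolerance` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.1, 10.0)  # draw from a flat prior (assumed)
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(1)
true_theta = 3.0  # "true" value used to generate the hypothetical observed data
observed = simulate(true_theta, 100, rng)
posterior = abc_rejection(observed, n_draws=5000, tolerance=0.2, rng=rng)
if posterior:
    print(f"accepted {len(posterior)} draws, "
          f"posterior mean {statistics.fmean(posterior):.2f}")
```

The accepted draws approximate the posterior; comparing their spread with the known true value mirrors, on a toy scale, how inferred and true parameter values were compared in the study.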


Figure 1: Population models. A) Model 1: a simple divergence model with two sub-populations of constant size (N1 and N2). Migration occurs after the divergence event (at time T) at rates m12 and m21. B) Model 2: a divergence model with exponential growth. After the divergence time T, the two sub-populations (of sizes N1' and N2') grow at exponential rates α1 and α2, and the population sizes at present are N1 and N2. There is no migration between the sub-populations. C) Model 3: a composite of model 1 and model 2. It is the same as model 2, but with migration occurring at rates m12 and m21 after the divergence time.
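
The relation between the size at divergence (N1') and the size at present (N1) in model 2 can be sketched as below. The caption does not give the growth equation, so the form N(t) = N_present * exp(-alpha * t), with t measured backwards from the present and alpha on the same time scale as t, is an assumption, as are the function names and the numbers in the example.

```python
import math


def size_at_time(n_present, alpha, t):
    """Population size at time t before present under exponential growth.

    Assumed form: N(t) = N_present * exp(-alpha * t), where alpha and t share
    one time scale. This form is an assumption, not taken from the caption.
    """
    return n_present * math.exp(-alpha * t)


def size_at_divergence(n_present, alpha, divergence_time):
    """Size just after the split (N1' or N2' in the caption), under the same assumption."""
    return size_at_time(n_present, alpha, divergence_time)


# Hypothetical values: present size 10,000, growth rate 2.0, divergence time 0.5
print(size_at_divergence(10_000, 2.0, 0.5))
```

With alpha = 0 the function returns the present size unchanged, which corresponds to the constant-size case of model 1.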

Mentions: We investigate three different population divergence models. These models were chosen to be similar to commonly studied population models, such as the 'isolation with migration' model, and to represent increasing complexity. In the first model (Figure 1A), an ancestral population of size NA split into two sub-populations (population 1 and population 2) at time T before present (scaled by 4Ne generations, where Ne = 10,000). The sub-populations had constant sizes of N1 = Ne and N2 = 0.5Ne, respectively (where NA = N1 + N2). Migration between the two sub-populations occurred at rate m12 (from population 1 to population 2) and m21 (from population 2 to population 1), where the migration rate m = 4NeM and M is the fraction of migrants per generation. In this model, we treated the parameters T, m12 and m21 as unknown and we attempted to infer their values based on simulated genetic data. The sub-population sizes N1 and N2 were assumed to be known for this case.
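
The scalings in this paragraph can be made concrete with a short sketch that converts scaled values back to generations and per-generation migrant fractions. The relations (time in units of 4Ne generations, m = 4NeM, Ne = 10,000, N1 = Ne, N2 = 0.5Ne, NA = N1 + N2) come from the text; the function names and the example inputs T = 0.1 and m = 4.0 are hypothetical.

```python
NE = 10_000  # reference effective population size given in the text


def time_in_generations(t_scaled, ne=NE):
    """Convert a time scaled by 4*Ne generations into generations."""
    return t_scaled * 4 * ne


def migrant_fraction(m_scaled, ne=NE):
    """Invert m = 4*Ne*M to get M, the fraction of migrants per generation."""
    return m_scaled / (4 * ne)


# Fixed sub-population sizes of model 1, as stated in the text
N1 = NE
N2 = 0.5 * NE
NA = N1 + N2  # ancestral population size

# Hypothetical example values for the unknown parameters
print(time_in_generations(0.1))  # divergence time in generations
print(migrant_fraction(4.0))     # per-generation migrant fraction
print(NA)
```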

