Estimating demographic parameters from large-scale population genomic data using Approximate Bayesian Computation.

Li S, Jakobsson M - BMC Genet. (2012)

Bottom Line: We compared the ability of different summary statistics to infer demographic parameters, including haplotype and LD based statistics, and found that the accuracy of the parameter estimates can be improved by combining summary statistics that capture different parts of information in the data. Furthermore, our results suggest that poor choices of prior distributions can in some circumstances be detected using ABC. We conclude that the ABC approach can accommodate realistic genome-wide population genetic data, which may be difficult to analyze with full likelihood approaches, and that the ABC can provide accurate and precise inference of demographic parameters from these data, suggesting that the ABC approach will be a useful tool for analyzing large genome-wide datasets.


Affiliation: Department of Evolutionary Biology, EBC, Uppsala University, Norbyvägen 18D, Uppsala SE-75236, Sweden.

ABSTRACT

Background: The Approximate Bayesian Computation (ABC) approach has been used to infer demographic parameters for numerous species, including humans. However, most applications of ABC still use limited amounts of data, from a small number of loci, compared to the large amount of genome-wide population-genetic data which have become available in the last few years.

Results: We evaluated the performance of the ABC approach for three 'population divergence' models - similar to the 'isolation with migration' model - when the data consist of several hundred thousand SNPs typed for multiple individuals, by simulating data from known demographic models. The ABC approach was used to infer demographic parameters of interest and we compared the inferred values to the true parameter values that were used to generate hypothetical "observed" data. For all three case models, the ABC approach inferred most demographic parameters quite well with narrow credible intervals, for example, population divergence times and past population sizes, but some parameters were more difficult to infer, such as population sizes at present and migration rates. We compared the ability of different summary statistics to infer demographic parameters, including haplotype and LD based statistics, and found that the accuracy of the parameter estimates can be improved by combining summary statistics that capture different parts of information in the data. Furthermore, our results suggest that poor choices of prior distributions can in some circumstances be detected using ABC. Finally, increasing the amount of data beyond some hundred loci will substantially improve the accuracy of many parameter estimates using ABC.

Conclusions: We conclude that the ABC approach can accommodate realistic genome-wide population genetic data, which may be difficult to analyze with full likelihood approaches, and that the ABC can provide accurate and precise inference of demographic parameters from these data, suggesting that the ABC approach will be a useful tool for analyzing large genome-wide datasets.



Figure 5: Performance of the ABC with local linear regression, for estimating past and present population sizes, migration rate, and divergence time. The red stars indicate the means of the posterior sample and blue lines give the 95% credible intervals. The dashed black line shows the linear fit to the means and the gray solid line shows the ideal case, when the estimate equals the true value.

Mentions: We generated 195 "observed" datasets for model 3 where the true past population size N1' ranged from 200 to 9,800, the true current population size N1 ranged from 11,000 to 198,000, the true migration rate m21 ranged from 0.1 to 4.9, and the true divergence time T ranged from 0.01 to 0.49 (the other parameters were set to the same values as above, see Table 4). These datasets were generated to investigate to what extent the observations from single "observed" datasets generalize to a wide range of true parameter values and to multiple instances of estimating parameters using ABC. In most cases, the ABC with local linear regression adjustment estimated all four parameters satisfactorily (Figure 5), including the difficult current population size and migration rate, although the credible intervals were wide. For the past population size and the divergence time, there were a few exceptional cases where the 95% credible intervals extended over almost the entire range of the prior (e.g. T = 0.26 and N1' = 5,800). In these cases, the set of summary statistics for the accepted parameter values included one or more extreme outliers, which in turn caused the local linear regression to produce a wide range of adjusted parameter values, since ordinary least-squares estimation of the regression model is not robust to outliers [38].
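
The rejection step and the local linear regression adjustment referred to above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `abc_loclinear`, the acceptance fraction, the Epanechnikov kernel weights, and the toy model in the usage note (a uniform prior with a single Gaussian summary statistic in place of coalescent simulations) are all assumptions made for illustration.

```python
import numpy as np


def abc_loclinear(theta, stats, s_obs, accept_frac=0.05):
    """ABC rejection followed by a local linear regression adjustment.

    theta : (n,) parameter values drawn from the prior
    stats : (n,) or (n, k) summary statistics simulated from each draw
    s_obs : scalar or (k,) summary statistics of the "observed" dataset
    Returns the regression-adjusted sample of accepted parameter values.
    """
    theta = np.asarray(theta, dtype=float)
    stats = np.asarray(stats, dtype=float).reshape(len(theta), -1)
    s_obs = np.atleast_1d(np.asarray(s_obs, dtype=float))

    # Standardise each statistic so that no single one dominates the distance.
    scale = stats.std(axis=0)
    scale[scale == 0] = 1.0
    diff = (stats - s_obs) / scale
    dist = np.sqrt((diff ** 2).sum(axis=1))

    # Rejection step: keep the simulations closest to the observed summaries.
    delta = np.quantile(dist, accept_frac)
    keep = dist <= delta

    # Epanechnikov weights: closer simulations count more in the regression.
    w = 1.0 - (dist[keep] / delta) ** 2

    # Weighted least-squares fit of theta on (s - s_obs) with an intercept.
    X = np.column_stack([np.ones(keep.sum()), diff[keep]])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], theta[keep] * sw, rcond=None)

    # Adjustment: project each accepted theta to the observed summaries.
    return theta[keep] - diff[keep] @ coef[1:]
```

As a usage example under the toy model, draw `theta = rng.uniform(0, 10, 20000)` and `stats = rng.normal(theta, 0.15)` with `rng = np.random.default_rng(1)`, then call `post = abc_loclinear(theta, stats, 4.0)`; the posterior mean is `post.mean()` and a 95% credible interval is `np.quantile(post, [0.025, 0.975])`. Because the fit is a weighted least-squares regression, a single accepted simulation with an extreme summary statistic can tilt the fitted slope and widen the adjusted sample, which is the failure mode described above.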

