Estimating demographic parameters from large-scale population genomic data using Approximate Bayesian Computation.

Li S, Jakobsson M - BMC Genet. (2012)

Bottom Line: We compared the ability of different summary statistics to infer demographic parameters, including haplotype- and LD-based statistics, and found that the accuracy of the parameter estimates can be improved by combining summary statistics that capture different parts of the information in the data. Furthermore, our results suggest that poor choices of prior distributions can in some circumstances be detected using ABC. We conclude that the ABC approach can accommodate realistic genome-wide population genetic data, which may be difficult to analyze with full likelihood approaches, and that ABC can provide accurate and precise inference of demographic parameters from these data, suggesting that the ABC approach will be a useful tool for analyzing large genome-wide datasets.


Affiliation: Department of Evolutionary Biology, EBC, Uppsala University, Norbyvägen 18D, Uppsala SE-75236, Sweden.

ABSTRACT

Background: The Approximate Bayesian Computation (ABC) approach has been used to infer demographic parameters for numerous species, including humans. However, most applications of ABC still use limited amounts of data, from a small number of loci, compared to the large amount of genome-wide population-genetic data which have become available in the last few years.

Results: We evaluated the performance of the ABC approach for three 'population divergence' models (similar to the 'isolation with migration' model) when the data consist of several hundred thousand SNPs typed for multiple individuals, by simulating data from known demographic models. The ABC approach was used to infer demographic parameters of interest, and we compared the inferred values to the true parameter values that were used to generate hypothetical "observed" data. For all three models, the ABC approach inferred most demographic parameters, for example population divergence times and past population sizes, quite well and with narrow credible intervals, but some parameters, such as present-day population sizes and migration rates, were more difficult to infer. We compared the ability of different summary statistics to infer demographic parameters, including haplotype- and LD-based statistics, and found that the accuracy of the parameter estimates can be improved by combining summary statistics that capture different parts of the information in the data. Furthermore, our results suggest that poor choices of prior distributions can in some circumstances be detected using ABC. Finally, increasing the amount of data beyond a few hundred loci substantially improves the accuracy of many parameter estimates using ABC.
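The idea of combining summary statistics can be illustrated with a minimal rejection-ABC sketch. Everything in it is an assumption for illustration, not the authors' pipeline: `simulate_stats` is a hypothetical toy simulator returning two noisy statistics of a single parameter `theta` (standing in for, e.g., a frequency-based and an LD- or haplotype-based statistic), the prior is uniform, and the statistics are combined in a Euclidean distance after scaling each by its standard deviation across simulations, which is one common choice among several.

```python
import math
import random

def simulate_stats(theta, rng):
    # Hypothetical toy simulator: two noisy summary statistics that both depend
    # on one parameter theta, standing in for statistics that capture
    # different parts of the information in the data.
    s1 = theta + rng.gauss(0.0, 0.10)
    s2 = 2.0 * theta + rng.gauss(0.0, 0.30)
    return (s1, s2)

def abc_rejection(observed, n_sims, tolerance, prior_low, prior_high, seed=1):
    """Rejection ABC: sample theta from a uniform prior, simulate summary
    statistics, and keep the fraction `tolerance` of draws whose statistics
    lie closest to the observed ones."""
    rng = random.Random(seed)
    thetas = [rng.uniform(prior_low, prior_high) for _ in range(n_sims)]
    stats = [simulate_stats(t, rng) for t in thetas]
    # Scale each statistic by its standard deviation across simulations so that
    # statistics measured on different scales contribute comparably.
    sds = []
    for j in range(len(observed)):
        col = [s[j] for s in stats]
        mean = sum(col) / len(col)
        sds.append(math.sqrt(sum((x - mean) ** 2 for x in col) / (len(col) - 1)))

    def distance(s):
        return math.sqrt(sum(((a - b) / sd) ** 2 for a, b, sd in zip(s, observed, sds)))

    # Rank all simulations by distance and accept the closest fraction.
    ranked = sorted(zip(map(distance, stats), thetas))
    n_keep = max(1, round(tolerance * n_sims))
    return [theta for _, theta in ranked[:n_keep]]

if __name__ == "__main__":
    rng = random.Random(42)
    observed = simulate_stats(0.25, rng)  # pseudo-observed data, true theta = 0.25
    accepted = abc_rejection(observed, n_sims=20000, tolerance=0.01,
                             prior_low=0.0, prior_high=0.5)
    print(len(accepted), sum(accepted) / len(accepted))
```

The accepted values form an approximate posterior sample for `theta`; their mean is a point estimate. Rerunning with only one of the two statistics in the distance is a simple way to probe how much each statistic contributes in this toy setting.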

Conclusions: We conclude that the ABC approach can accommodate realistic genome-wide population genetic data, which may be difficult to analyze with full likelihood approaches, and that ABC can provide accurate and precise inference of demographic parameters from these data, suggesting that the ABC approach will be a useful tool for analyzing large genome-wide datasets.

Figure 6: Influence of the tolerance level. Mean difference (across the 49 choices of T) between the true and the estimated divergence time T as a function of the tolerance level (blue stars: ABC with local linear regression; red stars: rejection only). Mean width (across the 49 choices of T) of the 95% credible interval for the estimated T as a function of the tolerance level (blue filled circles: ABC with local linear regression; red filled circles: rejection only). For comparison, fitted lines are shown for the results of ABC with local linear regression.

Mentions: We also investigated a range of tolerance levels to determine their impact on the accuracy of parameter estimation. For each of the 49 "observed" datasets, with the true T ranging from 0.01 to 0.49, we varied the tolerance level from 0.2% to 10%. For each tolerance level, we computed the mean (across the 49 true values of T) difference between the true and the estimated T, where the estimate is the mean of the posterior sample (Figure 6). The difference between the true and the estimated T decreased as the tolerance decreased (Pearson correlation: 0.61, p < 10^-10). Furthermore, the width of the 95% credible region also decreased with decreasing tolerance (Pearson correlation: 0.82, p < 10^-24, Figure 6). When a standard rejection algorithm was used instead of ABC with local linear regression, the difference between the true and the estimated T was very similar, but the 95% credible region was slightly narrower with local linear regression. Hence, as long as the number of accepted simulations is reasonable (in this case a hundred or more), parameter estimation using ABC benefits from low tolerance levels.
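The tolerance experiment can be mimicked on a toy problem to show the mechanics of rejection versus local linear regression. The sketch below is not the authors' code or model: `simulate_stat` is a hypothetical one-statistic simulator, the prior on T is assumed uniform on [0, 0.5], and the regression step is a local linear adjustment in the style of Beaumont, Zhang and Balding (2002) with Epanechnikov weights. For each tolerance it reports, over 49 pseudo-observed datasets with T = 0.01, ..., 0.49, the mean difference between true and estimated T (posterior mean) and the mean width of the central 95% interval; treating that difference as an absolute value is our assumption.

```python
import random

def simulate_stat(theta, rng):
    # Hypothetical toy simulator: one summary statistic that grows linearly
    # with the divergence time theta, plus Gaussian noise.
    return 3.0 * theta + rng.gauss(0.0, 0.05)

def weighted_quantile(values, weights, q):
    # Smallest value whose cumulative weight reaches fraction q of the total.
    pairs = sorted(zip(values, weights))
    target = q * sum(weights)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= target:
            return v
    return pairs[-1][0]

def abc_estimate(s_obs, thetas, stats, tolerance, regression):
    """Return (posterior mean, width of central 95% interval) for one dataset."""
    dists = [abs(s - s_obs) for s in stats]
    n_keep = max(3, round(tolerance * len(stats)))
    keep = sorted(range(len(stats)), key=dists.__getitem__)[:n_keep]
    delta = dists[keep[-1]] or 1e-12  # distance of the furthest accepted draw
    x = [stats[i] - s_obs for i in keep]  # centred summary statistic
    y = [thetas[i] for i in keep]
    if regression:
        # Epanechnikov weights: 1 at zero distance, 0 at the threshold.
        w = [1.0 - (dists[i] / delta) ** 2 for i in keep]
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        beta = sxy / sxx if sxx > 0 else 0.0
        # Project each accepted draw onto s = s_obs along the fitted line.
        y = [yi - beta * xi for xi, yi in zip(x, y)]
    else:
        w = [1.0] * len(keep)  # plain rejection: equal weights
    mean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    width = weighted_quantile(y, w, 0.975) - weighted_quantile(y, w, 0.025)
    return mean, width

def tolerance_sweep(tolerances, n_sims=10000, seed=7):
    rng = random.Random(seed)
    # Reference table of simulations, drawn once from the assumed uniform prior.
    thetas = [rng.uniform(0.0, 0.5) for _ in range(n_sims)]
    stats = [simulate_stat(t, rng) for t in thetas]
    # 49 pseudo-observed datasets with true T = 0.01, 0.02, ..., 0.49.
    truths = [k / 100 for k in range(1, 50)]
    observed = [simulate_stat(t, rng) for t in truths]
    results = {}
    for regression in (True, False):
        for tol in tolerances:
            errs, widths = [], []
            for t, s_obs in zip(truths, observed):
                mean, width = abc_estimate(s_obs, thetas, stats, tol, regression)
                errs.append(abs(mean - t))
                widths.append(width)
            results[(regression, tol)] = (sum(errs) / len(errs), sum(widths) / len(widths))
    return results

if __name__ == "__main__":
    for (regression, tol), (err, width) in sorted(tolerance_sweep([0.01, 0.05, 0.10]).items()):
        label = "regression" if regression else "rejection"
        print(f"{label:10s} tol={tol:.2f}  mean |error|={err:.4f}  mean 95% width={width:.4f}")
```

With 10,000 simulations, a tolerance of 1% accepts 100 draws, matching the "a hundred or more" guideline above. Because the toy statistic is exactly linear in T, the regression adjustment removes most of the spread within the acceptance window, so this sketch need not reproduce the specific trends in Figure 6.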

