Phylogenetic inference via sequential Monte Carlo.

Bouchard-Côté A, Sankararaman S, Jordan MI - Syst. Biol. (2012)

Bottom Line: We provide a theoretical treatment of PosetSMC and also present experimental evaluation of PosetSMC on both synthetic and real data. The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, providing up to two orders of magnitude faster convergence. We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining with MCMC in hybrid MCMC-SMC schemes.


Affiliation: Department of Statistics, University of British Columbia, Vancouver, BC V6T 1Z2, Canada.

ABSTRACT
Bayesian inference provides an appealing general framework for phylogenetic analysis, able to incorporate a wide variety of modeling assumptions and to provide a coherent treatment of uncertainty. Existing computational approaches to Bayesian inference based on Markov chain Monte Carlo (MCMC) have not, however, kept pace with the scale of the data analysis problems in phylogenetics, and this has hindered the adoption of Bayesian methods. In this paper, we present an alternative to MCMC based on Sequential Monte Carlo (SMC). We develop an extension of classical SMC based on partially ordered sets and show how to apply this framework, which we refer to as PosetSMC, to phylogenetic analysis. We provide a theoretical treatment of PosetSMC and also present experimental evaluation of PosetSMC on both synthetic and real data. The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, providing up to two orders of magnitude faster convergence. We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining with MCMC in hybrid MCMC-SMC schemes. Software for PosetSMC is available at http://www.stat.ubc.ca/ bouchard/PosetSMC.
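
The abstract notes that SMC can estimate marginal likelihoods. As a rough illustration of the generic idea (not the PosetSMC software itself), the sketch below assumes a standard SMC run that resamples at every generation, so that the marginal likelihood estimate is the product over generations of the average unnormalized particle weight. The function names and the toy weights are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_marginal_estimate(log_weights_per_generation):
    """Log of the generic SMC marginal likelihood estimate.

    Assumes resampling at every generation, so the estimate is the
    product over generations of the mean unnormalized weight.
    Each inner list holds the log weights of the K particles of one
    generation.
    """
    total = 0.0
    for log_weights in log_weights_per_generation:
        k = len(log_weights)
        total += log_sum_exp(log_weights) - math.log(k)
    return total


if __name__ == "__main__":
    # Toy input (made up): 3 generations, 4 particles, every weight equal to 2.
    toy = [[math.log(2.0)] * 4 for _ in range(3)]
    print(log_marginal_estimate(toy))
```

Working in log space avoids underflow when many generations of small weights are multiplied together, which is the usual situation with likelihoods over thousands of sites.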

Figure 10: Results on human gene frequency data (Li et al. 2008), comparing the log likelihood of the minimum Bayes risk reconstruction from SMC and MCMC approximations, as a function of the running time (in milliseconds, on a log scale).

Mentions: We also performed experiments on frequency data from the Human Genome Diversity Panel. In these experiments, we subsampled 11,511 single nucleotide polymorphisms to reduce site correlations, and we used the likelihood model based on Brownian motion described in the previous section. We show the results in Figure 10, using the log likelihood of the consensus tree as an evaluation surrogate. This shows once again the advantages of SMC methods. To give a qualitative idea of what the likelihood gains mean, we show in Figure 11 the consensus tree from 10,000 MCMC iterations versus the consensus tree from 10,000 PosetSMC particles (the circled data points). Because both runs are undersampled, the higher-order groupings are incorrect in both trees, but we can see that more mid- and low-order ethnic/geographic groupings are already captured by SMC. Incidentally, the position of the circled data points shows that in practice, K SMC particles are cheaper to compute than K MCMC iterations. This is because fewer memory writes are necessary in the former case.
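
The Brownian motion likelihood model is described in a section of the paper that is not reproduced here. As a minimal sketch of the general idea only, the code below assumes that the frequency at each site changes along a branch by a Gaussian increment with variance proportional to the branch length, and that sites are independent. The function names, the `sigma2` parameter, and the keep-every-`step`-th-site thinning rule are illustrative assumptions, not the authors' implementation.

```python
import math


def brownian_branch_loglik(parent, child, branch_length, sigma2=1.0):
    """Log likelihood of child frequencies given parent frequencies.

    Assumption: each site evolves independently as Brownian motion, so
    child[i] ~ Normal(parent[i], sigma2 * branch_length).
    """
    if len(parent) != len(child):
        raise ValueError("parent and child must have the same number of sites")
    if branch_length <= 0 or sigma2 <= 0:
        raise ValueError("branch_length and sigma2 must be positive")
    var = sigma2 * branch_length
    log_norm = -0.5 * math.log(2.0 * math.pi * var)
    return sum(log_norm - (c - p) ** 2 / (2.0 * var)
               for p, c in zip(parent, child))


def thin_sites(sites, step):
    """Keep every step-th site, a simple (assumed) way to reduce correlation
    between neighbouring sites."""
    return sites[::step]


if __name__ == "__main__":
    # Made-up frequencies for three sites.
    parent = [0.50, 0.20, 0.80]
    child = [0.55, 0.25, 0.70]
    print(brownian_branch_loglik(parent, child, branch_length=0.1))
    print(thin_sites(list(range(10)), 3))
```

Summing such per-branch terms over a tree gives a log likelihood that can be compared across reconstructions, which is the kind of quantity plotted against running time in Figure 10.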

