Limits...
Phylogenetic inference via sequential Monte Carlo.

Bouchard-Côté A, Sankararaman S, Jordan MI - Syst. Biol. (2012)

Bottom Line: We provide a theoretical treatment of PosetSMC and also present experimental evaluation of PosetSMC on both synthetic and real data.The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, providing up to two orders of magnitude faster convergence.We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining with MCMC in hybrid MCMC-SMC schemes.

View Article: PubMed Central - PubMed

Affiliation: Department of Statistics, University of British Columbia, Vancouver, BC V6T 1Z2, Canada.

ABSTRACT
Bayesian inference provides an appealing general framework for phylogenetic analysis, able to incorporate a wide variety of modeling assumptions and to provide a coherent treatment of uncertainty. Existing computational approaches to bayesian inference based on Markov chain Monte Carlo (MCMC) have not, however, kept pace with the scale of the data analysis problems in phylogenetics, and this has hindered the adoption of bayesian methods. In this paper, we present an alternative to MCMC based on Sequential Monte Carlo (SMC). We develop an extension of classical SMC based on partially ordered sets and show how to apply this framework--which we refer to as PosetSMC--to phylogenetic analysis. We provide a theoretical treatment of PosetSMC and also present experimental evaluation of PosetSMC on both synthetic and real data. The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, providing up to two orders of magnitude faster convergence. We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining with MCMC in hybrid MCMC-SMC schemes. Software for PosetSMC is available at http://www.stat.ubc.ca/ bouchard/PosetSMC.

Show MeSH
To illustrate how PosetSMC sequentially samples from the space of trees, we present a subset of the Hasse diagram induced by the naive proposal described in the Examples section. Note that this diagram is not a phylogenetic tree: The circles correspond to partial states (phylogenetic forests), organized in rows ordered by their rank ρ, and edges denote a positive transition density between pairs of partial states. The forests are labeled by the union of the sets of nontrivial rooted clades over the trees in the forest. The dashed lines correspond to the proposal moves forbidden by the strict height increase condition (Assumption 2(b) in the text). Note that we show only a subset of the Hasse graph since the branch lengths make the graph infinite. The subset shown here is based on an intersection of height function fibers: Given a subset of the leaves , we define the height function  as the height of the most recent common ancestor of  in s, if  is a clade in one of the trees in s, and ω otherwise, where ω ∉ ℝ. Given a map , the subset of the vertices of the Hasse diagram shown is given by . The graph shown here corresponds to any f such that f({A,B}) < f({C,D}), f({A,C})<f({B,D}), and f({A,D})<f({B,C}).
© Copyright Policy - creative-commons
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3376373&req=5

fig2: To illustrate how PosetSMC sequentially samples from the space of trees, we present a subset of the Hasse diagram induced by the naive proposal described in the Examples section. Note that this diagram is not a phylogenetic tree: The circles correspond to partial states (phylogenetic forests), organized in rows ordered by their rank ρ, and edges denote a positive transition density between pairs of partial states. The forests are labeled by the union of the sets of nontrivial rooted clades over the trees in the forest. The dashed lines correspond to the proposal moves forbidden by the strict height increase condition (Assumption 2(b) in the text). Note that we show only a subset of the Hasse graph since the branch lengths make the graph infinite. The subset shown here is based on an intersection of height function fibers: Given a subset of the leaves , we define the height function as the height of the most recent common ancestor of in s, if is a clade in one of the trees in s, and ω otherwise, where ω ∉ ℝ. Given a map , the subset of the vertices of the Hasse diagram shown is given by . The graph shown here corresponds to any f such that f({A,B}) < f({C,D}), f({A,C})<f({B,D}), and f({A,D})<f({B,C}).

Mentions: There are (2n−3)!! =1×3 × 5 = 15 rooted topologies on four taxa, and in Figure 2, we show schematically a subset of the Hasse diagram of the particle paths that lead to these trees under the naive proposal described above. It is clear from the figure that a balanced binary tree on four taxa, with rooted clades {A,B},{C,D}, can be proposed in two different ways: either by first merging A and B, then C and D, or by merging C and D, then A and B. On the other hand, an unbalanced tree on the same set of taxa can be proposed in a single way, for example, {A,B} → {A,B,C}. As a consequence, if we assume there are no observations at the leaves, the expected fraction of particles with a caterpillar topology of each type is 1/18 (because there are 18 distinct paths in Fig. 2), whereas the expected fraction of particles with a balanced topology of each type is 2/18 =1/9. However, since we have assumed there are no observations, the posterior should be equal to the prior, a uniform distribution. Therefore, the naive proposal leads to a biased approximate posterior.FIGURE 2.


Phylogenetic inference via sequential Monte Carlo.

Bouchard-Côté A, Sankararaman S, Jordan MI - Syst. Biol. (2012)

To illustrate how PosetSMC sequentially samples from the space of trees, we present a subset of the Hasse diagram induced by the naive proposal described in the Examples section. Note that this diagram is not a phylogenetic tree: The circles correspond to partial states (phylogenetic forests), organized in rows ordered by their rank ρ, and edges denote a positive transition density between pairs of partial states. The forests are labeled by the union of the sets of nontrivial rooted clades over the trees in the forest. The dashed lines correspond to the proposal moves forbidden by the strict height increase condition (Assumption 2(b) in the text). Note that we show only a subset of the Hasse graph since the branch lengths make the graph infinite. The subset shown here is based on an intersection of height function fibers: Given a subset of the leaves , we define the height function  as the height of the most recent common ancestor of  in s, if  is a clade in one of the trees in s, and ω otherwise, where ω ∉ ℝ. Given a map , the subset of the vertices of the Hasse diagram shown is given by . The graph shown here corresponds to any f such that f({A,B}) < f({C,D}), f({A,C})<f({B,D}), and f({A,D})<f({B,C}).
© Copyright Policy - creative-commons
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3376373&req=5

fig2: To illustrate how PosetSMC sequentially samples from the space of trees, we present a subset of the Hasse diagram induced by the naive proposal described in the Examples section. Note that this diagram is not a phylogenetic tree: The circles correspond to partial states (phylogenetic forests), organized in rows ordered by their rank ρ, and edges denote a positive transition density between pairs of partial states. The forests are labeled by the union of the sets of nontrivial rooted clades over the trees in the forest. The dashed lines correspond to the proposal moves forbidden by the strict height increase condition (Assumption 2(b) in the text). Note that we show only a subset of the Hasse graph since the branch lengths make the graph infinite. The subset shown here is based on an intersection of height function fibers: Given a subset of the leaves , we define the height function as the height of the most recent common ancestor of in s, if is a clade in one of the trees in s, and ω otherwise, where ω ∉ ℝ. Given a map , the subset of the vertices of the Hasse diagram shown is given by . The graph shown here corresponds to any f such that f({A,B}) < f({C,D}), f({A,C})<f({B,D}), and f({A,D})<f({B,C}).
Mentions: There are (2n−3)!! =1×3 × 5 = 15 rooted topologies on four taxa, and in Figure 2, we show schematically a subset of the Hasse diagram of the particle paths that lead to these trees under the naive proposal described above. It is clear from the figure that a balanced binary tree on four taxa, with rooted clades {A,B},{C,D}, can be proposed in two different ways: either by first merging A and B, then C and D, or by merging C and D, then A and B. On the other hand, an unbalanced tree on the same set of taxa can be proposed in a single way, for example, {A,B} → {A,B,C}. As a consequence, if we assume there are no observations at the leaves, the expected fraction of particles with a caterpillar topology of each type is 1/18 (because there are 18 distinct paths in Fig. 2), whereas the expected fraction of particles with a balanced topology of each type is 2/18 =1/9. However, since we have assumed there are no observations, the posterior should be equal to the prior, a uniform distribution. Therefore, the naive proposal leads to a biased approximate posterior.FIGURE 2.

Bottom Line: We provide a theoretical treatment of PosetSMC and also present experimental evaluation of PosetSMC on both synthetic and real data.The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, providing up to two orders of magnitude faster convergence.We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining with MCMC in hybrid MCMC-SMC schemes.

View Article: PubMed Central - PubMed

Affiliation: Department of Statistics, University of British Columbia, Vancouver, BC V6T 1Z2, Canada.

ABSTRACT
Bayesian inference provides an appealing general framework for phylogenetic analysis, able to incorporate a wide variety of modeling assumptions and to provide a coherent treatment of uncertainty. Existing computational approaches to bayesian inference based on Markov chain Monte Carlo (MCMC) have not, however, kept pace with the scale of the data analysis problems in phylogenetics, and this has hindered the adoption of bayesian methods. In this paper, we present an alternative to MCMC based on Sequential Monte Carlo (SMC). We develop an extension of classical SMC based on partially ordered sets and show how to apply this framework--which we refer to as PosetSMC--to phylogenetic analysis. We provide a theoretical treatment of PosetSMC and also present experimental evaluation of PosetSMC on both synthetic and real data. The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, providing up to two orders of magnitude faster convergence. We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining with MCMC in hybrid MCMC-SMC schemes. Software for PosetSMC is available at http://www.stat.ubc.ca/ bouchard/PosetSMC.

Show MeSH