Bayesian inference of species trees from multilocus data.

Heled J, Drummond AJ - Mol. Biol. Evol. (2009)

Bottom Line: Our method coestimates multiple gene trees embedded in a shared species tree along with the effective population size of both extant and ancestral species. Finally, we compare our new method to both an existing method (BEST 2.2) with similar goals and the supermatrix (concatenation) method. We demonstrate that both BEST and our method have much better estimation accuracy for species tree topology than concatenation, and our method outperforms BEST in divergence time and population size estimation.

Affiliation: Department of Computer Science, University of Auckland, New Zealand. jheled@gmail.com

ABSTRACT
Until recently, it has been common practice for a phylogenetic analysis to use a single gene sequence from a single individual organism as a proxy for an entire species. With technological advances, it is now becoming more common to collect data sets containing multiple gene loci and multiple individuals per species. These data sets often reveal the need to directly model intraspecies polymorphism and incomplete lineage sorting in phylogenetic estimation procedures. For a single species, coalescent theory is widely used in contemporary population genetics to model intraspecific gene trees. Here, we present a Bayesian Markov chain Monte Carlo method for the multispecies coalescent. Our method coestimates multiple gene trees embedded in a shared species tree along with the effective population size of both extant and ancestral species. The inference is made possible by multilocus data from multiple individuals per species. Using a multiindividual data set and a series of simulations of rapid species radiations, we demonstrate the efficacy of our new method. These simulations give some insight into the behavior of the method as a function of sampled individuals, sampled loci, and sequence length. Finally, we compare our new method to both an existing method (BEST 2.2) with similar goals and the supermatrix (concatenation) method. We demonstrate that both BEST and our method have much better estimation accuracy for species tree topology than concatenation, and our method outperforms BEST in divergence time and population size estimation.

Figure 3: (a) Species tree estimation error and (b) 95% credible interval size as a function of the number of loci. The number of individuals sampled per species is four for all experiments. Each graph point is obtained by averaging the error measure (described in the main text) over 100 analyses of simulated data sets. The “branch score” is a measure of the distance in tree space of the estimated species tree to the true tree, incorporating both topology and divergence times. The “tree score” is a measure of the distance between the estimated species tree and the true species tree incorporating information about the population size as well. For details of the tree metrics used, see main text.

Mentions: We have attempted to define some performance measures that make the most of the simulations carried out. Figure 3 shows the averages of several measures computed from the posterior distribution of species trees; each graph point was obtained by averaging results from 100 runs, where each run was produced by the Bayesian analysis of a single simulated data set. Figure 3a plots two error measures, whereas figure 3b shows the mean number of species tree topologies in the 95% credible interval.
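
The quantity plotted in figure 3b can be illustrated with a short sketch. It is not the authors' code. It assumes that each posterior sample of the species tree has already been reduced to a canonical topology label (for example, a Newick string without branch lengths), and that the 95% credible set is built by adding topologies in order of decreasing posterior frequency until their cumulative frequency reaches 0.95. The function names `credible_set_size` and `mean_credible_set_size` are hypothetical.

```python
from collections import Counter


def credible_set_size(topologies, level=0.95):
    """Return the number of distinct topologies in the credible set.

    `topologies` is a list of hashable topology labels, one per posterior
    sample (assumption: labels are canonical, so equal topologies compare
    equal). Topologies are added from most to least frequent until their
    cumulative sample frequency reaches `level`.
    """
    if not topologies:
        raise ValueError("need at least one sampled topology")
    counts = Counter(topologies)
    total = len(topologies)
    cumulative = 0
    for size, (_, count) in enumerate(counts.most_common(), start=1):
        cumulative += count
        if cumulative >= level * total:
            return size
    return len(counts)


def mean_credible_set_size(runs, level=0.95):
    """Average the credible set size over several independent analyses."""
    sizes = [credible_set_size(run, level) for run in runs]
    return sum(sizes) / len(sizes)


# Toy posterior samples for two hypothetical runs (labels are made up).
run_a = ["((A,B),C)"] * 100
run_b = ["((A,B),C)"] * 90 + ["((A,C),B)"] * 6 + ["((B,C),A)"] * 4
average = mean_credible_set_size([run_a, run_b])
```

A smaller credible set means the posterior is concentrated on fewer topologies, which is why this measure is expected to shrink as more loci are added.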

