Bayesian species delimitation can be robust to guide-tree inference errors.

Zhang C, Rannala B, Yang Z - Syst. Biol. (2014)


Affiliation: Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Box 50007, SE-104 05 Stockholm, Sweden; Genome Center and Section of Evolution and Ecology, University of California at Davis, One Shields Avenue, Davis, CA 95616, USA; Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK; and Center for Computational Genomics, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.

AUTOMATICALLY GENERATED EXCERPT

The Bayesian method uses Bayesian model selection to compare different species-delimitation models in the multispecies coalescent framework, and uses reversible-jump Markov chain Monte Carlo (rjMCMC) to estimate the posterior probabilities for different delimitation models... The method accommodates multiple loci, and does not require reciprocal monophyly of inferred gene trees... The three or five sequences from each of the eight populations were constrained to be monophyletic... The substitution model used was GTR, since RAxML does not implement the JC69 model... Also, RAxML does not implement the molecular clock and infers unrooted trees instead... Given the guide tree, the nuclear sequence data (either one locus or five loci) simulated above were analyzed using bpp version 2.2 to delimit species... Note that we used only the population tree topology inferred by the two methods (RAxML/beast and *beast), and ignored any support measures for clades on the tree, such as the bootstrap support values calculated by RAxML and the posterior clade probabilities calculated by *beast... The results show clear effects of the species phylogeny (in particular, the lengths of the internal branches reflecting species divergence times), the mutation rate, and the number of loci... The better performance of bpp for the large sample size appears to be largely due to the increased information content for species delimitation since the improvement in guide-tree inference is moderate... A previous simulation found that increasing the number of sequences sampled from the same species improves species delimitation by bpp, leading to both reduction of false positives (over-splitting errors) and increase of power (correctly delimiting distinct species)... 
In our simulation, we assumed no gene flow (migration, hybridization, or introgression) after species divergence, and conflicts between gene trees from different genomic regions or between mitochondrial and nuclear loci are entirely due to ancestral polymorphism and incomplete lineage sorting... Although the results suggest that a few loci of sequence data are insufficient for structurama to assign individuals to populations reliably, the impact of assignment errors on species delimitation by bpp under more realistic scenarios remains unknown.


Figure 4: Histogram of posterior probabilities for splitting clades into different species by bpp in data of one locus, with three sequences sampled from each population at the locus, simulated using tree 1 at the low-mutation rate, when the guide tree was inferred using *beast. Each bin is of size 0.05. The frequencies in the last bin for splitting clades , and are the false-positive rates listed in Table 2.

Mentions: However, the false-positive rates in those simulations are overall quite low. In all cases except one, the false-positive rates were near or below the nominal rate of 5%. The exception is the case of *beast + bpp analysis of one nuclear locus at the low rate for species tree 1, in which bpp splits clades and in approximately 8% of replicates, slightly above the nominal 5%. In this case, phylogenetic errors in the guide tree inferred by *beast are very common, with clades and recovered in only 77% of the replicates (Fig. 2a). To understand why such high errors in the guide-tree inference did not lead to very high false positives in bpp species delimitation, we plot in Figures 4 and 5 the distributions (histograms) of posterior probabilities calculated by bpp (see also Tables 4 and 5 for the medians and quartiles, and online supplementary Figs. S1–S16 for other cases). With one locus (Fig. 4), the posterior probabilities for splitting clades and are spread out. With five loci (Fig. 5), they shift towards 0 and become highly concentrated. Thus, in the data of a single nuclear locus, the posterior probabilities calculated by bpp did not often reach the 95% cut-off due to the lack of information. With more loci or at the higher mutation rate, the data become far more informative and the posterior probabilities become more extreme. However, in such cases, the guide tree tends to be correctly reconstructed (Fig. 2a) and bpp becomes increasingly accurate with lower rates of false positives and false negatives (Table 2).
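The link between the histogram and the false-positive rate can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names, the list of posterior probabilities, and the choice to count a posterior of exactly 0.95 as reaching the cut-off are all assumptions made here. It bins per-replicate posterior probabilities of a split into bins of width 0.05 and reports the frequency in the last bin, which is the quantity the caption describes as the false-positive rate when the split is incorrect.

```python
import math


def posterior_histogram(posteriors, bin_width=0.05):
    """Relative frequency of posterior probabilities in equal-width bins on [0, 1].

    The last bin is closed on the right, so a posterior of 1.0 falls into it.
    """
    n_bins = round(1.0 / bin_width)
    counts = [0] * n_bins
    for p in posteriors:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"posterior probability out of range: {p}")
        # A small epsilon guards against floating-point values such as
        # 0.95 landing just below a bin edge.
        idx = min(math.floor(p * n_bins + 1e-9), n_bins - 1)
        counts[idx] += 1
    total = len(posteriors)
    return [c / total for c in counts]


def false_positive_rate(posteriors, cutoff=0.95):
    """Fraction of replicates whose posterior for an incorrect split reaches the cutoff.

    Treating the cutoff as inclusive (>=) is an assumption of this sketch.
    """
    return sum(1 for p in posteriors if p >= cutoff - 1e-9) / len(posteriors)


# Hypothetical posteriors for one incorrect split across ten replicates
# (made-up numbers, not taken from the paper).
example_posteriors = [0.10, 0.30, 0.55, 0.80, 0.96, 0.97, 0.20, 0.40, 0.60, 0.99]

hist = posterior_histogram(example_posteriors)
fpr = false_positive_rate(example_posteriors)
print(f"last-bin frequency: {hist[-1]:.2f}, false-positive rate: {fpr:.2f}")
```

With a bin width of 0.05, the last bin covers exactly the posteriors at or above the 95% cut-off, so the two quantities coincide. Spread-out posteriors, as described for the one-locus data, leave most of the mass in the lower bins and keep this frequency small.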

