An experimental study of Quartets MaxCut and other supertree methods.

Swenson MS, Suri R, Linder CR, Warnow T - Algorithms Mol Biol (2011)

Bottom Line: We also observed that taxon sampling impacted supertree accuracy, with poor results obtained when all of the source trees were only sparsely sampled. Our results show that supertree methods that improve upon MRP are possible, and that an effort should be made to produce scalable and robust implementations of the most accurate supertree methods. Finally, since supertree topological error is only weakly correlated with the supertree's topological distance to its source trees, development and testing of supertree methods presents methodological challenges.


Affiliation: Department of Computer Science, The University of Texas at Austin, Austin TX, USA. mswenson@cs.utexas.edu.

ABSTRACT

Background: Supertree methods represent one of the major ways by which the Tree of Life can be estimated, but despite many recent algorithmic innovations, matrix representation with parsimony (MRP) remains the main algorithmic supertree method.

Results: We evaluated the performance of several supertree methods based upon the Quartets MaxCut (QMC) method of Snir and Rao and showed that two of these methods usually outperform MRP and five other supertree methods that we studied, under many realistic model conditions. However, the QMC-based methods have scalability issues that may limit their utility on large datasets. We also observed that taxon sampling impacted supertree accuracy, with poor results obtained when all of the source trees were only sparsely sampled. Finally, we showed that the popular optimality criterion of minimizing the total topological distance of the supertree to the source trees is only weakly correlated with supertree topological accuracy. Therefore evaluating supertree methods on biological datasets is problematic.

Conclusions: Our results show that supertree methods that improve upon MRP are possible, and that an effort should be made to produce scalable and robust implementations of the most accurate supertree methods. Also, because topological accuracy depends upon taxon sampling strategies, attempts to construct very large phylogenetic trees using supertree methods should consider the selection of source tree datasets, as well as supertree methods. Finally, since supertree topological error is only weakly correlated with the supertree's topological distance to its source trees, development and testing of supertree methods presents methodological challenges.




Figure 4: Scaffold density vs. supertree method FN rate on all-scaffold data. Topological error rates on 100- and 500-taxon all-scaffold datasets. We report False Negative (FN) rates (means with standard error bars) for QMC(Exp+TSQ) and gMRP as a function of the scaffold density.

Mentions: Supertree studies differ not only in the methods used to combine source trees into a tree on the full set of taxa, but also in how the source tree datasets are produced, and in particular how densely sampled these source trees are. On datasets that have only one scaffold, the accuracy of all supertree methods suffers as the density of the scaffold decreases, a trend that was also observed by Swenson et al. [23] (see Figures 1, 2, 3). Figure 4 shows the results of an experiment in which we sought to evaluate the impact of the density of taxon sampling within source trees on the accuracy of the produced supertree for 100- and 500-taxon all-scaffold datasets. We did not generate 1000-taxon all-scaffold datasets, and therefore did not analyze such datasets using any supertree methods, because of the running time required to estimate dense scaffolds for such datasets. We compared the topological accuracy of supertrees estimated on all-scaffold datasets with those from mixed datasets (datasets having one scaffold source tree with the remaining source trees being clade-based).
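
The False Negative (FN) rate plotted in Figure 4 is commonly defined as the fraction of non-trivial bipartitions (internal edges) of the true tree that are missing from the estimated supertree. The following is a minimal Python sketch of that calculation, together with the mean and standard error across replicates. It assumes each tree has already been reduced to a set of bipartitions; the function names and the toy five-taxon trees are hypothetical illustrations, not the authors' pipeline or data.

```python
from math import sqrt
from statistics import mean, stdev


def canonical(side, taxa):
    """Return one fixed side of a bipartition so that equal splits compare equal.

    Assumption: a bipartition is given as the taxa on one side of an internal
    edge. We always keep the side that does NOT contain the smallest taxon name.
    """
    side = frozenset(side)
    complement = frozenset(taxa) - side
    reference = min(taxa)
    return complement if reference in side else side


def fn_rate(true_splits, estimated_splits):
    """Fraction of true-tree bipartitions absent from the estimated tree."""
    true_set = set(true_splits)
    if not true_set:
        raise ValueError("true tree has no non-trivial bipartitions")
    missing = true_set - set(estimated_splits)
    return len(missing) / len(true_set)


def mean_and_standard_error(values):
    """Mean and standard error of the mean (sample standard deviation / sqrt(n))."""
    m = mean(values)
    se = stdev(values) / sqrt(len(values)) if len(values) > 1 else 0.0
    return m, se


# Toy example (hypothetical trees on five taxa).
taxa = {"A", "B", "C", "D", "E"}
# True tree ((A,B),C,(D,E)): splits AB|CDE and DE|ABC.
true_tree = {canonical({"A", "B"}, taxa), canonical({"D", "E"}, taxa)}
# Estimated tree ((A,B),D,(C,E)): splits AB|CDE and CE|ABD.
estimated_tree = {canonical({"A", "B"}, taxa), canonical({"C", "E"}, taxa)}

example_fn = fn_rate(true_tree, estimated_tree)  # one of two true splits is missed

# Made-up per-replicate FN rates, only to show the aggregation step.
replicate_mean, replicate_se = mean_and_standard_error([0.5, 0.0, 0.25])
```

In practice the bipartition sets would come from a tree-parsing library; that step is omitted here to keep the sketch self-contained.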

