An experimental study of Quartets MaxCut and other supertree methods.

Swenson MS, Suri R, Linder CR, Warnow T - Algorithms Mol Biol (2011)



Affiliation: Department of Computer Science, The University of Texas at Austin, Austin TX, USA. mswenson@cs.utexas.edu.

ABSTRACT

Background: Supertree methods represent one of the major ways by which the Tree of Life can be estimated, but despite many recent algorithmic innovations, matrix representation with parsimony (MRP) remains the main algorithmic supertree method.

Results: We evaluated the performance of several supertree methods based upon the Quartets MaxCut (QMC) method of Snir and Rao and showed that two of these methods usually outperform MRP and five other supertree methods that we studied, under many realistic model conditions. However, the QMC-based methods have scalability issues that may limit their utility on large datasets. We also observed that taxon sampling impacted supertree accuracy, with poor results obtained when all of the source trees were only sparsely sampled. Finally, we showed that the popular optimality criterion of minimizing the total topological distance of the supertree to the source trees is only weakly correlated with supertree topological accuracy. Therefore, evaluating supertree methods on biological datasets, where the true tree is unknown and accuracy cannot be measured directly, is problematic.
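The optimality criterion mentioned above can be made concrete once a topological distance is fixed. One common choice, assumed here, is the Robinson-Foulds (RF) distance; the RFS method in Figure 2 is the Robinson-Foulds Supertree method, which seeks a supertree minimizing this total. The following is a minimal sketch, not the authors' code: trees are given as sets of nontrivial bipartitions (splits) rather than parsed from Newick, the distance is the unnormalized RF count, and all helper names and toy trees are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Split = FrozenSet[str]                   # one side of a bipartition
Tree = Tuple[Set[Split], FrozenSet[str]]  # (nontrivial splits, leaf set)


def canonical(side: Iterable[str], taxa: FrozenSet[str]) -> Split:
    """Orient a split as the side that does NOT contain the smallest taxon."""
    side = frozenset(side)
    return taxa - side if min(taxa) in side else side


def restrict(splits: Set[Split], taxa: FrozenSet[str], subset: FrozenSet[str]) -> Set[Split]:
    """Nontrivial splits that a tree on `taxa` induces on `subset`."""
    induced = set()
    for side in splits:
        a = side & subset
        b = (taxa - side) & subset
        if len(a) >= 2 and len(b) >= 2:  # both sides need 2+ taxa to be nontrivial
            induced.add(canonical(a, subset))
    return induced


def rf(splits1: Set[Split], splits2: Set[Split]) -> int:
    """Unnormalized Robinson-Foulds distance: size of the symmetric difference."""
    return len(splits1 ^ splits2)


def total_rf_to_sources(supertree: Tree, sources: List[Tree]) -> int:
    """Sum over source trees of the RF distance between the supertree,
    restricted to that source's leaf set, and the source tree itself."""
    super_splits, super_taxa = supertree
    total = 0
    for src_splits, src_taxa in sources:
        induced = restrict(super_splits, super_taxa, src_taxa)
        src = {canonical(s, src_taxa) for s in src_splits}
        total += rf(induced, src)
    return total


# Hypothetical toy example.
fs = frozenset
supertree = ({fs("ab"), fs("abc"), fs("ef")}, fs("abcdef"))  # (((a,b),c),(d,(e,f)))
src1 = ({fs("ab")}, fs("abcd"))                              # ((a,b),(c,d)): agrees
src2 = ({fs("ae")}, fs("acef"))                              # ((a,e),(c,f)): conflicts
print(total_rf_to_sources(supertree, [src1, src2]))          # 2
```

A supertree scoring well under this criterion can still be far from the true tree, which is the weak correlation the abstract reports.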

Conclusions: Our results show that supertree methods that improve upon MRP are possible, and that an effort should be made to produce scalable and robust implementations of the most accurate supertree methods. Also, because topological accuracy depends upon taxon sampling strategies, attempts to construct very large phylogenetic trees using supertree methods should consider the choice of source tree datasets as well as the choice of supertree method. Finally, since supertree topological error is only weakly correlated with the supertree's topological distance to its source trees, the development and testing of supertree methods present methodological challenges.




Figure 2: Scaffold density vs. supertree method FN rate. False negative (FN) error rates, with error bars, of gMRP, SFIT, MinFlip, RFS, PhySIC, Q-Imp, and QMC(Exp+TSQ) on mixed source tree datasets with 100, 500, and 1000 taxa, as a function of scaffold density. A point is plotted for a method only if at least ten datasets (four for the 1000-taxon model conditions) completed for that method and for all other methods.
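The caption does not define the FN rate. Under the definition commonly used in this literature (an assumption here), it is the fraction of internal edges, i.e. nontrivial bipartitions, of the true tree that are missing from the estimated supertree, with both trees on the same leaf set. A minimal sketch, assuming trees are supplied as sets of splits; the function names and toy trees are hypothetical.

```python
from typing import FrozenSet, Iterable, Set

Split = FrozenSet[str]  # one side of a bipartition


def canonical(side: Iterable[str], taxa: FrozenSet[str]) -> Split:
    """Orient a split as the side that does NOT contain the smallest taxon."""
    side = frozenset(side)
    return taxa - side if min(taxa) in side else side


def fn_rate(true_splits: Set[Split], est_splits: Set[Split], taxa: FrozenSet[str]) -> float:
    """Fraction of the true tree's nontrivial splits absent from the estimate."""
    true_c = {canonical(s, taxa) for s in true_splits}
    est_c = {canonical(s, taxa) for s in est_splits}
    if not true_c:  # a star tree has no internal edges to miss
        return 0.0
    return len(true_c - est_c) / len(true_c)


# Hypothetical toy example on six taxa.
fs = frozenset
X = fs("abcdef")
true_tree = {fs("ab"), fs("abc"), fs("ef")}  # (((a,b),c),(d,(e,f)))
estimate = {fs("ab"), fs("abd"), fs("ef")}   # (((a,b),d),(c,(e,f)))
print(fn_rate(true_tree, estimate, X))       # one of three true splits is missing
```

Because FN counts only missing true edges, an unresolved (star-like) estimate scores poorly, while a fully resolved but wrong one is penalized equally.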

Mentions: We report FN rates in Figure 2 (all methods) and Figure 3 (omitting PhySIC and SFIT). All six non-QMC-based supertree methods could be run on the 100-taxon datasets, but some failed to run on the larger datasets. We therefore show results for all seven methods on the 100-taxon datasets, for only five methods on the 500-taxon datasets (SFIT and Q-Imp failed to run there because of computational limitations), and for only four methods on the 1000-taxon datasets (we did not attempt to run PhySIC there, since it had poor topological accuracy and was computationally intensive on the 500-taxon datasets). As noted above, QMC(Exp+TSQ) failed to run on some datasets, so we again report results only for those datasets on which all reported methods completed.
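The reporting rule described here and in the Figure 2 caption amounts to intersecting, per model condition, the sets of datasets each method completed, and averaging FN rates only over that intersection. A minimal sketch follows; the data layout, function names, and toy numbers are hypothetical, and only the thresholds (ten datasets, four for the 1000-taxon conditions) come from the caption.

```python
from typing import Dict, List, Optional, Set

# Hypothetical layout: results[method][dataset_id] = FN rate, with an entry
# only for datasets on which that method completed.
Results = Dict[str, Dict[str, float]]

# Minimum shared datasets per taxon count, taken from the Figure 2 caption.
MIN_DATASETS = {100: 10, 500: 10, 1000: 4}


def common_datasets(results: Results, methods: List[str]) -> Set[str]:
    """Datasets on which every listed method completed."""
    common: Optional[Set[str]] = None
    for m in methods:
        done = set(results.get(m, {}))
        common = done if common is None else common & done
    return common or set()


def mean_fn_on_common(results: Results, methods: List[str],
                      min_datasets: int) -> Optional[Dict[str, float]]:
    """Mean FN rate per method over the shared datasets, or None if too few."""
    common = common_datasets(results, methods)
    if len(common) < min_datasets:
        return None  # too few shared datasets: no point is plotted
    return {m: sum(results[m][d] for d in common) / len(common) for m in methods}


# Hypothetical toy numbers: QMC(Exp+TSQ) did not finish dataset d3.
results = {
    "gMRP": {"d1": 0.20, "d2": 0.30, "d3": 0.10, "d4": 0.40, "d5": 0.25},
    "QMC(Exp+TSQ)": {"d1": 0.10, "d2": 0.20, "d4": 0.30, "d5": 0.20},
}
methods = ["gMRP", "QMC(Exp+TSQ)"]
print(mean_fn_on_common(results, methods, MIN_DATASETS[1000]))  # 4 shared datasets: plotted
print(mean_fn_on_common(results, methods, MIN_DATASETS[500]))   # None: fewer than 10
```

Averaging over the shared datasets keeps the comparison paired, so no method's mean benefits from datasets that another method could not finish.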

