Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference.

Tan G, Muffato M, Ledergerber C, Herrero J, Goldman N, Gil M, Dessimoz C - Syst. Biol. (2015)

Bottom Line: We confirm that our findings hold for a wide range of parameters and methods. Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms.


Affiliation: Department of Computer Science, ETH Zurich, Universitätstr. 6, 8092 Zurich, Switzerland; Department of Molecular Sciences, Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, UK; MRC Clinical Sciences Centre, London W12 0NN, UK



Figure 5: Effect of alignment filtering on simulated data (500 alignments with 30 sequences each): induced tree and alignment accuracy. Tree accuracy (left): the measure of error is the average RF distance between the reference trees and trees reconstructed from Prank + F alignments filtered by the various approaches. Trees were reconstructed using PhyML. Filtered alignments that improve over the unfiltered alignment fall in the grey region. The two dotted lines correspond to results obtained with two simplistic filtering methods (see main text). Points correspond to default parameters. If a filtering method with default parameters yields results that differ significantly (two-sided Wilcoxon test) from those of unfiltered alignments, a star is displayed below the corresponding point. Error bars indicate the standard error of the mean. Alignment accuracy (right): precision and recall for the various filtering methods, using the sum-of-pairs scoring function (see section “Methods”).
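
The alignment-accuracy panel can be illustrated with a minimal sketch of sum-of-pairs precision and recall. This is not the authors' implementation: the function names (`aligned_pairs`, `sp_precision_recall`), the representation of an alignment as a list of equal-length strings with "-" for gaps, and the toy sequences are all assumptions made for illustration. Filtering is modeled as keeping only a subset of columns.

```python
from itertools import combinations


def aligned_pairs(alignment, keep=None):
    """Return the set of aligned residue pairs found in the kept columns.

    A residue is identified by (sequence index, residue index within the
    ungapped sequence), so pairs stay comparable across alignments.
    """
    n_cols = len(alignment[0])
    keep = set(range(n_cols)) if keep is None else set(keep)
    counters = [0] * len(alignment)  # next residue index for each sequence
    pairs = set()
    for col in range(n_cols):
        present = []
        for seq_idx, seq in enumerate(alignment):
            if seq[col] != "-":
                present.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1  # count residues even in dropped columns
        if col in keep:
            pairs.update(combinations(present, 2))
    return pairs


def sp_precision_recall(test_pairs, ref_pairs):
    """Precision: share of test pairs that are correct.
    Recall: share of reference pairs that are recovered."""
    shared = len(test_pairs & ref_pairs)
    precision = shared / len(test_pairs) if test_pairs else 0.0
    recall = shared / len(ref_pairs) if ref_pairs else 0.0
    return precision, recall


# Toy data (hypothetical): a reference alignment and an estimated one.
reference = ["ACG-", "A-GT"]
estimated = ["ACG-", "AG-T"]

ref_pairs = aligned_pairs(reference)
unfiltered = sp_precision_recall(aligned_pairs(estimated), ref_pairs)
# Drop column 1 of the estimated alignment, which holds a wrong pairing.
filtered = sp_precision_recall(aligned_pairs(estimated, keep=[0, 2, 3]), ref_pairs)
print(unfiltered, filtered)
```

In this toy case, removing the misaligned column raises precision while recall stays unchanged. Removing a correctly aligned column would instead lower recall, which is the trade-off the right-hand panel plots.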

Mentions: Results on simulated data were consistent with our findings on empirical data. Filtering did not lead to better trees on average (Fig. 5, left). Likewise, though parameter optimization improved the performance of the methods, filtering remained generally counterproductive. Also, filtering methods performed broadly in line with simple baseline methods, with more filtering yielding poorer results.
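
The tree-accuracy measure in Fig. 5 (average RF distance, with the standard error of the mean as error bars) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: trees are written as nested tuples of leaf names, the RF distance is the unnormalized count of bipartitions found in only one of the two trees, and the names `bipartitions`, `rf_distance` and `mean_and_sem` are hypothetical.

```python
import statistics


def leaf_set(tree):
    """All leaf names below a node (a leaf is a string, a clade is a tuple)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Non-trivial splits of the leaf set induced by the internal branches."""
    all_leaves = leaf_set(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        # Skip trivial splits (a single leaf, or everything, on one side).
        if len(below) > 1 and len(other) > 1:
            # Storing both sides as an unordered pair ignores the root position.
            splits.add(frozenset([below, other]))
        return below

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


def mean_and_sem(values):
    """Mean and standard error of the mean (sample stdev / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / len(values) ** 0.5


# Toy example (hypothetical trees on five taxa).
reference_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(reference_tree, inferred_tree))

# Hypothetical per-alignment RF distances, summarized as in the figure.
mean_rf, sem_rf = mean_and_sem([2, 0, 4])
print(mean_rf, sem_rf)
```

Comparing a filtering method against the unfiltered baseline then amounts to computing these summaries for both sets of inferred trees against the same reference trees.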

