Limits...
Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference.

Tan G, Muffato M, Ledergerber C, Herrero J, Goldman N, Gil M, Dessimoz C - Syst. Biol. (2015)

Bottom Line: We confirm that our findings hold for a wide range of parameters and methods.Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference.By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms.

View Article: PubMed Central - PubMed

Affiliation: Department of Computer Science, ETH Zurich, Universitätstr. 6, 8092 Zurich, Switzerland, Department of Molecular Sciences, Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, UK; MRC Clinical Sciences Centre, London W12 0NN, UK;

Show MeSH

Related in: MedlinePlus

Reanalysis on Ensembl Compara confirms main findings. Points correspond to filtering methods under default parameters. Filtered alignments improving over unfiltered alignment fall in the gray region. The two dotted lines correspond to results obtained with two simplistic filtering methods (see main text). Colored lines are linear interpolations between additional points obtained with non-default parameters and correspond to results obtained by varying the parameters of filtering methods (not available for TrimAl). If a filtering method with default parameters yields significantly different (two-sided Wilcoxon test, ) results than unfiltered alignments, a star is displayed below the corresponding point.
© Copyright Policy - creative-commons
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC4538881&req=5

Figure 4: Reanalysis on Ensembl Compara confirms main findings. Points correspond to filtering methods under default parameters. Filtered alignments improving over unfiltered alignment fall in the gray region. The two dotted lines correspond to results obtained with two simplistic filtering methods (see main text). Colored lines are linear interpolations between additional points obtained with non-default parameters and correspond to results obtained by varying the parameters of filtering methods (not available for TrimAl). If a filtering method with default parameters yields significantly different (two-sided Wilcoxon test, ) results than unfiltered alignments, a star is displayed below the corresponding point.

Mentions: As Fig. 4 shows, these separate analyses corroborated all the main findings above: that alignment filtering does not improve Ensembl trees over unfiltered alignments; that this remains true even after parameter optimization (for Noisy, ClustalW, BMGE, Guidance, and the simple baselines); that in most conditions, filtering methods did not significantly outperform simple baselines; and that trees tended to get worse as the strength of filtering increased.


Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference.

Tan G, Muffato M, Ledergerber C, Herrero J, Goldman N, Gil M, Dessimoz C - Syst. Biol. (2015)

Reanalysis on Ensembl Compara confirms main findings. Points correspond to filtering methods under default parameters. Filtered alignments improving over unfiltered alignment fall in the gray region. The two dotted lines correspond to results obtained with two simplistic filtering methods (see main text). Colored lines are linear interpolations between additional points obtained with non-default parameters and correspond to results obtained by varying the parameters of filtering methods (not available for TrimAl). If a filtering method with default parameters yields significantly different (two-sided Wilcoxon test, ) results than unfiltered alignments, a star is displayed below the corresponding point.
© Copyright Policy - creative-commons
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC4538881&req=5

Figure 4: Reanalysis on Ensembl Compara confirms main findings. Points correspond to filtering methods under default parameters. Filtered alignments improving over unfiltered alignment fall in the gray region. The two dotted lines correspond to results obtained with two simplistic filtering methods (see main text). Colored lines are linear interpolations between additional points obtained with non-default parameters and correspond to results obtained by varying the parameters of filtering methods (not available for TrimAl). If a filtering method with default parameters yields significantly different (two-sided Wilcoxon test, ) results than unfiltered alignments, a star is displayed below the corresponding point.
Mentions: As Fig. 4 shows, these separate analyses corroborated all the main findings above: that alignment filtering does not improve Ensembl trees over unfiltered alignments; that this remains true even after parameter optimization (for Noisy, ClustalW, BMGE, Guidance, and the simple baselines); that in most conditions, filtering methods did not significantly outperform simple baselines; and that trees tended to get worse as the strength of filtering increased.

Bottom Line: We confirm that our findings hold for a wide range of parameters and methods.Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference.By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms.

View Article: PubMed Central - PubMed

Affiliation: Department of Computer Science, ETH Zurich, Universitätstr. 6, 8092 Zurich, Switzerland, Department of Molecular Sciences, Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, UK; MRC Clinical Sciences Centre, London W12 0NN, UK;

Show MeSH
Related in: MedlinePlus