Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference.

Tan G, Muffato M, Ledergerber C, Herrero J, Goldman N, Gil M, Dessimoz C - Syst. Biol. (2015)

Bottom Line: Based on multiple genome-wide empirical and simulated data sets, we show that the trees obtained from filtered MSAs are on average worse than those obtained from unfiltered MSAs. Furthermore, alignment filtering often leads to an increase in the proportion of well-supported branches that are actually wrong. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms.

Affiliation: Department of Computer Science, ETH Zurich, Universitätstr. 6, 8092 Zurich, Switzerland; Department of Molecular Sciences, Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, UK; MRC Clinical Sciences Centre, London W12 0NN, UK;

© License: Creative Commons

Figure 4: Reanalysis on Ensembl Compara confirms the main findings. Points correspond to filtering methods run with default parameters. Filtered alignments that improve over the unfiltered alignment fall in the gray region. The two dotted lines correspond to results obtained with two simplistic filtering methods (see main text). Colored lines are linear interpolations between additional points obtained by varying the parameters of the filtering methods away from their defaults (not available for TrimAl). If a filtering method with default parameters yields results significantly different from those of unfiltered alignments (two-sided Wilcoxon test, ), a star is displayed below the corresponding point.
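The significance stars rest on a two-sided Wilcoxon test comparing filtered and unfiltered alignments. The sketch below is a minimal, stdlib-only illustration of a paired two-sided Wilcoxon signed-rank test, assuming that each gene family contributes one tree-error score for its filtered alignment and one for its unfiltered alignment (lower is better). The function name `wilcoxon_signed_rank`, the scores, and the pairing are hypothetical; this is not the authors' code. It uses the normal approximation with a tie correction and no continuity correction; for small samples an exact test (for example `scipy.stats.wilcoxon`) is preferable.

```python
import math
from typing import Sequence


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Two-sided Wilcoxon signed-rank test for paired samples x and y.

    Returns (W_plus, p_value). Zero differences are discarded, tied
    absolute differences receive their average rank, and the p-value
    uses the normal approximation with a tie-corrected variance.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0  # no informative pairs: no evidence of a difference

    # Rank absolute differences (1-based), averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0  # accumulates sum of (t^3 - t) over tie groups
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # W+ is the rank sum of pairs where x exceeds y.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    if var == 0:
        return w_plus, 1.0
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of N(0, 1)
    return w_plus, p


# Hypothetical per-family tree-error scores (e.g. distances to a reference tree).
unfiltered = [2, 4, 3, 6, 5, 4, 8, 3]
filtered = [3, 5, 3, 8, 6, 6, 9, 4]
w, p = wilcoxon_signed_rank(filtered, unfiltered)
print(f"W+ = {w}, two-sided p = {p:.4f}")
```

With these made-up scores, every informative pair has a larger error after filtering, so W+ takes its maximum value and the test would flag the difference at the 5% level. Pairing by gene family matters here: it removes the large between-family variation in tree difficulty from the comparison.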

Mentions: As Fig. 4 shows, these separate analyses corroborated all the main findings above: that alignment filtering does not improve Ensembl trees over unfiltered alignments; that this remains true even after parameter optimization (for Noisy, ClustalW, BMGE, Guidance, and the simple baselines); that in most conditions, filtering methods did not significantly outperform simple baselines; and that trees tended to get worse as the strength of filtering increased.
