Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference.

Tan G, Muffato M, Ledergerber C, Herrero J, Goldman N, Gil M, Dessimoz C - Syst. Biol. (2015)

Bottom Line: We confirm that our findings hold for a wide range of parameters and methods. Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms.


Affiliation: Department of Computer Science, ETH Zurich, Universitätstr. 6, 8092 Zurich, Switzerland; Department of Molecular Sciences, Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, UK; MRC Clinical Sciences Centre, London W12 0NN, UK

Figure 2: Alignment filtering generally yields poorer phylogenetic trees. Depicted here are results of the enriched species tree discordance test on amino acid (top) and nucleotide (bottom) alignments from three taxonomic ranges. The measure of error is the average Robinson-Foulds (RF) distance between the reference trees and the trees reconstructed from Prank + F alignments filtered by the various approaches. Trees were reconstructed using PhyML. Filtered alignments that improve over the unfiltered alignment fall in the gray region. The two dotted lines correspond to results obtained with two simplistic filtering methods (see main text). Points correspond to default parameters. Colored lines are linear interpolations between additional points obtained with non-default parameters (not available for all methods). Error bars indicate the standard error of the mean. If a filtering method with default parameters yields results significantly different from those of unfiltered alignments (two-sided Wilcoxon test), a star is displayed below the corresponding point. Note that no multiple-testing correction was applied.
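The error measure in the caption, the RF distance between a reference tree and a reconstructed tree, can be illustrated with a minimal sketch. It assumes trees are given as nested Python tuples of leaf names, treats them as unrooted, and counts the splits found in one tree but not the other (the unnormalized symmetric difference). The tree encoding and the function names are hypothetical choices for illustration, not the authors' implementation, and whether the study normalizes the distance is not stated in this excerpt.

```python
from itertools import chain


def leaf_set(tree):
    """Return the frozenset of leaf labels in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial splits induced by the internal edges of a tree.

    Each split is stored as the side that does NOT contain an arbitrary
    anchor leaf, so the same split always has the same representation
    regardless of where the tree happens to be rooted.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset()
        for child in node:
            clade |= visit(child)
        other = all_leaves - clade
        # Skip trivial splits (a single leaf, or the whole leaf set).
        if len(clade) > 1 and len(other) > 1:
            splits.add(other if anchor in clade else clade)
        return clade

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Unnormalized RF distance: number of splits present in only one tree."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must have identical leaf sets")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical five-taxon example (labels are placeholders, not study data).
reference = (("A", "B"), ("C", ("D", "E")))
inferred = (("A", "C"), ("B", ("D", "E")))
print(rf_distance(reference, inferred))
```

Averaging this quantity over many gene trees would give a per-method error value comparable in spirit to the points plotted in the figure.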

Mentions: Overall, we found that tree inference does not generally improve after alignment filtering (Fig. 2). With amino acid alignments, none of the filtering methods resulted in significant improvement (two-sided Wilcoxon test of paired samples); on the contrary, most of them decreased tree reconstruction accuracy, at times strongly so (Fig. 2, top). We had previously observed that amino acid alignments tend to be more accurate than nucleotide ones (Dessimoz and Gil 2010); one could thus expect filtering methods to have more opportunities to improve nucleotide alignments. In the present study, filtering did indeed fare slightly better on nucleotide alignments, yet no combination showed significant improvement over unfiltered alignments (two-sided Wilcoxon test); instead, most cases were either not significantly different from, or significantly worse than, unfiltered alignments (Fig. 2, bottom).
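The paired comparison described above (the same genes scored with and without filtering, then tested with a two-sided Wilcoxon test) can be sketched as an exact signed-rank test in plain Python. This is only an illustration under stated assumptions: zero differences are dropped, tied absolute differences receive average ranks, and the null distribution is enumerated exactly, which is practical only for small samples. The function names are hypothetical, and the excerpt does not say which implementation or significance threshold the authors used.

```python
from itertools import product


def signed_ranks(x, y):
    """Return the non-zero paired differences and the ranks of their magnitudes."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # assumption: drop zeros
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return diffs, ranks


def wilcoxon_exact(x, y, max_n=20):
    """Two-sided exact p-value of the Wilcoxon signed-rank test for paired samples."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs, ranks = signed_ranks(x, y)
    n = len(diffs)
    if n == 0:
        return 1.0  # no non-zero differences, nothing to test
    if n > max_n:
        raise ValueError("exact enumeration is only practical for small samples")
    center = sum(ranks) / 2
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    observed = abs(w_plus - center)
    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for s, r in zip(signs, ranks) if s)
        if abs(w - center) >= observed - 1e-12:
            extreme += 1
    return extreme / 2 ** n


# Placeholder per-gene error values, not data from the study.
filtered_error = [1, 2, 3, 4, 5, 6]
unfiltered_error = [0, 0, 0, 0, 0, 0]
print(wilcoxon_exact(filtered_error, unfiltered_error))
```

For sample sizes like those in a large-scale benchmark, a normal approximation or a library routine would be the practical choice; the enumeration here only makes the logic of the test explicit.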
