Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference.

Tan G, Muffato M, Ledergerber C, Herrero J, Goldman N, Gil M, Dessimoz C - Syst. Biol. (2015)

Bottom Line: We confirm that our findings hold for a wide range of parameters and methods. Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms.


Affiliation: Department of Computer Science, ETH Zurich, Universitätstr. 6, 8092 Zurich, Switzerland; Department of Molecular Sciences, Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, UK; MRC Clinical Sciences Centre, London W12 0NN, UK


Figure 3: Filtering not only increases the fraction of branches that are unresolved, but also often increases the fraction of resolved branches that are incorrect. Using approximate Bayesian posterior as the branch support measure (Anisimova et al. 2011), we considered branches below particular branch support values as unresolved (cutoff values in italics) in the enriched species discordance test on amino acid sequences.

Mentions: We decomposed the effect of the various filtering methods on the false positive and false negative rates, using the enriched species discordance test and taking the approximate Bayesian posterior (Anisimova et al. 2011) as the measure of support for each branch. Figure 3 shows the results of this analysis applied to amino acid sequences. As expected, filtering consistently led to an increase in the false negative rate for all conditions. This is consistent with our other observations, which indicate that many phylogenetically informative sites are lost to alignment filtering. Worryingly, the false positive rate also increased in many combinations, particularly when we used lower minimum support thresholds (0.75 and 0.9) in the Fungi and Eukaryote data sets. With more stringent thresholds (0.95 and 0.99), the impact of filtering on the false positive rate was less pronounced but in many cases still detrimental. Only in the Bacteria data set did filtering lead to a decrease in the false positive rate. These observations also broadly hold for the nucleotide alignments (Supplementary Fig. 11 available on Dryad at http://dx.doi.org/10.5061/dryad.pc5j0).
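The thresholding described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each branch of an inferred tree is summarized by a hypothetical pair: its support value and a flag saying whether the branch agrees with the reference used by the test. Following the wording of the Figure 3 caption, it reports the fraction of branches that are unresolved at a given cutoff and the fraction of resolved branches that are incorrect. The function name, the input layout, and the example values are all assumptions made for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical representation: (support value, agrees with the reference?)
Branch = Tuple[float, bool]


def branch_rates(branches: Iterable[Branch], cutoff: float) -> Tuple[float, float]:
    """Return (unresolved fraction, incorrect fraction among resolved branches).

    A branch counts as unresolved when its support is below `cutoff`.
    """
    branches = list(branches)
    if not branches:
        raise ValueError("at least one branch is required")

    # Branches at or above the cutoff are treated as resolved.
    resolved = [(support, ok) for support, ok in branches if support >= cutoff]
    unresolved_fraction = 1.0 - len(resolved) / len(branches)

    # Among resolved branches, count those that disagree with the reference.
    if resolved:
        incorrect = sum(1 for _, ok in resolved if not ok)
        incorrect_fraction = incorrect / len(resolved)
    else:
        incorrect_fraction = 0.0  # nothing resolved, so nothing can be wrong

    return unresolved_fraction, incorrect_fraction


if __name__ == "__main__":
    # Made-up example values, not data from the study.
    example = [(0.99, True), (0.96, False), (0.80, True), (0.60, False)]
    for cutoff in (0.75, 0.9, 0.95, 0.99):
        unresolved, wrong = branch_rates(example, cutoff)
        print(f"cutoff={cutoff}: unresolved={unresolved:.2f}, incorrect={wrong:.2f}")
```

Running the same calculation on trees built from unfiltered and filtered alignments, at each of the four cutoffs, would give the kind of side-by-side comparison the paragraph describes.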

