Limits...
Reconstructing phylogenies from noisy quartets in polynomial time with a high success probability.

Wu G, Kao MY, Lin G, You JH - Algorithms Mol Biol (2008)

Bottom Line: When the input quartet topology set contains errors, the second algorithm can reconstruct the "true" phylogeny with a probability approximately 1 - p in O(n4 log n) time, where p is the probability for a quartet topology being an error.This probability is improved by the third algorithm to approximately [equation; see text], where [equation, see text], with running time of O(n5), which is at least 0.984 when p < 0.05.The three proposed algorithms are mathematically guaranteed to reconstruct the "true" phylogeny with a high success probability.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada. wgang@cs.ualberta.ca

ABSTRACT

Background: In recent years, quartet-based phylogeny reconstruction methods have received considerable attentions in the computational biology community. Traditionally, the accuracy of a phylogeny reconstruction method is measured by simulations on synthetic datasets with known "true" phylogenies, while little theoretical analysis has been done. In this paper, we present a new model-based approach to measuring the accuracy of a quartet-based phylogeny reconstruction method. Under this model, we propose three efficient algorithms to reconstruct the "true" phylogeny with a high success probability.

Results: The first algorithm can reconstruct the "true" phylogeny from the input quartet topology set without quartet errors in O(n2) time by querying at most (n - 4) log(n - 1) quartet topologies, where n is the number of the taxa. When the input quartet topology set contains errors, the second algorithm can reconstruct the "true" phylogeny with a probability approximately 1 - p in O(n4 log n) time, where p is the probability for a quartet topology being an error. This probability is improved by the third algorithm to approximately [equation; see text], where [equation, see text], with running time of O(n5), which is at least 0.984 when p < 0.05.

Conclusion: The three proposed algorithms are mathematically guaranteed to reconstruct the "true" phylogeny with a high success probability. The experimental results showed that the third algorithm produced phylogenies with a higher probability than its aforementioned theoretical lower bound and outperformed some existing phylogeny reconstruction methods in both speed and accuracy.

No MeSH data available.


Probability comparison among the proposed algorithm M-VOTE, the hypercleaning algorithm (HC), the answer set programming method for the MQC problem (ASP), and the theoretical success probability of M-VOTE from Theorem 4.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2267719&req=5

Figure 2: Probability comparison among the proposed algorithm M-VOTE, the hypercleaning algorithm (HC), the answer set programming method for the MQC problem (ASP), and the theoretical success probability of M-VOTE from Theorem 4.

Mentions: Given a quartet error probability p, the expected number of quartet errors in Q is p/Q/. It follows from Lemma 5 that if , then there is a high probability for the existence of a compatible m-subset. For instance, when p < 0.05, algorithm M-VOTE almost always find a compatible 5-subset (and the probability that the associated phylogeny is correct is at least 0.984; see Figure 2).


Reconstructing phylogenies from noisy quartets in polynomial time with a high success probability.

Wu G, Kao MY, Lin G, You JH - Algorithms Mol Biol (2008)

Probability comparison among the proposed algorithm M-VOTE, the hypercleaning algorithm (HC), the answer set programming method for the MQC problem (ASP), and the theoretical success probability of M-VOTE from Theorem 4.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2267719&req=5

Figure 2: Probability comparison among the proposed algorithm M-VOTE, the hypercleaning algorithm (HC), the answer set programming method for the MQC problem (ASP), and the theoretical success probability of M-VOTE from Theorem 4.
Mentions: Given a quartet error probability p, the expected number of quartet errors in Q is p/Q/. It follows from Lemma 5 that if , then there is a high probability for the existence of a compatible m-subset. For instance, when p < 0.05, algorithm M-VOTE almost always find a compatible 5-subset (and the probability that the associated phylogeny is correct is at least 0.984; see Figure 2).

Bottom Line: When the input quartet topology set contains errors, the second algorithm can reconstruct the "true" phylogeny with a probability approximately 1 - p in O(n4 log n) time, where p is the probability for a quartet topology being an error.This probability is improved by the third algorithm to approximately [equation; see text], where [equation, see text], with running time of O(n5), which is at least 0.984 when p < 0.05.The three proposed algorithms are mathematically guaranteed to reconstruct the "true" phylogeny with a high success probability.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada. wgang@cs.ualberta.ca

ABSTRACT

Background: In recent years, quartet-based phylogeny reconstruction methods have received considerable attentions in the computational biology community. Traditionally, the accuracy of a phylogeny reconstruction method is measured by simulations on synthetic datasets with known "true" phylogenies, while little theoretical analysis has been done. In this paper, we present a new model-based approach to measuring the accuracy of a quartet-based phylogeny reconstruction method. Under this model, we propose three efficient algorithms to reconstruct the "true" phylogeny with a high success probability.

Results: The first algorithm can reconstruct the "true" phylogeny from the input quartet topology set without quartet errors in O(n2) time by querying at most (n - 4) log(n - 1) quartet topologies, where n is the number of the taxa. When the input quartet topology set contains errors, the second algorithm can reconstruct the "true" phylogeny with a probability approximately 1 - p in O(n4 log n) time, where p is the probability for a quartet topology being an error. This probability is improved by the third algorithm to approximately [equation; see text], where [equation, see text], with running time of O(n5), which is at least 0.984 when p < 0.05.

Conclusion: The three proposed algorithms are mathematically guaranteed to reconstruct the "true" phylogeny with a high success probability. The experimental results showed that the third algorithm produced phylogenies with a higher probability than its aforementioned theoretical lower bound and outperformed some existing phylogeny reconstruction methods in both speed and accuracy.

No MeSH data available.