Limits...
A sub-cubic time algorithm for computing the quartet distance between two general trees.

Nielsen J, Kristensen AK, Mailund T, Pedersen CN - Algorithms Mol Biol (2011)

Bottom Line: When inferring phylogenetic trees different algorithms may give different trees.To study such effects a measure for the distance between two trees is useful.Quartet distance is one such measure, and is the number of quartet topologies that differ between two trees.

View Article: PubMed Central - HTML - PubMed

Affiliation: Bioinformatics Research Centre (BiRC), Aarhus University, C, F, Møllers Alle 8, DK-8000 Aarhus C, Denmark. jn@birc.au.dk.

ABSTRACT

Background: When inferring phylogenetic trees different algorithms may give different trees. To study such effects a measure for the distance between two trees is useful. Quartet distance is one such measure, and is the number of quartet topologies that differ between two trees.

Results: We have derived a new algorithm for computing the quartet distance between a pair of general trees, i.e. trees where inner nodes can have any degree ≥ 3. The time and space complexity of our algorithm is sub-cubic in the number of leaves and does not depend on the degree of the inner nodes. This makes it the fastest algorithm so far for computing the quartet distance between general trees independent of the degree of the inner nodes.

Conclusions: We have implemented our algorithm and two of the best competitors. Our new algorithm is significantly faster than the competition and seems to run in close to quadratic time in practice.

No MeSH data available.


Counting directed claims. A shared butterfly induces two butterflies in each tree, which will give four pairs of claims, however the butterflies will only be identical in two of these pairs, thus a shared butterfly will be counted twice. A different butterfly also induces four pairs of claims, but since we are counting different butterflies all four will be counted. The way we count shared butterflies prevents the two different butterflies induced by the shared (undirected) butterfly from being counted.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3141660&req=5

Figure 4: Counting directed claims. A shared butterfly induces two butterflies in each tree, which will give four pairs of claims, however the butterflies will only be identical in two of these pairs, thus a shared butterfly will be counted twice. A different butterfly also induces four pairs of claims, but since we are counting different butterflies all four will be counted. The way we count shared butterflies prevents the two different butterflies induced by the shared (undirected) butterfly from being counted.

Mentions: The crux of the algorithm is to consider each pair of claims, one from each tree, and for each such pair count the number of shared and different directed butterflies claimed in the two trees. This way each shared butterfly is counted twice, and each different butterfly is counted four times, as shown in Figure 4. Dividing the counts by two and four, respectively, gives us sharedB(T, T') and diffB(T, T').


A sub-cubic time algorithm for computing the quartet distance between two general trees.

Nielsen J, Kristensen AK, Mailund T, Pedersen CN - Algorithms Mol Biol (2011)

Counting directed claims. A shared butterfly induces two butterflies in each tree, which will give four pairs of claims, however the butterflies will only be identical in two of these pairs, thus a shared butterfly will be counted twice. A different butterfly also induces four pairs of claims, but since we are counting different butterflies all four will be counted. The way we count shared butterflies prevents the two different butterflies induced by the shared (undirected) butterfly from being counted.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3141660&req=5

Figure 4: Counting directed claims. A shared butterfly induces two butterflies in each tree, which will give four pairs of claims, however the butterflies will only be identical in two of these pairs, thus a shared butterfly will be counted twice. A different butterfly also induces four pairs of claims, but since we are counting different butterflies all four will be counted. The way we count shared butterflies prevents the two different butterflies induced by the shared (undirected) butterfly from being counted.
Mentions: The crux of the algorithm is to consider each pair of claims, one from each tree, and for each such pair count the number of shared and different directed butterflies claimed in the two trees. This way each shared butterfly is counted twice, and each different butterfly is counted four times, as shown in Figure 4. Dividing the counts by two and four, respectively, gives us sharedB(T, T') and diffB(T, T').

Bottom Line: When inferring phylogenetic trees different algorithms may give different trees.To study such effects a measure for the distance between two trees is useful.Quartet distance is one such measure, and is the number of quartet topologies that differ between two trees.

View Article: PubMed Central - HTML - PubMed

Affiliation: Bioinformatics Research Centre (BiRC), Aarhus University, C, F, Møllers Alle 8, DK-8000 Aarhus C, Denmark. jn@birc.au.dk.

ABSTRACT

Background: When inferring phylogenetic trees different algorithms may give different trees. To study such effects a measure for the distance between two trees is useful. Quartet distance is one such measure, and is the number of quartet topologies that differ between two trees.

Results: We have derived a new algorithm for computing the quartet distance between a pair of general trees, i.e. trees where inner nodes can have any degree ≥ 3. The time and space complexity of our algorithm is sub-cubic in the number of leaves and does not depend on the degree of the inner nodes. This makes it the fastest algorithm so far for computing the quartet distance between general trees independent of the degree of the inner nodes.

Conclusions: We have implemented our algorithm and two of the best competitors. Our new algorithm is significantly faster than the competition and seems to run in close to quadratic time in practice.

No MeSH data available.