Performance comparison between k-tuple distance and four model-based distances in phylogenetic tree reconstruction.

Yang K, Zhang L - Nucleic Acids Res. (2008)

Bottom Line: Using 1470 simulated sets of sequences generated under different evolutionary scenarios, and both neighbor-joining and BioNJ trees, we compared the performance of the k-tuple distance with four commonly used distance estimators: Jukes-Cantor, Kimura, F84 and Tamura-Nei. These four are model-based distance estimators, as each of them assumes a specific substitution model in order to compute the distance between a pair of already aligned sequences. Results show that trees constructed from the k-tuple distance are more accurate than those from the other distances most of the time; when the divergence between the underlying sequences is high, tree accuracy with the k-tuple distance can be twice that of the other estimators or higher.

Affiliation: Virginia Bioinformatics Institute, Virginia, USA.

ABSTRACT
Phylogenetic tree reconstruction requires construction of a multiple sequence alignment (MSA) from sequences. Computationally, it is difficult to achieve an optimal MSA for many sequences. Moreover, even if an optimal MSA is obtained, it may not be the true MSA that reflects the evolutionary history of the underlying sequences. Errors introduced during MSA construction can therefore affect the subsequent phylogenetic tree construction. To circumvent this issue, we extend the application of the k-tuple distance to phylogenetic tree reconstruction. The k-tuple distance between two sequences is the sum, over all possible tuples of length k, of the differences in tuple frequency between the sequences, and it can be estimated without an MSA. It has traditionally been used to build a fast 'guide tree' to assist the construction of MSAs. Using 1470 simulated sets of sequences generated under different evolutionary scenarios, and both neighbor-joining and BioNJ trees, we compared the performance of the k-tuple distance with four commonly used distance estimators: Jukes-Cantor, Kimura, F84 and Tamura-Nei. These four are model-based distance estimators, as each of them assumes a specific substitution model in order to compute the distance between a pair of already aligned sequences. Results show that trees constructed from the k-tuple distance are more accurate than those from the other distances most of the time; when the divergence between the underlying sequences is high, tree accuracy with the k-tuple distance can be twice that of the other estimators or higher. Furthermore, as the k-tuple distance avoids the need to construct an MSA, it can save a tremendous amount of time in phylogenetic tree reconstruction when the data include a large number of sequences.
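The definition of the k-tuple distance in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the default k = 3, the use of overlapping tuples, the normalization to relative frequencies and the use of absolute differences are all assumptions made for illustration, and the exact formula in the paper may differ.

```python
from collections import Counter
from itertools import product


def ktuple_frequencies(seq, k):
    """Relative frequency of each overlapping k-tuple in seq (assumed normalization)."""
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(total))
    return {tup: n / total for tup, n in counts.items()}


def ktuple_distance(seq_a, seq_b, k=3, alphabet="ACGT"):
    """Sum of absolute frequency differences over all possible k-tuples.

    Hypothetical helper: tuples containing characters outside `alphabet`
    (for example N) are not enumerated and so do not contribute to the sum.
    """
    freq_a = ktuple_frequencies(seq_a, k)
    freq_b = ktuple_frequencies(seq_b, k)
    all_tuples = ("".join(p) for p in product(alphabet, repeat=k))
    return sum(abs(freq_a.get(t, 0.0) - freq_b.get(t, 0.0)) for t in all_tuples)


# The two sequences have different lengths and are not aligned.
d = ktuple_distance("ACGTACGTTGCA", "ACGTTCGATG", k=2)
```

Under these assumptions the function needs only the two raw sequences, which is why no MSA is required; a matrix of such pairwise values could then be passed to a distance-based method such as neighbor-joining.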

License: Creative Commons

Figure 1: The accuracy of all five metrics on dataset 1 with the NJ method.

Mentions: Aligning short sequences such as upstream regulatory regions can be extremely difficult because little is known about their mutation patterns. Low alignment quality can lead to erroneous inference of phylogenetic trees for these regions. Because building phylogenetic trees from the k-tuple distance matrix bypasses the construction of an MSA, the k-tuple distance has great potential for tree construction in these regions. To evaluate the performance of the k-tuple distance on short sequences, we simulated 210 sets of sequences, with the number of taxa per set ranging from 50 to 260 and sequence lengths from 30 to 120 bp. Figure 1 shows the accuracy of trees reconstructed by the NJ method for the five distance estimators on dataset 1. The k-tuple distance outperformed the other distance estimators by a considerable margin. The Tamura–Nei distance ranked second with an average accuracy of 0.11246, less than half that of the k-tuple distance (0.26110). The other three performed similarly to one another, with average accuracies of 0.00126 for F84, 0.00043 for Jukes–Cantor and 0.00103 for Kimura.
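As a quick arithmetic check of the comparison above, the following Python sketch ranks the five reported average accuracies and expresses each as a fraction of the k-tuple value. The numbers are those quoted in the text; the variable and function names are hypothetical.

```python
# Average NJ tree accuracies on dataset 1, as quoted in the text.
mean_accuracy = {
    "k-tuple": 0.26110,
    "Tamura-Nei": 0.11246,
    "F84": 0.00126,
    "Kimura": 0.00103,
    "Jukes-Cantor": 0.00043,
}


def ratios_to_reference(accuracy, reference="k-tuple"):
    """Each estimator's accuracy divided by the reference estimator's accuracy."""
    ref = accuracy[reference]
    return {name: value / ref for name, value in accuracy.items()}


ratios = ratios_to_reference(mean_accuracy)

# Estimators ordered from most to least accurate.
ranking = sorted(mean_accuracy, key=mean_accuracy.get, reverse=True)

for name in ranking:
    print(f"{name:12s} accuracy={mean_accuracy[name]:.5f} ratio={ratios[name]:.3f}")
```

The Tamura–Nei ratio comes out at about 0.43, consistent with "less than half", while the remaining three estimators are each below 1% of the k-tuple accuracy.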

