Limits...
Taxon ordering in phylogenetic trees by means of evolutionary algorithms.

Cerutti F, Bertolotti L, Goldberg TL, Giacobini M - BioData Min (2011)

Bottom Line: The (1 + 1)-EA consistently outperformed a random search, and better results were obtained setting the radius to 8.The (λ + μ)-EAs performed as well as the (1 + 1), except the larger population (1000 + 1000).Biological relationships between samples are also easier to observe.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Animal Production, Epidemiology and Ecology, Faculty of Veterinary Medicine, University of Torino, Via Leonardo da Vinci 44, 10095, Grugliasco (TO), Italy. mario.giacobini@unito.it.

ABSTRACT

Background: In in a typical "left-to-right" phylogenetic tree, the vertical order of taxa is meaningless, as only the branch path between them reflects their degree of similarity. To make unresolved trees more informative, here we propose an innovative Evolutionary Algorithm (EA) method to search the best graphical representation of unresolved trees, in order to give a biological meaning to the vertical order of taxa.

Methods: Starting from a West Nile virus phylogenetic tree, in a (1 + 1)-EA we evolved it by randomly rotating the internal nodes and selecting the tree with better fitness every generation. The fitness is a sum of genetic distances between the considered taxon and the r (radius) next taxa. After having set the radius to the best performance, we evolved the trees with (λ + μ)-EAs to study the influence of population on the algorithm.

Results: The (1 + 1)-EA consistently outperformed a random search, and better results were obtained setting the radius to 8. The (λ + μ)-EAs performed as well as the (1 + 1), except the larger population (1000 + 1000).

Conclusions: The trees after the evolution showed an improvement both of the fitness (based on a genetic distance matrix, then close taxa are actually genetically close), and of the biological interpretation. Samples collected in the same state or year moved close each other, making the tree easier to interpret. Biological relationships between samples are also easier to observe.

No MeSH data available.


Related in: MedlinePlus

Relative fitness improvement by (λ + μ)-EAs. Boxplot representing the relative final fitness values for the (λ + μ)-EAs (a) and magnified in the range [0.745, 0.76] (b). The dotted line in a) is the threshold used to evaluate the computational effort and set at 0.78.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3142222&req=5

Figure 3: Relative fitness improvement by (λ + μ)-EAs. Boxplot representing the relative final fitness values for the (λ + μ)-EAs (a) and magnified in the range [0.745, 0.76] (b). The dotted line in a) is the threshold used to evaluate the computational effort and set at 0.78.

Mentions: The best result, or in other words the best fitness improvement, was obtained by the (1 + 5)-EA, giving a relative improvement of 0.75. Nevertheless the Wilcoxon Rank Sum test performed between EAs returned not significant differences except in the case of the (1000 + 1000)-EA, that returned p - value ≪ 0.01. Boxplots in Figure 3 highlight how (1000 + 1000)-EA has a lower performance in finding better solutions compared to the other populations (Figure 3b), although the distribution of its final fitness values is the only one in which all runs cross the threshold of relative improvement of 0.78 (Figure 3a). Hit rates reported in Table 2 also underline that all (1000 + 1000)-EAs cross the threshold. The other (λ + μ) combinations instead succeeded only in between 90% and 94% of the runs. This threshold was set to compare the convergence speed of the different populations and its value was arbitrarily chosen in the range [0.757, 0.780], as a change in this value does not affect the conclusion. For convergence speed we mean the number of evaluations needed to cross this threshold, and these values are reported in Table 2. The convergence speed of the (1000 + 1000)-EA is statistically lower than the other EAs (p-values ≪ 0.01 in Wilcoxon Rank Sum tests), resulting in the best computational performances but in the worse solution by one order of magnitude (Figure 3). Additionally the (50 + 50)-EA convergence speed is statistically better than (1 + 1), (5 + 10) and (10 + 10)-EAs.


Taxon ordering in phylogenetic trees by means of evolutionary algorithms.

Cerutti F, Bertolotti L, Goldberg TL, Giacobini M - BioData Min (2011)

Relative fitness improvement by (λ + μ)-EAs. Boxplot representing the relative final fitness values for the (λ + μ)-EAs (a) and magnified in the range [0.745, 0.76] (b). The dotted line in a) is the threshold used to evaluate the computational effort and set at 0.78.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3142222&req=5

Figure 3: Relative fitness improvement by (λ + μ)-EAs. Boxplot representing the relative final fitness values for the (λ + μ)-EAs (a) and magnified in the range [0.745, 0.76] (b). The dotted line in a) is the threshold used to evaluate the computational effort and set at 0.78.
Mentions: The best result, or in other words the best fitness improvement, was obtained by the (1 + 5)-EA, giving a relative improvement of 0.75. Nevertheless the Wilcoxon Rank Sum test performed between EAs returned not significant differences except in the case of the (1000 + 1000)-EA, that returned p - value ≪ 0.01. Boxplots in Figure 3 highlight how (1000 + 1000)-EA has a lower performance in finding better solutions compared to the other populations (Figure 3b), although the distribution of its final fitness values is the only one in which all runs cross the threshold of relative improvement of 0.78 (Figure 3a). Hit rates reported in Table 2 also underline that all (1000 + 1000)-EAs cross the threshold. The other (λ + μ) combinations instead succeeded only in between 90% and 94% of the runs. This threshold was set to compare the convergence speed of the different populations and its value was arbitrarily chosen in the range [0.757, 0.780], as a change in this value does not affect the conclusion. For convergence speed we mean the number of evaluations needed to cross this threshold, and these values are reported in Table 2. The convergence speed of the (1000 + 1000)-EA is statistically lower than the other EAs (p-values ≪ 0.01 in Wilcoxon Rank Sum tests), resulting in the best computational performances but in the worse solution by one order of magnitude (Figure 3). Additionally the (50 + 50)-EA convergence speed is statistically better than (1 + 1), (5 + 10) and (10 + 10)-EAs.

Bottom Line: The (1 + 1)-EA consistently outperformed a random search, and better results were obtained setting the radius to 8.The (λ + μ)-EAs performed as well as the (1 + 1), except the larger population (1000 + 1000).Biological relationships between samples are also easier to observe.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Animal Production, Epidemiology and Ecology, Faculty of Veterinary Medicine, University of Torino, Via Leonardo da Vinci 44, 10095, Grugliasco (TO), Italy. mario.giacobini@unito.it.

ABSTRACT

Background: In in a typical "left-to-right" phylogenetic tree, the vertical order of taxa is meaningless, as only the branch path between them reflects their degree of similarity. To make unresolved trees more informative, here we propose an innovative Evolutionary Algorithm (EA) method to search the best graphical representation of unresolved trees, in order to give a biological meaning to the vertical order of taxa.

Methods: Starting from a West Nile virus phylogenetic tree, in a (1 + 1)-EA we evolved it by randomly rotating the internal nodes and selecting the tree with better fitness every generation. The fitness is a sum of genetic distances between the considered taxon and the r (radius) next taxa. After having set the radius to the best performance, we evolved the trees with (λ + μ)-EAs to study the influence of population on the algorithm.

Results: The (1 + 1)-EA consistently outperformed a random search, and better results were obtained setting the radius to 8. The (λ + μ)-EAs performed as well as the (1 + 1), except the larger population (1000 + 1000).

Conclusions: The trees after the evolution showed an improvement both of the fitness (based on a genetic distance matrix, then close taxa are actually genetically close), and of the biological interpretation. Samples collected in the same state or year moved close each other, making the tree easier to interpret. Biological relationships between samples are also easier to observe.

No MeSH data available.


Related in: MedlinePlus