Fastphylo: fast tools for phylogenetics.

Khan MA, Elias I, Sjölund E, Nylander K, Guimera RV, Schobesberger R, Schmitzberger P, Lagergren J, Arvestad L - BMC Bioinformatics (2013)

Bottom Line: We present fastphylo, a software package containing implementations of efficient algorithms for two common problems in phylogenetics: estimating DNA/protein sequence distances and reconstructing a phylogeny from a distance matrix. Fastphylo is a fast, memory-efficient, and easy-to-use software suite. Due to its modular architecture, fastphylo is a flexible tool for many phylogenetic studies.


Affiliation: KTH Royal Institute of Technology, Science for Life Laboratory, School of Computer Science and Communication, Department of Computational Biology, Solna, Sweden. malagori@kth.se.

ABSTRACT

Background: Distance methods are ubiquitous tools in phylogenetics. Their primary purpose may be to reconstruct evolutionary history, but they are also used as components in bioinformatic pipelines. However, poor computational efficiency has constrained the applicability of distance methods to very large problem instances.

Results: We present fastphylo, a software package containing implementations of efficient algorithms for two common problems in phylogenetics: estimating DNA/protein sequence distances and reconstructing a phylogeny from a distance matrix. We compare fastphylo with other neighbor-joining-based methods and report the results in terms of speed and memory efficiency.

Conclusions: Fastphylo is a fast, memory-efficient, and easy-to-use software suite. Due to its modular architecture, fastphylo is a flexible tool for many phylogenetic studies.

Figure 7: Time and memory comparison of the distance matrix computation. The analyses in Figures 7a and 7b were performed on dataset-1, while Figure 7c shows the memory utilization of RapidNJ and fastdist (using the binary format) on dataset-2. ClearCut was not considered in this experiment because it outputs the distance matrix only as an option, at the same time as it outputs the phylogenetic tree.

Mentions: To further investigate the delay in the fastdist-fnj pipe, we split the experiment into two phases: 1) compute the distance matrix separately; and 2) compute the phylogenetic tree using the distance matrix as input to the neighbour joining tools considered in this study. The results of these investigations are presented in Figures 7 and 8, respectively. Figure 7 shows the time and memory comparison of NJ tools for computing the distance matrices. RapidNJ is evidently faster than all the other tools; it is ∼2 times faster than fastdist (see Figure 7a). However, RapidNJ’s memory consumption increases quadratically with the number of sequences, while fastdist’s memory utilization increases linearly (see Figure 7c). In Figure 7c, we report the results of RapidNJ only up to 85,000 taxa because of the memory limit for computing the distance matrices in this experiment, i.e. 24 GB RAM. RapidNJ computed distance matrices for 17 gene families ranging in size from 5,000 to 85,000 sequences, while fastdist computed distance matrices for all 20 gene families, ranging in size from 5,000 to 100,000 sequences, within the allocated memory. We can therefore attribute the delay in the fastdist-fnj pipe relative to RapidNJ in Figures 3 and 5 to the slow computation of distance matrices by the fastdist program.
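
The contrast between quadratic and linear memory growth can be illustrated with a minimal Python sketch. It is not the implementation of fastdist or RapidNJ: the function names, the use of the simple p-distance, and the 4 bytes per matrix entry are all assumptions made for illustration. The sketch shows one way a distance computation could keep memory linear in the number of sequences, by emitting the matrix one row at a time, and compares the size of one row with the size of a stored lower-triangular matrix.

```python
from typing import Iterator, List


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def stream_distance_rows(seqs: List[str]) -> Iterator[List[float]]:
    """Yield the distance matrix row by row (hypothetical streaming scheme).

    Only the input sequences and the current row are held in memory, so the
    footprint grows linearly with the number of sequences.
    """
    for a in seqs:
        yield [p_distance(a, b) for b in seqs]


def triangular_matrix_bytes(n: int, bytes_per_entry: int = 4) -> int:
    """Bytes needed to store all n*(n-1)/2 pairwise distances at once."""
    return n * (n - 1) // 2 * bytes_per_entry


def single_row_bytes(n: int, bytes_per_entry: int = 4) -> int:
    """Bytes needed to store one row of n distances."""
    return n * bytes_per_entry


if __name__ == "__main__":
    # Toy alignment: rows are produced and consumed one at a time.
    for row in stream_distance_rows(["ACGT", "ACGA", "TCGA"]):
        print(" ".join(f"{d:.2f}" for d in row))

    # Scaling comparison under the assumed 4 bytes per entry.
    for n in (5_000, 85_000, 100_000):
        print(n, triangular_matrix_bytes(n), single_row_bytes(n))
```

The byte counts show only how the two storage schemes scale; they are not meant to reproduce the memory figures reported for either tool, which depend on data structures not described here.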

