Fastphylo: fast tools for phylogenetics.

Khan MA, Elias I, Sjölund E, Nylander K, Guimera RV, Schobesberger R, Schmitzberger P, Lagergren J, Arvestad L - BMC Bioinformatics (2013)

Bottom Line: We present fastphylo, a software package containing implementations of efficient algorithms for two common problems in phylogenetics: estimating DNA/protein sequence distances and reconstructing a phylogeny from a distance matrix. Fastphylo is a fast, memory-efficient, and easy-to-use software suite. Due to its modular architecture, fastphylo is a flexible tool for many phylogenetic studies.

Affiliation: KTH Royal Institute of Technology, Science for Life Laboratory, School of Computer Science and Communication, Department of Computational Biology, Solna, Sweden. malagori@kth.se.

ABSTRACT

Background: Distance methods are ubiquitous tools in phylogenetics. Their primary purpose may be to reconstruct evolutionary history, but they are also used as components in bioinformatic pipelines. However, poor computational efficiency has limited the applicability of distance methods to very large problem instances.

Results: We present fastphylo, a software package containing implementations of efficient algorithms for two common problems in phylogenetics: estimating DNA/protein sequence distances and reconstructing a phylogeny from a distance matrix. We compare fastphylo with other neighbor-joining-based methods and report the results in terms of speed and memory efficiency.

Conclusions: Fastphylo is a fast, memory-efficient, and easy-to-use software suite. Due to its modular architecture, fastphylo is a flexible tool for many phylogenetic studies.

Figure 1: Memory consumption of the fastdist program. This figure shows fastdist computation on 10 gene families with family sizes ranging from 1,000 to 10,000. Here, Fastdist-without-Ambiguity refers to the results computed using the binary format functionality (discussed in section 'Features of fastdist'), while Fastdist-with-Ambiguity refers to the fastdist computation using ambiguity information. The results in the figure suggest that the Fastdist-with-Ambiguity computation requires much more memory than Fastdist-without-Ambiguity as the gene family size increases.

Mentions: The two distinguishing features of fastdist, however, are speed and the support for ambiguity symbols (see further [15]). In its default mode, fastdist computes the whole distance matrix using ambiguity symbols, which requires memory that grows quadratically with the gene family size (see Figure 1). To overcome this problem, we introduce a binary format that performs row-wise operations in computing the upper triangular distance matrix. Furthermore, instead of keeping the whole distance matrix in plain text, we store the upper triangular matrix in a binary format that reduces the amount of disk space substantially. For instance, the distance matrix computed by fastdist using the binary format for 100,000 sequences, each of length 2000 bp, took ∼19 GB of disk space, while the distance matrix for the same set of sequences computed by RapidNJ [12] using PHYLIP format consumed ∼76 GB of disk space. Using the memory-efficient option, fastdist allows users to do row-wise operations while computing the distance matrix, i.e., keeping only a single row of the distance matrix in memory. When the binary format option is used, the memory-efficient functionality is implicitly invoked. Neither the memory-efficient option nor the binary format, however, supports ambiguity symbols when computing the distance matrix.
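The row-wise idea described above can be illustrated with a minimal Python sketch. It is not fastdist's implementation or its actual file layout: the function names, the use of a simple p-distance (fraction of mismatching sites), and the choice of 4-byte little-endian floats per entry are all assumptions made for illustration. The sketch computes one row of the strict upper triangle at a time, writes it to a binary stream, and discards it, so only a single row is held in memory.

```python
import io
import struct
from typing import BinaryIO, Iterator, List, Sequence


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    A stand-in distance for illustration only; it ignores ambiguity symbols.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def upper_triangle_rows(seqs: Sequence[str]) -> Iterator[List[float]]:
    """Yield row i of the strict upper triangle: distances from i to i+1..n-1."""
    n = len(seqs)
    for i in range(n - 1):
        yield [p_distance(seqs[i], seqs[j]) for j in range(i + 1, n)]


def write_binary_upper_triangle(seqs: Sequence[str], out: BinaryIO) -> int:
    """Write the upper triangle row by row as float32; return bytes written.

    Only the current row is kept in memory (hypothetical layout, not the
    fastdist binary format).
    """
    written = 0
    for row in upper_triangle_rows(seqs):
        data = struct.pack(f"<{len(row)}f", *row)
        out.write(data)
        written += len(data)
    return written


def upper_triangle_bytes(n: int, bytes_per_entry: int = 4) -> int:
    """Size of a strict upper triangle for n sequences, assuming fixed-width entries."""
    return n * (n - 1) // 2 * bytes_per_entry


if __name__ == "__main__":
    buf = io.BytesIO()
    size = write_binary_upper_triangle(["ACGT", "ACGA", "TCGA"], buf)
    print(size, struct.unpack("<3f", buf.getvalue()))
    print(upper_triangle_bytes(100_000))
```

Under the assumed 4 bytes per entry, the upper triangle for 100,000 sequences comes to 19,999,800,000 bytes (about 18.6 GiB), which is the same order as the ∼19 GB reported above; the entry width fastdist actually uses is not stated in this text.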

