AxPcoords & parallel AxParafit: statistical co-phylogenetic analyses on thousands of taxa.

Stamatakis A, Auch AF, Meier-Kolthoff J, Göker M - BMC Bioinformatics (2007)

Bottom Line: The sophisticated statistical test for host-parasite co-phylogenetic analyses implemented in Parafit does not allow it to handle large datasets in reasonable times. Via optimization of the algorithm and the C code as well as integration of highly tuned BLAS and LAPACK methods, AxParafit runs 5-61 times faster than Parafit with a lower memory footprint (up to 35% reduction), while the performance benefit increases with growing dataset size. We outline the substantial benefits of using parallel AxParafit by example of a large-scale empirical study on smut fungi and their host plants.


Affiliation: Ecole Polytechnique Fédérale de Lausanne, School of Computer & Communication Sciences, Laboratory for Computational Biology and Bioinformatics STATION 14, CH-1015 Lausanne, Switzerland. Alexandros.Stamatakis@epfl.ch

ABSTRACT

Background: Current tools for co-phylogenetic analyses are not able to cope with the continuous accumulation of phylogenetic data. The sophisticated statistical test for host-parasite co-phylogenetic analyses implemented in Parafit does not allow it to handle large datasets in reasonable times. The Parafit and DistPCoA programs are by far the most compute-intensive components of the Parafit analysis pipeline. We present AxParafit and AxPcoords (Ax stands for Accelerated), which are highly optimized versions of Parafit and DistPCoA, respectively.

Results: Both programs have been entirely re-written in C. Via optimization of the algorithm and the C code as well as integration of highly tuned BLAS and LAPACK methods, AxParafit runs 5-61 times faster than Parafit with a lower memory footprint (up to 35% reduction), while the performance benefit increases with growing dataset size. The MPI-based parallel implementation of AxParafit shows good scalability on up to 128 processors, even on medium-sized datasets. The parallel analysis with AxParafit on 128 CPUs for a medium-sized dataset with a 512 by 512 association matrix is more than 1,200/128 times faster per processor than the sequential Parafit run. AxPcoords is 8-26 times faster than DistPCoA and numerically stable on large datasets. We outline the substantial benefits of using parallel AxParafit by example of a large-scale empirical study on smut fungi and their host plants. To the best of our knowledge, this study represents the largest co-phylogenetic analysis to date.

Conclusion: The highly efficient AxPcoords and AxParafit programs allow for large-scale co-phylogenetic analyses on several thousands of taxa for the first time. In addition, AxParafit and AxPcoords have been integrated into the easy-to-use CopyCat tool.


Figure 3: Memory Consumption AxParafit versus Parafit. Memory consumption of Parafit and AxParafit for quadratic association matrices of size 128, 256, 512, 1,024, 2,048, and 4,096.

Mentions: Figure 3 provides the memory use of AxParafit and Parafit in MB for quadratic A-matrices of sizes 128, 256, 512, 1,024, 2,048, and 4,096 (note that the dataset of size 4,096 was not run to completion). To test AxPcoords we used distance matrices of sizes 512, 1,024, 2,048, and 4,096. Run-time improvements range from a factor of 8.8 to a factor of 25.74. The run on 4,096 with DistPCoA apparently terminated but did not write a results file, most probably due to numerical instability (Pierre Legendre, personal communication). Figure 4 shows the run-time improvement of AxPcoords over DistPCoA for these quadratic distance matrices. Tests on smaller distance matrices, e.g., of sizes 128 and 256, were omitted because their execution times were below 10 seconds. On the largest matrix AxPcoords terminated within only 399 seconds, as opposed to the 10,268 seconds required by DistPCoA.
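The run-time factors quoted above are plain ratios of wall-clock times, so they can be checked with a few lines of arithmetic. The sketch below is only an illustration, not code from AxPcoords or AxParafit: the function names are hypothetical, and the per-processor figure assumes that "1,200/128" is meant as a total speedup of 1,200 divided evenly across 128 CPUs.

```python
def speedup(baseline_seconds: float, optimized_seconds: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    if optimized_seconds <= 0:
        raise ValueError("optimized_seconds must be positive")
    return baseline_seconds / optimized_seconds


def per_processor_speedup(total_speedup: float, processors: int) -> float:
    """Return the total speedup divided evenly across the processors used.

    Assumption: this is one reading of the "1,200/128" figure in the abstract.
    """
    if processors <= 0:
        raise ValueError("processors must be positive")
    return total_speedup / processors


# Times reported for the largest distance matrix (4,096):
# DistPCoA 10,268 s versus AxPcoords 399 s.
largest_matrix_speedup = speedup(10_268, 399)

# Parallel AxParafit on 128 CPUs, taking 1,200 as the total speedup.
per_cpu = per_processor_speedup(1_200, 128)

print(f"AxPcoords vs DistPCoA on 4,096: {largest_matrix_speedup:.2f}x")
print(f"AxParafit per-processor factor: {per_cpu:.3f}x")
```

The first ratio comes out at about 25.73, marginally below the reported 25.74, presumably because the quoted times are rounded to whole seconds.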

