SUNPLIN: simulation with uncertainty for phylogenetic investigations.

Martins WS, Carmo WC, Longo HJ, Rosa TC, Rangel TF - BMC Bioinformatics (2013)

Bottom Line: The information contained in the topology of the resulting expanded trees can be captured by the pairwise phylogenetic distance between species and stored in a matrix for further statistical analysis. The code may be used as a standalone program or as a shared object in the R system. Our results open up the possibility of accounting for phylogenetic uncertainty in evolutionary and ecological analyses of large datasets.


Affiliation: Institute of Informatics, Federal University of Goiás, Goiânia, Brazil. wellington@inf.ufg.br.

ABSTRACT

Background: Phylogenetic comparative analyses usually rely on a single consensus phylogenetic tree to study evolutionary processes. However, most phylogenetic trees are incomplete with regard to species sampling, which may critically compromise analyses. Some approaches have been proposed to integrate non-molecular phylogenetic information into incomplete molecular phylogenies. An expanded tree approach consists of adding missing species to random locations within their clade. The information contained in the topology of the resulting expanded trees can be captured by the pairwise phylogenetic distance between species and stored in a matrix for further statistical analysis. Thus, the random expansion and processing of multiple phylogenetic trees can be used to estimate the phylogenetic uncertainty through a simulation procedure. Because this procedure is computationally demanding, the analyses are of limited applicability unless it is implemented efficiently.

Results: In this paper, we present efficient algorithms and implementations for randomly expanding and processing phylogenetic trees, so that the simulations involved in comparative phylogenetic analysis with uncertainty can be conducted in a reasonable time. We propose algorithms for both randomly expanding trees and calculating distance matrices. The source code, written in C++, has been made available. The code may be used as a standalone program or as a shared object in the R system. The software can also be used as a web service through the link: http://purl.oclc.org/NET/sunplin/.

Conclusion: We compare our implementations to similar solutions and show that significant performance gains can be obtained. Our results open up the possibility of accounting for phylogenetic uncertainty in evolutionary and ecological analyses of large datasets.
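
The random expansion step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' C++ implementation. The tree encoding (a child-to-parent map plus the length of the branch above each node), the function names, the rule that only branches strictly below the clade root are candidates, and the `pendant_length` given to the new leaf are all assumptions made for illustration.

```python
import random

def subtree_nodes(children, root):
    """Return every node in the subtree rooted at `root`, root included."""
    stack, seen = [root], []
    while stack:
        node = stack.pop()
        seen.append(node)
        stack.extend(children.get(node, []))
    return seen

def insert_species(parent, length, clade_root, new_leaf, new_internal,
                   rng, pendant_length=1.0):
    """Attach `new_leaf` to a randomly chosen branch inside a clade.

    parent: dict mapping each non-root node to its parent.
    length: dict mapping each non-root node to the length of the branch above it.
    Both dicts are modified in place. Returns the node whose branch was split.
    """
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    # Assumption: candidate branches are those strictly below the clade root,
    # each identified by the node at its lower end.
    candidates = [n for n in subtree_nodes(children, clade_root) if n != clade_root]
    if not candidates:
        raise ValueError("clade has no internal branches to split")
    target = rng.choice(candidates)
    # Split the chosen branch at a uniformly random point.
    cut = rng.uniform(0.0, length[target])
    parent[new_internal] = parent[target]
    length[new_internal] = length[target] - cut
    parent[target] = new_internal
    length[target] = cut
    # Assumption: the new leaf hangs off the new internal node with a fixed length.
    parent[new_leaf] = new_internal
    length[new_leaf] = pendant_length
    return target

# Hypothetical five-node tree: node 1 is the root, node 2 is the clade root.
parent = {2: 1, 5: 1, 3: 2, 4: 2}
length = {node: 1.0 for node in parent}
insert_species(parent, length, clade_root=2, new_leaf=6, new_internal=7,
               rng=random.Random(0))
```

Repeating the call with different random seeds yields the set of expanded trees over which a simulation would then be run.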


Figure 1: Input data representation. (Left) Phylogenetic tree and (top right) the species to be inserted.

Mentions: Figure 1 illustrates an example tree with numbered nodes. The root of the tree is node 1. Nodes 3, 4, 6, 9, 10, 12 and 13 are leaf nodes. For the sake of simplification, the tree is made binary and all branch lengths are assigned the value 1.
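
To make the pairwise phylogenetic distance concrete, the following Python sketch computes a leaf-to-leaf distance matrix on a tree of this shape. The text does not give the topology of Figure 1, so the parent map below is an assumed topology that only matches the stated root, leaf set and unit branch lengths. The function names are hypothetical, and the naive ancestor-walking method shown is not the efficient algorithm proposed in the paper.

```python
def path_to_root(parent, length, node):
    """Map each ancestor of `node` (node included) to its distance from `node`."""
    dist, total = {node: 0.0}, 0.0
    while node in parent:
        total += length[node]
        node = parent[node]
        dist[node] = total
    return dist

def distance_matrix(parent, length, leaves):
    """Return a dict mapping (leaf_a, leaf_b) to their path length in the tree."""
    up = {leaf: path_to_root(parent, length, leaf) for leaf in leaves}
    matrix = {}
    for a in leaves:
        for b in leaves:
            # With non-negative branch lengths, the common ancestor that
            # minimises the summed distance is the lowest common ancestor.
            matrix[a, b] = min(up[a][n] + up[b][n] for n in up[a] if n in up[b])
    return matrix

# Assumed topology: node 1 is the root; 3, 4, 6, 9, 10, 12 and 13 are leaves.
parent = {2: 1, 5: 1, 3: 2, 4: 2, 6: 5, 7: 5,
          8: 7, 11: 7, 9: 8, 10: 8, 12: 11, 13: 11}
length = {node: 1.0 for node in parent}  # all branch lengths set to 1
leaves = [3, 4, 6, 9, 10, 12, 13]
matrix = distance_matrix(parent, length, leaves)
```

Under this assumed topology, sister leaves such as 3 and 4 are two branches apart, while leaves on opposite sides of the root are separated by the full path through node 1.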

