SUNPLIN: simulation with uncertainty for phylogenetic investigations.

Martins WS, Carmo WC, Longo HJ, Rosa TC, Rangel TF - BMC Bioinformatics (2013)

Bottom Line: The information contained in the topology of the resulting expanded trees can be captured by the pairwise phylogenetic distance between species and stored in a matrix for further statistical analysis. The code may be used as a standalone program or as a shared object in the R system. Our results open up the possibility of accounting for phylogenetic uncertainty in evolutionary and ecological analyses of large datasets.


Affiliation: Institute of Informatics, Federal University of Goiás, Goiânia, Brazil. wellington@inf.ufg.br.

ABSTRACT

Background: Phylogenetic comparative analyses usually rely on a single consensus phylogenetic tree in order to study evolutionary processes. However, most phylogenetic trees are incomplete with regard to species sampling, which may critically compromise analyses. Some approaches have been proposed to integrate non-molecular phylogenetic information into incomplete molecular phylogenies. An expanded tree approach consists of adding missing species to random locations within their clade. The information contained in the topology of the resulting expanded trees can be captured by the pairwise phylogenetic distance between species and stored in a matrix for further statistical analysis. Thus, the random expansion and processing of multiple phylogenetic trees can be used to estimate the phylogenetic uncertainty through a simulation procedure. Because of the computational burden required, unless this procedure is efficiently implemented, the analyses are of limited applicability.

Results: In this paper, we present efficient algorithms and implementations for randomly expanding and processing phylogenetic trees so that simulations involved in comparative phylogenetic analysis with uncertainty can be conducted in a reasonable time. We propose algorithms for both randomly expanding trees and calculating distance matrices. We made available the source code, which was written in the C++ language. The code may be used as a standalone program or as a shared object in the R system. The software can also be used as a web service through the link: http://purl.oclc.org/NET/sunplin/.

Conclusion: We compare our implementations to similar solutions and show that significant performance gains can be obtained. Our results open up the possibility of accounting for phylogenetic uncertainty in evolutionary and ecological analyses of large datasets.


Figure 3: Heavy chain decomposition. The previously expanded tree after the heavy chain decomposition. The chains produced are: [1-5-7-8-17-9], [2-23-3], [11-19-20], [21-4], [15-10], [24], [22], [6], [18], [16], [12], [13].

Mentions: For example, consider the tree in Figure 3. The value in parentheses beside each node number is the number of descendants of that node (i.e. the value of descendants[] for that node). The cost of the chain from node 1 to node 9 is the sum of these values along the chain: 22+14+12+6+2+0 = 56. This chain is the heavy chain for node 1, since no other chain from node 1 to a leaf node has a higher cost.
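The chain-cost rule described above can be illustrated with a minimal Python sketch. This is not the authors' C++ implementation: the function name `heavy_chain_decomposition`, the `children` dictionary representation, and the 7-node example tree are assumptions made for illustration, and the tree is not the one shown in Figure 3. The sketch counts descendants bottom-up, picks for each internal node the child that starts the highest-cost chain to a leaf, and then peels off chains starting at the root.

```python
def heavy_chain_decomposition(children, root):
    """Split a rooted tree into chains, following the highest-cost chain first.

    children: dict mapping a node to the list of its child nodes
              (hypothetical representation, not taken from SUNPLIN).
    Returns (chains, descendants, best_cost).
    """
    # Visit parents before children, so the reversed order is bottom-up.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    descendants, best_cost, heavy_child = {}, {}, {}
    for node in reversed(order):
        kids = children.get(node, [])
        # Each child contributes itself plus all of its own descendants.
        descendants[node] = sum(descendants[k] + 1 for k in kids)
        if kids:
            # The heavy child starts the highest-cost chain down to a leaf.
            heavy_child[node] = max(kids, key=lambda k: best_cost[k])
            best_cost[node] = descendants[node] + best_cost[heavy_child[node]]
        else:
            best_cost[node] = 0  # a leaf has no descendants

    # Peel off chains: follow heavy children, queue the other children
    # as the starting nodes of new chains.
    chains, pending = [], [root]
    while pending:
        node = pending.pop(0)
        chain = [node]
        while node in heavy_child:
            nxt = heavy_child[node]
            pending.extend(k for k in children[node] if k != nxt)
            chain.append(nxt)
            node = nxt
        chains.append(chain)
    return chains, descendants, best_cost


# Hypothetical 7-node tree: 1 -> (2, 3), 2 -> (4, 5), 5 -> (6, 7).
children = {1: [2, 3], 2: [4, 5], 5: [6, 7]}
chains, descendants, best_cost = heavy_chain_decomposition(children, 1)
print(chains)        # [[1, 2, 5, 6], [3], [4], [7]]
print(best_cost[1])  # 12, i.e. 6 + 4 + 2 + 0 along the chain 1-2-5-6
```

In this toy tree the heavy chain for the root costs 6+4+2+0 = 12, computed the same way as the 22+14+12+6+2+0 = 56 sum in the text. Ties between children (nodes 6 and 7 here) are broken by list order, which is an arbitrary choice of this sketch.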

