Including RNA secondary structures improves accuracy and robustness in reconstruction of phylogenetic trees.

Keller A, Förster F, Müller T, Dandekar T, Schultz J, Wolf M - Biol. Direct (2010)

Bottom Line: An extensive evaluation of the benefits of secondary structure has been lacking. The results clearly show that structural information largely improves the accuracy and robustness of Neighbor Joining trees compared with sequence-only data, whereas a doubled marker size improves only robustness.


Affiliation: Department of Bioinformatics, University of Würzburg, Am Hubland, 97074 Würzburg, Germany.

ABSTRACT

Background: In several studies, secondary structures of ribosomal genes have been used to improve the quality of phylogenetic reconstructions. An extensive evaluation of the benefits of secondary structure, however, is lacking.

Results: This is the first study to counter this deficiency. We inspected the accuracy and robustness of phylogenetics with individual secondary structures by simulation experiments for artificial tree topologies with up to 18 taxa and for divergence levels in the range of typical phylogenetic studies. We chose the internal transcribed spacer 2 (ITS2) of the ribosomal cistron as an exemplary marker region. The simulation integrated the coevolution process of sequences with secondary structures. Additionally, the phylogenetic power of marker size duplication was investigated and compared with sequence and sequence-structure reconstruction methods. The results clearly show that accuracy and robustness of Neighbor Joining trees are largely improved by structural information in contrast to sequence-only data, whereas a doubled marker size improves only robustness.

Conclusions: Individual secondary structures of ribosomal RNA sequences provide a valuable gain of information content that is useful for phylogenetics. Thus, the use of ITS2 sequence together with secondary structure for taxonomic inferences is recommended. Other reconstruction methods such as maximum likelihood, Bayesian inference or maximum parsimony may equally profit from secondary structure inclusion.

Reviewers: This article was reviewed by Shamil Sunyaev, Andrea Tanzer (nominated by Frank Eisenhaber) and Eugene V. Koonin.



Figure 2: Quartet distance values for equidistant trees. All five ancestral sequences were combined for a given scenario. (a) Boxplots and solid splines are for the 14 taxa scenarios of the three methods. Dashed lines and dotted lines are splines of ten and 18 taxa, respectively. (b) Direct comparison of the 14 taxa splines and medians of all three methods. The sample size of each scenario is 1,000. The accuracy of tree topologies decreases with more taxa and greater evolutionary distances between sequences. Trees calculated with secondary structures or doubled sequences show greater accuracy than those determined with normal sequences.
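
To make the accuracy measure in the figure concrete, the following is a minimal sketch of a quartet distance between two tree topologies. It is not the authors' implementation: the nested-tuple tree representation and the function names (collect_clades, quartet_topology, quartet_distance) are assumptions made for illustration, and the brute-force enumeration is only practical for small trees such as the 10 to 18 taxa considered here.

```python
from itertools import combinations


def collect_clades(tree):
    """Return (set of all taxa, list of leaf sets below each internal node).

    A tree is a nested tuple of taxon names (hypothetical format),
    e.g. ((("A", "B"), "C"), ("D", "E")).
    """
    clades = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        clades.append(leaves)
        return leaves

    taxa = walk(tree)
    return taxa, clades


def quartet_topology(quartet, clades):
    """Return the pairing of four taxa induced by the tree, or None if unresolved."""
    four = frozenset(quartet)
    for clade in clades:
        inside = clade & four
        # A clade holding exactly two of the four taxa separates them from the other two.
        if len(inside) == 2:
            return frozenset([inside, four - inside])
    return None


def quartet_distance(tree_a, tree_b):
    """Count the four-taxon subsets whose induced topology differs between the trees."""
    taxa_a, clades_a = collect_clades(tree_a)
    taxa_b, clades_b = collect_clades(tree_b)
    if taxa_a != taxa_b:
        raise ValueError("both trees must contain the same taxa")
    # Quartets that are unresolved in both trees compare equal and are not counted.
    return sum(
        quartet_topology(q, clades_a) != quartet_topology(q, clades_b)
        for q in combinations(sorted(taxa_a), 4)
    )


true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))
print(quartet_distance(true_tree, inferred_tree))
```

A distance of zero means the inferred topology agrees with the reference tree on every four-taxon subset; larger values indicate lower accuracy.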

Mentions: The shapes of the bootstrap, Quartet distance and Robinson-Foulds distance distributions were similar for equidistant and variable distance trees. However, for each underlying data set (sequence, sequence-structure and doubled sequence), the branches of the trees received higher bootstrap support values and fewer false splits with constant branch lengths than with variable distances, though the differences were minimal (Figs. 1, 2, 3 and 4). Only Quartet distances are shown, since they are congruent with the results of the Robinson-Foulds distance (Additional file 1). Additionally, we included a relative per-branch representation of accuracy divided by the number of internal nodes in Additional file 1.
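
The Robinson-Foulds distance mentioned above counts the splits (bipartitions of the taxa) that occur in one tree but not in the other. The sketch below illustrates that definition under stated assumptions: trees are written as nested tuples of taxon names, and the function names (bipartitions, robinson_foulds) are hypothetical rather than taken from the article or from any particular software package.

```python
def bipartitions(tree):
    """Return (set of all taxa, set of non-trivial splits) for a nested-tuple tree."""
    clades = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        clades.append(leaves)
        return leaves

    taxa = walk(tree)
    splits = set()
    for clade in clades:
        other = taxa - clade
        # Keep only splits with at least two taxa on each side; storing both
        # sides as an unordered pair makes the result independent of rooting.
        if len(clade) >= 2 and len(other) >= 2:
            splits.add(frozenset([clade, other]))
    return taxa, splits


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    taxa_a, splits_a = bipartitions(tree_a)
    taxa_b, splits_b = bipartitions(tree_b)
    if taxa_a != taxa_b:
        raise ValueError("both trees must contain the same taxa")
    return len(splits_a ^ splits_b)


reference = ((("A", "B"), "C"), ("D", "E"))
estimate = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(reference, estimate))
```

Each split found in only one tree corresponds to a false or missing branch, which is why this measure tends to agree with the Quartet distance when comparing reconstruction methods.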

