Mapping the shapes of phylogenetic trees from human and zoonotic RNA viruses.

Poon AF, Walker LW, Murray H, McCloskey RM, Harrigan PR, Liang RH - PLoS ONE (2013)

Bottom Line: A phylogeny is a tree-based model of common ancestry that is an indispensable tool for studying biological variation. These processes may leave an imprint on the shapes of virus phylogenies that can be extracted for comparative study; however, tree shapes are intrinsically difficult to quantify. Here we present a comprehensive study of phylogenies reconstructed from 38 different RNA viruses from 12 taxonomic families that are associated with human pathologies.


Affiliation: British Columbia Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada; Department of Medicine, University of British Columbia, Vancouver, British Columbia, Canada.

ABSTRACT
A phylogeny is a tree-based model of common ancestry that is an indispensable tool for studying biological variation. Phylogenies play a special role in the study of rapidly evolving populations such as viruses, where the proliferation of lineages is constantly being shaped by the mode of virus transmission, by adaptation to immune systems, and by patterns of human migration and contact. These processes may leave an imprint on the shapes of virus phylogenies that can be extracted for comparative study; however, tree shapes are intrinsically difficult to quantify. Here we present a comprehensive study of phylogenies reconstructed from 38 different RNA viruses from 12 taxonomic families that are associated with human pathologies. To accomplish this, we have developed a new procedure for studying phylogenetic tree shapes based on the 'kernel trick', a technique that maps complex objects into a statistically convenient space. We show that our kernel method outperforms nine different tree balance statistics at correctly classifying phylogenies that were simulated under different evolutionary scenarios. Using the kernel method, we observe patterns in the distribution of RNA virus phylogenies in this space that reflect modes of transmission and pathogenesis. For example, viruses that can establish persistent chronic infections (such as HIV and hepatitis C virus) form a distinct cluster. Although the visibly 'star-like' shape characteristic of trees from these viruses has been well-documented, we show that established methods for quantifying tree shape fail to distinguish these trees from those of other viruses. The kernel approach presented here potentially represents an important new tool for characterizing the evolution and epidemiology of RNA viruses.
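
The sense in which a kernel places objects in a "statistically convenient space" can be illustrated with a short sketch. A kernel value behaves like an inner product in an implicit feature space, so distances and cosine similarities between objects can be computed from kernel values alone. The example below uses a toy dot-product kernel rather than the tree kernel from the study, and all function names are hypothetical.

```python
import math


def dot_kernel(x, y):
    """Toy kernel (assumption for illustration): the ordinary dot product."""
    return sum(a * b for a, b in zip(x, y))


def feature_distance(kernel, x, y):
    """Distance between x and y in the feature space implied by `kernel`.

    Uses the identity d(x, y)^2 = k(x, x) + k(y, y) - 2 k(x, y).
    """
    squared = kernel(x, x) + kernel(y, y) - 2.0 * kernel(x, y)
    return math.sqrt(max(squared, 0.0))  # guard against tiny negative round-off


def normalized_kernel(kernel, x, y):
    """Cosine-style normalization, so that every object has similarity 1 with itself."""
    return kernel(x, y) / math.sqrt(kernel(x, x) * kernel(y, y))


if __name__ == "__main__":
    x, y = (1.0, 0.0), (0.0, 1.0)
    print(feature_distance(dot_kernel, x, y))
    print(normalized_kernel(dot_kernel, x, y))
```

Any function that returns valid kernel values, including a kernel defined on trees, could be passed in place of `dot_kernel`.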


Figure 2: Kernel-assisted comparison of two tree shapes. For two trees, the number of node pairs to evaluate is the product of their node counts. (A) Starting from a given pair of nodes (indicated in the figure by circles with double outlines), the algorithm finds the largest common subset tree rooted at these nodes. First, we find that for both nodes, neither branch terminates at a ‘leaf node’ (marked in the figure). This match contributes a relatively small amount to our kernel score, not only because the matching subset trees (highlighted in thick blue lines) comprise only one node each, but also because their discordant branch lengths lead to a substantial penalty. (B) Next, we descend the left branch in both trees. The current nodes (open circles) in both trees spawn one leaf node and one internal node; therefore, the subset trees continue to match. In addition, their branch lengths are similar, so their contribution to the cumulative kernel score is given greater weight. (C) Finally, we descend the right branch in both trees and find that the subset trees no longer match beyond this point. We also proceed down the right branch of the reference nodes and find no match, so our traversal of the two trees from these nodes is complete and we restart our search at the next pair of nodes.
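
The traversal described in this caption can be sketched in Python. This is an illustrative sketch only, not the authors' implementation: it assumes a Collins-Duffy-style recursion of the kind used in natural language processing tree kernels, a Gaussian penalty on branch-length differences, and children stored in a consistent left-to-right order. The names `Node`, `production`, `match_score`, `tree_kernel`, `decay`, and `sigma` are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    length: float = 0.0                      # length of the branch leading to this node
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def nodes(tree):
    """Yield every node of the tree."""
    stack = [tree]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.children)


def production(node):
    """Local shape of a node: for each child, whether it is a leaf."""
    return tuple(child.is_leaf() for child in node.children)


def match_score(n1, n2, decay=0.5, sigma=1.0):
    """Weighted count of common subset trees rooted at the pair (n1, n2)."""
    if n1.is_leaf() or n2.is_leaf():
        return 0.0
    if production(n1) != production(n2):
        return 0.0                           # shapes differ: no common subset tree
    pairs = list(zip(n1.children, n2.children))
    # Gaussian penalty (assumed form): discordant branch lengths shrink the score.
    sq_diff = sum((a.length - b.length) ** 2 for a, b in pairs)
    score = decay * math.exp(-sq_diff / sigma)
    # Each child pair may stop here (the 1) or extend the match further down.
    for a, b in pairs:
        score *= 1.0 + match_score(a, b, decay, sigma)
    return score


def tree_kernel(t1, t2, decay=0.5, sigma=1.0):
    """Sum the match score over every pair of nodes, one from each tree."""
    return sum(match_score(a, b, decay, sigma)
               for a in nodes(t1) for b in nodes(t2))


if __name__ == "__main__":
    cherry = Node(children=[Node(1.0), Node(1.0)])
    skewed = Node(children=[Node(1.0), Node(3.0)])
    print(tree_kernel(cherry, cherry))       # same shape, same branch lengths
    print(tree_kernel(cherry, skewed))       # same shape, discordant branch lengths
```

The sketch recomputes shared sub-problems; a practical version would memoize `match_score` over node pairs so that each pair is evaluated once.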

Mentions: To construct a kernel function on phylogenetic tree shapes, we adapted a natural language processing kernel function [16] that was originally designed to classify text on the basis of its syntactic structure (a generative tree in which words descend from linguistic precursors [15]). Our modified kernel function extracts all the subset trees that are the common features of two phylogenetic trees (Figure 2). A subset tree is a contiguous collection of descendants of a specific node, but unlike a subtree, it does not necessarily include all of the descendants. Thus, our approach is similar to the Robinson-Foulds metric [14], which compares alternative trees for a given set of taxa by counting the number of subtrees in common. Unlike the Robinson-Foulds metric, however, our kernel function not only allows us to compare trees relating different numbers and kinds of taxa, but also accounts for differences in the branch lengths between matching subset trees. We assess the performance of our kernel function against nine measures of tree shape at classifying simulated phylogenies with varying rates of speciation, and then apply our function to a large collection of human and zoonotic RNA virus phylogenies.
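
For contrast with the kernel approach, a conventional tree balance statistic reduces a whole tree to a single number. This passage does not name the nine statistics that were evaluated, so the following sketch uses Colless' index and Sackin's index purely as widely used examples. The nested-tuple tree encoding and the function names are assumptions made for illustration.

```python
def leaf_count(tree):
    """Number of tips below a node; any non-tuple value is treated as a tip."""
    if not isinstance(tree, tuple):
        return 1
    return sum(leaf_count(child) for child in tree)


def colless(tree):
    """Colless' index for a binary tree: sum over internal nodes of
    |tips on the left - tips on the right|. Zero means perfectly balanced."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree                       # assumes strictly binary trees
    imbalance = abs(leaf_count(left) - leaf_count(right))
    return imbalance + colless(left) + colless(right)


def sackin(tree, depth=0):
    """Sackin's index: sum over tips of the number of branches from root to tip."""
    if not isinstance(tree, tuple):
        return depth
    return sum(sackin(child, depth + 1) for child in tree)


if __name__ == "__main__":
    caterpillar = ("a", ("b", ("c", "d")))   # maximally unbalanced, 4 tips
    balanced = (("a", "b"), ("c", "d"))      # perfectly balanced, 4 tips
    print(colless(caterpillar), colless(balanced))
    print(sackin(caterpillar), sackin(balanced))
```

Both statistics ignore branch lengths, which is one of the properties the kernel function described above is said to account for.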

