Mapping the shapes of phylogenetic trees from human and zoonotic RNA viruses.

Poon AF, Walker LW, Murray H, McCloskey RM, Harrigan PR, Liang RH - PLoS ONE (2013)

Affiliation: British Columbia Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada; Department of Medicine, University of British Columbia, Vancouver, British Columbia, Canada.

ABSTRACT
A phylogeny is a tree-based model of common ancestry that is an indispensable tool for studying biological variation. Phylogenies play a special role in the study of rapidly evolving populations such as viruses, where the proliferation of lineages is constantly being shaped by the mode of virus transmission, by adaptation to immune systems, and by patterns of human migration and contact. These processes may leave an imprint on the shapes of virus phylogenies that can be extracted for comparative study; however, tree shapes are intrinsically difficult to quantify. Here we present a comprehensive study of phylogenies reconstructed from 38 different RNA viruses from 12 taxonomic families that are associated with human pathologies. To accomplish this, we have developed a new procedure for studying phylogenetic tree shapes based on the 'kernel trick', a technique that maps complex objects into a statistically convenient space. We show that our kernel method outperforms nine different tree balance statistics at correctly classifying phylogenies that were simulated under different evolutionary scenarios. Using the kernel method, we observe patterns in the distribution of RNA virus phylogenies in this space that reflect modes of transmission and pathogenesis. For example, viruses that can establish persistent chronic infections (such as HIV and hepatitis C virus) form a distinct cluster. Although the visibly 'star-like' shape characteristic of trees from these viruses has been well-documented, we show that established methods for quantifying tree shape fail to distinguish these trees from those of other viruses. The kernel approach presented here potentially represents an important new tool for characterizing the evolution and epidemiology of RNA viruses.
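The 'kernel trick' mentioned above replaces a handful of summary statistics with a similarity function computed directly on pairs of trees. As a minimal illustrative sketch, assuming trees are unlabeled, rooted, binary shapes encoded as nested Python tuples (a hypothetical encoding), the code below implements a generic subset-tree kernel in the style of Collins and Duffy, with a decay factor `lam` that down-weights large shared fragments. It ignores branch lengths and is not the authors' kernel. The names `subset_tree_kernel`, `normalized_kernel`, and `lam` are illustrative only.

```python
from functools import lru_cache
import math

TIP = "tip"  # hypothetical encoding: a tip is "tip", an internal node is (left, right)


def n_tips(t):
    """Number of tips in the subtree rooted at t."""
    return 1 if t == TIP else n_tips(t[0]) + n_tips(t[1])


def ladderize(t):
    """Canonical child order (larger subtree first) so that mirror images match."""
    if t == TIP:
        return t
    a, b = ladderize(t[0]), ladderize(t[1])
    if (n_tips(a), repr(a)) < (n_tips(b), repr(b)):
        a, b = b, a
    return (a, b)


def internal_nodes(t):
    """All internal nodes of t in preorder (each node is the subtree rooted there)."""
    if t == TIP:
        return []
    return [t] + internal_nodes(t[0]) + internal_nodes(t[1])


def production(node):
    """Node 'label': which children are tips (True) and which are internal (False)."""
    return tuple(child == TIP for child in node)


def subset_tree_kernel(t1, t2, lam=0.5):
    """Sum over node pairs of decayed counts of shared subtree fragments."""
    t1, t2 = ladderize(t1), ladderize(t2)

    @lru_cache(maxsize=None)
    def delta(n1, n2):
        if production(n1) != production(n2):
            return 0.0
        score = lam
        for c1, c2 in zip(n1, n2):
            if c1 != TIP:  # matching productions guarantee c2 is internal too
                score *= 1.0 + delta(c1, c2)
        return score

    return sum(delta(n1, n2) for n1 in internal_nodes(t1) for n2 in internal_nodes(t2))


def normalized_kernel(t1, t2, lam=0.5):
    """Cosine-style normalization: every tree has similarity 1 with itself."""
    k12 = subset_tree_kernel(t1, t2, lam)
    k11 = subset_tree_kernel(t1, t1, lam)
    k22 = subset_tree_kernel(t2, t2, lam)
    return k12 / math.sqrt(k11 * k22)


if __name__ == "__main__":
    balanced = ((TIP, TIP), (TIP, TIP))
    caterpillar = (((TIP, TIP), TIP), TIP)
    print(normalized_kernel(balanced, caterpillar))  # 0.32 with lam=0.5
```

Children are ladderized before comparison so that mirror-image trees receive identical scores, and the normalization keeps self-similarity at 1, which is convenient when a matrix of pairwise kernel values is handed to a classifier such as a support vector machine.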

Figure 3. Classification of simulated phylogenies using nine balance statistics and the kernel function. We simulated the growth of two sets of 100 phylogenies relating 100 taxa under different scenarios in which rates of speciation (branching) evolved at different rates. Greater variation in speciation rates tended to produce more imbalanced trees. Nine different balance statistics, including eight from [12], were computed for all phylogenies: Colless’ index, Sackin’s index, the mean and variance in path lengths from tips to the root, Shao and Sokal’s  and  statistics, and Fusco and Cronk’s imbalance value () summarized as the sum, the total mean, and the mean over the earliest 10 internal nodes of the tree. The plot illustrates the trade-off between sensitivity and specificity when phylogenies are classified by applying a cutoff value to each of these balance statistics. A single point (star) indicates the sensitivity and specificity attained by using the phylogenetic kernel function (with  and ) to train a support vector machine (SVM) on a random subset (50%) of the phylogenies and classifying the remaining half.
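For readers unfamiliar with the balance statistics listed in the caption, the sketch below computes several of them on a hypothetical nested-tuple encoding of binary trees: Colless’ index (the sum over internal nodes of the absolute difference in tip counts of the two daughter clades), Sackin’s index (the sum over tips of the number of internal nodes between tip and root), the mean and variance of tip-to-root path lengths counted in edges, and per-node values of Fusco and Cronk’s imbalance in the common formulation I = (B - m)/(M - m), where S is the number of tips below a node, B is the tip count of its larger daughter clade, M = S - 1, and m = ceil(S/2), evaluated only at nodes with S >= 4. These are textbook definitions written from scratch, not the code used in the study; Shao and Sokal’s statistics, the node-age-dependent "earliest 10 nodes" summary, and any size corrections are omitted, and using the population variance is an assumption.

```python
import math
import statistics

TIP = "tip"  # hypothetical encoding: a tip is "tip", an internal node is (left, right)


def n_tips(t):
    return 1 if t == TIP else n_tips(t[0]) + n_tips(t[1])


def internal_nodes(t):
    if t == TIP:
        return []
    return [t] + internal_nodes(t[0]) + internal_nodes(t[1])


def colless(t):
    """Sum over internal nodes of |tips(left) - tips(right)|."""
    return sum(abs(n_tips(n[0]) - n_tips(n[1])) for n in internal_nodes(t))


def tip_depths(t, depth=0):
    """Number of edges on the path from the root to each tip."""
    if t == TIP:
        return [depth]
    return tip_depths(t[0], depth + 1) + tip_depths(t[1], depth + 1)


def sackin(t):
    """Sum over tips of the number of internal nodes between the tip and the root."""
    return sum(tip_depths(t))


def path_length_moments(t):
    """Mean and population variance of tip-to-root path lengths (in edges)."""
    depths = tip_depths(t)
    return statistics.mean(depths), statistics.pvariance(depths)


def fusco_cronk_values(t):
    """Per-node imbalance I = (B - m) / (M - m) for internal nodes with S >= 4 tips."""
    values = []
    for node in internal_nodes(t):
        s = n_tips(node)
        if s < 4:
            continue  # M == m when S < 4, so I is undefined there
        b = max(n_tips(node[0]), n_tips(node[1]))
        big_m, small_m = s - 1, math.ceil(s / 2)
        values.append((b - small_m) / (big_m - small_m))
    return values


if __name__ == "__main__":
    caterpillar = (((TIP, TIP), TIP), TIP)  # maximally imbalanced 4-tip tree
    print(colless(caterpillar), sackin(caterpillar))  # 3 9
    print(path_length_moments(caterpillar))
    values = fusco_cronk_values(caterpillar)
    print(sum(values), statistics.mean(values))  # sum and mean of per-node I
```

Each statistic collapses a whole tree into one number, which is why a single cutoff on it can only trade sensitivity against specificity, as in the plot described in the caption.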

Mentions: The sensitivity and specificity of classifying the simulated phylogenies by mutation rates ( = 0.01 and  = 0.1) are summarized in Figure 3. Using our kernel method, we obtained a median sensitivity of 97.7% (interquartile range, IQR = 95.6% to 98.0%) and a median specificity of 90.8% (IQR = 89.1% to 92.8%) across 100 replicate training sets. These results were unambiguously superior to those of all nine balance statistics evaluated on the same simulated data. For instance, the sum of Fusco and Cronk’s imbalance statistic was the most effective of the balance statistics, yet none of them could exceed a sensitivity of 80% without its specificity dropping below 80%. Thus, the kernel method can provide a substantial advantage for recognizing virus trees that have evolved under different evolutionary and epidemiological scenarios.
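The comparison above rests on two simple computations: the sensitivity and specificity of a rule that calls a tree "positive" when a statistic reaches a cutoff (sweeping the cutoff traces a balance statistic's curve in Figure 3), and the median and interquartile range of a performance measure across replicate random 50/50 train/test splits. The sketch below shows both with only the Python standard library. The convention that higher scores indicate the positive class, the function names, and the quartile method (`statistics.quantiles` with its default "exclusive" method) are assumptions; other quartile conventions give slightly different IQR bounds, and the SVM training step itself is not reproduced.

```python
import random
import statistics


def sensitivity_specificity(pos_scores, neg_scores, cutoff):
    """Call a tree positive when its score is >= cutoff; return (sensitivity, specificity)."""
    true_pos = sum(score >= cutoff for score in pos_scores)
    true_neg = sum(score < cutoff for score in neg_scores)
    return true_pos / len(pos_scores), true_neg / len(neg_scores)


def cutoff_sweep(pos_scores, neg_scores):
    """(cutoff, sensitivity, specificity) at every observed score used as a cutoff."""
    cutoffs = sorted(set(pos_scores) | set(neg_scores))
    return [(c, *sensitivity_specificity(pos_scores, neg_scores, c)) for c in cutoffs]


def random_half_split(items, rng):
    """Randomly assign half of the items to training and the rest to testing."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def median_and_iqr(values):
    """Median and (first quartile, third quartile) of replicate results."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), (q1, q3)


if __name__ == "__main__":
    pos, neg = [0.9, 0.8, 0.4], [0.1, 0.5, 0.3]  # made-up scores for illustration
    for cutoff, sens, spec in cutoff_sweep(pos, neg):
        print(f"cutoff={cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
    print(median_and_iqr([1, 2, 3, 4, 5]))  # (3, (1.5, 4.5))
```

A kernel classifier produces one operating point per training split rather than a curve, which is why its result is reported as a median and IQR over replicates and drawn as a single point.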

