Mapping the shapes of phylogenetic trees from human and zoonotic RNA viruses.

Poon AF, Walker LW, Murray H, McCloskey RM, Harrigan PR, Liang RH - PLoS ONE (2013)

Bottom Line: A phylogeny is a tree-based model of common ancestry that is an indispensable tool for studying biological variation. These processes may leave an imprint on the shapes of virus phylogenies that can be extracted for comparative study; however, tree shapes are intrinsically difficult to quantify. Here we present a comprehensive study of phylogenies reconstructed from 38 different RNA viruses from 12 taxonomic families that are associated with human pathologies.


Affiliation: British Columbia Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada; Department of Medicine, University of British Columbia, Vancouver, British Columbia, Canada.

ABSTRACT
A phylogeny is a tree-based model of common ancestry that is an indispensable tool for studying biological variation. Phylogenies play a special role in the study of rapidly evolving populations such as viruses, where the proliferation of lineages is constantly being shaped by the mode of virus transmission, by adaptation to immune systems, and by patterns of human migration and contact. These processes may leave an imprint on the shapes of virus phylogenies that can be extracted for comparative study; however, tree shapes are intrinsically difficult to quantify. Here we present a comprehensive study of phylogenies reconstructed from 38 different RNA viruses from 12 taxonomic families that are associated with human pathologies. To accomplish this, we have developed a new procedure for studying phylogenetic tree shapes based on the 'kernel trick', a technique that maps complex objects into a statistically convenient space. We show that our kernel method outperforms nine different tree balance statistics at correctly classifying phylogenies that were simulated under different evolutionary scenarios. Using the kernel method, we observe patterns in the distribution of RNA virus phylogenies in this space that reflect modes of transmission and pathogenesis. For example, viruses that can establish persistent chronic infections (such as HIV and hepatitis C virus) form a distinct cluster. Although the visibly 'star-like' shape characteristic of trees from these viruses has been well-documented, we show that established methods for quantifying tree shape fail to distinguish these trees from those of other viruses. The kernel approach presented here potentially represents an important new tool for characterizing the evolution and epidemiology of RNA viruses.
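
To make the 'kernel trick' concrete, here is a minimal sketch of how a kernel on trees induces a distance between tree shapes. It is not the kernel used in the study: the toy kernel below simply counts pairs of subtrees with identical branching shape and ignores branch lengths. The nested-tuple tree encoding and the function names (`shape_counts`, `kernel`, `kernel_distance`) are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt

# Hypothetical encoding: a leaf is None, an internal node is a tuple of children.

def shape_counts(tree):
    """Count the canonical shape of every internal-node subtree in `tree`."""
    counts = Counter()

    def canon(node):
        if node is None:
            return "L"
        # Sorting child labels makes the label independent of child order.
        label = "(" + "".join(sorted(canon(child) for child in node)) + ")"
        counts[label] += 1
        return label

    canon(tree)
    return counts

def kernel(t1, t2):
    """Toy kernel: number of subtree pairs (one per tree) with the same shape.

    This is an inner product of shape-count vectors, so it is a valid kernel.
    """
    c1, c2 = shape_counts(t1), shape_counts(t2)
    return sum(n * c2[shape] for shape, n in c1.items())

def kernel_distance(t1, t2):
    """Distance in the implicit feature space: sqrt(k(a,a) + k(b,b) - 2k(a,b))."""
    squared = kernel(t1, t1) + kernel(t2, t2) - 2 * kernel(t1, t2)
    return sqrt(max(squared, 0.0))

balanced = ((None, None), (None, None))        # symmetric four-leaf tree
caterpillar = (((None, None), None), None)     # maximally unbalanced four-leaf tree
```

Calling `kernel_distance(balanced, caterpillar)` compares the two four-leaf shapes without ever writing out the feature vectors explicitly, which is the point of the kernel trick. A kernel that also penalizes differences in branch lengths would be needed to capture features such as long terminal branches.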


Figure 5 (pone-0078122-g005): A visualization of RNA virus phylogenies in the tree shape kernel space using t-distributed stochastic neighbor embedding (t-SNE). The t-SNE algorithm attempts to find the optimal map of high-dimensional data into a low-dimensional space while preserving the distances among points as much as possible. Thus, the distance between a pair of viruses or virus clades (labelled by the same abbreviations as Figure 4) is approximately proportional to their mean kernel distance. Groups of virus clades of particular interest are highlighted with the corresponding colours: HIV, red; HCV, yellow; Dengue (DEN), green; IAV-H3, IAV-H1, and IBV, blue.
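
For reference, the standard t-SNE objective can be written in terms of pairwise distances. The formulation below is the general algorithm, not a description of the authors' settings; taking $d_{ij}$ to be the mean kernel distance between phylogenies $i$ and $j$ is an assumption based on the caption, and $y_i$ denotes the position of point $i$ in the two-dimensional map.

```latex
\begin{align}
p_{j\mid i} &= \frac{\exp\!\left(-d_{ij}^{2} / 2\sigma_i^{2}\right)}
                   {\sum_{k \neq i} \exp\!\left(-d_{ik}^{2} / 2\sigma_i^{2}\right)},
\qquad
p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2n}, \\
q_{ij} &= \frac{\left(1 + \lVert y_i - y_j \rVert^{2}\right)^{-1}}
               {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^{2}\right)^{-1}}, \\
C &= \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}.
\end{align}
```

Because the cost $C$ weights mismatches between close neighbours most heavily, distances in the map track the input distances only approximately, and more faithfully for nearby points than for distant ones.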

Mentions: A scatterplot of the two largest principal components from the preceding analysis described a single arc in which HIV and HCV phylogenies formed a cluster distinct from the other RNA viruses (Figure S2). However, most of the other viruses were agglomerated at the base of the arc, making it difficult to discern patterns from this visualization. This is a known limitation of PCA, in which the largest components are assumed to represent the important structure in the data whereas smaller components are assumed to represent noise. To provide a clearer visualization, we generated another scatterplot (Figure 5) using a t-distributed stochastic neighbor embedding algorithm (t-SNE), which attempts to preserve both global and local structure in a low-dimensional visualization of high-dimensional data [40]. Again, HIV and HCV formed a distinct cluster in this visualization. Phylogenies derived from these viruses tend to feature long terminal branches and relatively short internal branches (Figure 1), which has been attributed to the exponential spread of these viruses and their propensity to establish persistent infections. This outcome was robust to partitioning sequences by clade; for example, a phylogeny comprising all three HIV subtypes in our study mapped adjacent to the HIV subtype-specific phylogenies in both types of projections (data not shown).
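
The principal-component projection described above can be sketched as follows, assuming the preceding analysis was a kernel PCA on a matrix of pairwise kernel values (the excerpt does not say so explicitly). The sketch double-centers the kernel matrix and extracts the leading components by power iteration with deflation. The function names and the plain-list matrix format are illustrative choices, not the authors' implementation.

```python
from math import sqrt

def center_kernel(K):
    """Double-center a kernel matrix so the implicit features have zero mean."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    grand_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]

def top_components(K, k=2, iters=500):
    """Return k (eigenvalue, scores) pairs of the centered kernel matrix.

    Uses power iteration with deflation; scores are the projections of the
    input points onto each principal component.
    """
    A = center_kernel(K)
    n = len(A)
    results = []
    for c in range(k):
        v = [1.0 + i + c for i in range(n)]  # deterministic, non-constant start
        eigenvalue = 0.0
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = sqrt(sum(x * x for x in w))
            if norm < 1e-12:        # nothing left to extract
                eigenvalue = 0.0
                break
            v = [x / norm for x in w]
            eigenvalue = norm
        scores = [sqrt(eigenvalue) * x for x in v]
        results.append((eigenvalue, scores))
        # Deflate: remove the component just found before seeking the next one.
        for i in range(n):
            for j in range(n):
                A[i][j] -= eigenvalue * v[i] * v[j]
    return results

# Sanity check on a linear kernel over four points on a line.
points = [0.0, 1.0, 2.0, 3.0]
K_demo = [[a * b for b in points] for a in points]
(first_eig, first_scores), (second_eig, second_scores) = top_components(K_demo)
```

For the four collinear points, the first component recovers their centered positions and the second carries no variance. Plotting the first two score vectors against each other would give the kind of scatterplot referred to as Figure S2.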

