Constructing a meaningful evolutionary average at the phylogenetic center of mass.

Stone EA, Sidow A - BMC Bioinformatics (2007)

Bottom Line: Obvious applications include evolutionary studies of morphology, physiology or behaviour, but quantitative measures such as sequence hydrophobicity and gene expression level are amenable to our approach as well. Other areas of potential impact include motif discovery and vaccine design. A Java implementation of the BranchManager is available for download, as is a script written in the statistical language R.

Affiliation: Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695-7566, USA. eric_stone@ncsu.edu

ABSTRACT

Background: As a consequence of the evolutionary process, data collected from related species tend to be similar. This similarity by descent can obscure subtler signals in the data such as the evidence of constraint on variation due to shared selective pressures. In comparative sequence analysis, for example, sequence similarity is often used to illuminate important regions of the genome, but if the comparison is between closely related species, then similarity is the rule rather than the interesting exception. Furthermore, and perhaps worse yet, the contribution of a divergent third species may be masked by the strong similarity between the other two. Here we propose a remedy that weighs the contribution of each species according to its phylogenetic placement.

Results: We first solve the problem of summarizing data related by phylogeny, and we explain why an average should operate on the entire evolutionary trajectory that relates the data. This perspective leads to a new approach in which we define the average in terms of the phylogeny, using the data and a stochastic model to obtain a probability on evolutionary trajectories. With the assumption that the data evolve according to a Brownian motion process on the tree, we show that our evolutionary average can be computed as a convex combination of the species data. Thus, our approach, called the BranchManager, defines both an average and a novel taxon weighting scheme. We compare the BranchManager to two other methods, demonstrating why it exhibits desirable properties. In doing so, we devise a framework for comparison and introduce the concept of a representative point at which the average is situated.

Conclusion: The BranchManager uses as its representative point the phylogenetic center of mass, a choice which has both intuitive and practical appeal. Because our average is intrinsic to both the dataset and to the phylogeny, we expect it and its corresponding weighting scheme to be useful in all sorts of studies where interspecies data need to be combined. Obvious applications include evolutionary studies of morphology, physiology or behaviour, but quantitative measures such as sequence hydrophobicity and gene expression level are amenable to our approach as well. Other areas of potential impact include motif discovery and vaccine design. A Java implementation of the BranchManager is available for download, as is a script written in the statistical language R.

Figure 5: Relationships among eleven HIV-1 isolates. (a) A phylogeny, adapted from [20]. Branches are drawn to scale in units of percent divergence from the root. The isolates were taken from individuals in the United States (PV22, BH10, BRU, HXB, SF2, CDC), Haiti (WMJ2, RF), and Africa (ELI, MAL, Z3). (b) Sets of weights for each of the eleven isolates, calculated from the phylogeny in (a) using ACL (left) and BM (right). Ratio is the BM weight divided by the ACL weight; only Z3 has a ratio smaller than one. (c) Alignment of amino acid positions 307–321 of the HIV-1 envelope glycoprotein gp120 obtained from each of the eleven isolates.

Mentions: The theoretical situation in Figure 4 is not particularly unrealistic. Real phylogenies are often severely imbalanced, including the very example used to demonstrate the ACL method [3]. We illustrate our method with a subtree of eleven HIV-1 isolates taken from the full phylogeny of fifteen isolates originally presented in [20] (Figure 5a). The isolate Z3 traces its lineage directly to the phylogenetic root; consequently, Z3 carries valuable, non-redundant information about the characteristics of the ancestral virus. The ACL approach reflects this, as the weight of Z3 is 0.3413 (Figure 5b); in other words, focused at the root, Z3 contributes over 34% to the characteristics of the average isolate. But the root may not be representative, and in this example most of the diversity is far from the root and Z3. Representation at the phylogenetic center of mass mitigates the influence of Z3, simultaneously rewarding its non-redundancy while penalizing its divergence from the bulk of the remaining isolates [see Additional file 1]. Our approach reduces the weight of Z3 to 0.1695 while upweighting the remaining ten isolates relative to ACL (Figure 5b), leading to an average that better reflects the diversity in the study.
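
To make the root-focused effect concrete, the following is a minimal sketch and not the BranchManager computation or the authors' code. It assumes a hypothetical three-tip tree (the tips A, B and Z and all branch lengths are invented for illustration, not taken from Figure 5) and computes the generalized least squares weights for the root value under Brownian motion, w = C^{-1}1 / (1'C^{-1}1), where C holds the branch length shared by each pair of tips. This stands in for a root-focused scheme such as ACL; the function names are likewise assumptions.

```python
# Illustrative sketch only: root-focused weights under Brownian motion.
# The toy tree and its branch lengths are hypothetical, not from Figure 5.

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def root_weights(shared_path):
    """Return w = C^{-1} 1 / (1' C^{-1} 1), where shared_path[i][j] is the
    branch length shared by tips i and j on their paths from the root."""
    raw = solve(shared_path, [1.0] * len(shared_path))
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical tree: tips A and B share a branch of length 1.0 from the root,
# then diverge by 0.1 each; tip Z attaches directly to the root (length 1.1).
tips = ["A", "B", "Z"]
C = [
    [1.1, 1.0, 0.0],
    [1.0, 1.1, 0.0],
    [0.0, 0.0, 1.1],
]
weights = root_weights(C)
for name, w in zip(tips, weights):
    print(f"{name}: {w:.4f}")
```

In this toy tree the weights are non-negative and sum to one, so the resulting average is a convex combination of the tip data. Z, the lone lineage attached at the root, receives 21/43 (about 0.49) of the weight, while the two closely related tips split the remainder. This is the same qualitative pattern as the large ACL weight of Z3, which representation at the phylogenetic center of mass is designed to temper.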

