Constructing a meaningful evolutionary average at the phylogenetic center of mass.

Stone EA, Sidow A - BMC Bioinformatics (2007)

Bottom Line: Obvious applications include evolutionary studies of morphology, physiology or behaviour, but quantitative measures such as sequence hydrophobicity and gene expression level are amenable to our approach as well. Other areas of potential impact include motif discovery and vaccine design. A Java implementation of the BranchManager is available for download, as is a script written in the statistical language R.


Affiliation: Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695-7566, USA. eric_stone@ncsu.edu

ABSTRACT

Background: As a consequence of the evolutionary process, data collected from related species tend to be similar. This similarity by descent can obscure subtler signals in the data such as the evidence of constraint on variation due to shared selective pressures. In comparative sequence analysis, for example, sequence similarity is often used to illuminate important regions of the genome, but if the comparison is between closely related species, then similarity is the rule rather than the interesting exception. Furthermore, and perhaps worse yet, the contribution of a divergent third species may be masked by the strong similarity between the other two. Here we propose a remedy that weighs the contribution of each species according to its phylogenetic placement.

Results: We first solve the problem of summarizing data related by phylogeny, and we explain why an average should operate on the entire evolutionary trajectory that relates the data. This perspective leads to a new approach in which we define the average in terms of the phylogeny, using the data and a stochastic model to obtain a probability on evolutionary trajectories. With the assumption that the data evolve according to a Brownian motion process on the tree, we show that our evolutionary average can be computed as a convex combination of the species data. Thus, our approach, called the BranchManager, defines both an average and a novel taxon weighting scheme. We compare the BranchManager to two other methods, demonstrating why it exhibits desirable properties. In doing so, we devise a framework for comparison and introduce the concept of a representative point at which the average is situated.

Conclusion: The BranchManager uses as its representative point the phylogenetic center of mass, a choice which has both intuitive and practical appeal. Because our average is intrinsic to both the dataset and to the phylogeny, we expect it and its corresponding weighting scheme to be useful in all sorts of studies where interspecies data need to be combined. Obvious applications include evolutionary studies of morphology, physiology or behaviour, but quantitative measures such as sequence hydrophobicity and gene expression level are amenable to our approach as well. Other areas of potential impact include motif discovery and vaccine design. A Java implementation of the BranchManager is available for download, as is a script written in the statistical language R.

Figure 3: Representative points of a three-taxon tree for various weighting schemes. Solid black lines drawn as x-, y-, and z-axes show the branches of the tree that join species 1, 2, and 3 (white numbers atop black circles) to the phylogenetic root at (0,0,0). Black rectangles at coordinates (x,y,z) indicate representative points for weighting schemes ACL, BM, and VA.

When considering the taxonomic distribution of an interspecies dataset, there are at least two issues to consider: (1) the placement of the taxa relative to each other, and (2) the placement of the taxa relative to the evolutionary trajectory described by the tree (see Figure 3). The relative placement of the taxa to each other can be summarized by the matrix D of pairwise distances whose entries Dij record the phylogenetic branch length, and hence the evolutionary divergence, separating species i and j. The relative placement of the taxa on the tree can be summarized by D in concert with an additional vector z whose entries zi record a measure of evolutionary divergence between species i and an arbitrary reference that we call the representative point. In the ACL approach, zi is the divergence (total branch length) between species i and the common ancestor at the root of the tree; by contrast, the distance-based VA approach has no explicit representative point and accepts any z proportional to the vector 1. In both cases, the weight vectors can be shown to satisfy the linear relation Dw - z = c1, where c is a normalizing constant so that the weights sum to one. The solution to this equation gives the weights as
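
One way to obtain weights from the relation Dw - z = c1, assuming D is invertible and the entries of D^{-1}1 do not sum to zero, is to write w = D^{-1}z + c D^{-1}1 and pick c so that the weights sum to one. The Python sketch below does this with exact rational arithmetic for a three-taxon star tree. The function names (`solve_linear`, `taxon_weights`) and the branch lengths are hypothetical choices made for illustration; this is not the BranchManager implementation or the authors' code.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination using exact fractions."""
    n = len(A)
    # Build the augmented matrix [A | b].
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # D has a zero diagonal, so a row swap may be needed to find a pivot.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def taxon_weights(D, z):
    """Return w satisfying D w - z = c 1 with sum(w) == 1 (assumes D invertible)."""
    n = len(D)
    a = solve_linear(D, z)            # D^{-1} z
    b = solve_linear(D, [1] * n)      # D^{-1} 1
    c = (1 - sum(a)) / sum(b)         # normalizing constant
    return [ai + c * bi for ai, bi in zip(a, b)]


# Hypothetical star tree: species i joins the root by a branch of length t[i].
t = [1, 1, 2]
D = [[0 if i == j else t[i] + t[j] for j in range(3)] for i in range(3)]

w_root = taxon_weights(D, t)          # z_i = distance from species i to the root
w_flat = taxon_weights(D, [0, 0, 0])  # z proportional to the vector 1 (here zero)

print([str(w) for w in w_root])
print([str(w) for w in w_flat])
```

For this example tree, taking z as the root distances gives weights 2/5, 2/5, 1/5, which are proportional to the inverse branch lengths, while taking z proportional to 1 gives 3/10, 3/10, 2/5. Any constant multiple of 1 in z is absorbed into c, so the second result does not depend on which multiple is chosen.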

