Probabilistic phylogenetic inference with insertions and deletions.
Bottom Line:
A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip.
Affiliation: Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia, United States of America. rivase@janelia.hhmi.org
ABSTRACT
A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth-death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new "concordance test" benchmark on real ribosomal RNA alignments, we show that the extended program dnamlepsilon improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.
If node k is not a leaf, the recursion is given by Equation 20 for a residue i and by Equation 21 for a gap. Both equations combine the two daughters of node k, using the distances from node k to its left and right child respectively, and rely on the probabilities for the daughter nodes having already been calculated by the recursion. u_k stands for the subset of leaves under node k for column u, and u_k = – indicates that all leaves under node k are gaps for column u. The single-event conditional probabilities are dictated by the generative model in Equation 12 (Equations 22, 23, and 24), for 1 ≤ i, j ≤ K, where the functions γ_t, ξ_t, and a third function are given by the Markov model solutions (Equations 6, 7, and 9). Figure 1 shows a graphical interpretation of the Felsenstein recursions described in Equations 20 and 21.
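The general shape of a Felsenstein peeling recursion over a gap-augmented alphabet can be sketched in a few lines of Python. This is only an illustration, not the model of this article: the transition probabilities below come from a hypothetical symmetric, reversible five-state model in which the gap is treated as an ordinary fifth character, whereas the article's model is non-reversible and uses its own functions from Equations 6, 7, and 9. The names `transition_matrix`, `conditional_likelihoods`, `column_likelihood`, the `rate` parameter, the uniform root prior, and the tuple-based tree encoding are all assumptions made for this sketch.

```python
import math

ALPHABET = "ACGT-"  # four residues plus the gap character as a fifth state


def transition_matrix(t, rate=1.0):
    """Transition probabilities P(j | i, t) for a symmetric 5-state model.

    Assumption for illustration only: every off-diagonal rate equals `rate`,
    so gaps behave like a fifth residue. This is NOT the article's model.
    """
    n = len(ALPHABET)
    decay = math.exp(-n * rate * t)
    same = 1.0 / n + (1.0 - 1.0 / n) * decay
    diff = 1.0 / n - decay / n
    return [[same if i == j else diff for j in range(n)] for i in range(n)]


def conditional_likelihoods(node, column):
    """Return L where L[i] = P(leaves under `node` | `node` is in state i).

    A leaf is a string (its name); an internal node is a pair
    ((left_child, t_left), (right_child, t_right)).
    """
    n = len(ALPHABET)
    if isinstance(node, str):
        observed = ALPHABET.index(column[node])
        return [1.0 if i == observed else 0.0 for i in range(n)]
    result = [1.0] * n
    for child, t in node:
        p = transition_matrix(t)
        # The daughter's values are computed first, then combined at the parent.
        child_l = conditional_likelihoods(child, column)
        for i in range(n):
            result[i] *= sum(p[i][j] * child_l[j] for j in range(n))
    return result


def column_likelihood(tree, column):
    """Likelihood of one alignment column, with a uniform root prior (assumed)."""
    root = conditional_likelihoods(tree, column)
    return sum(root) / len(ALPHABET)


# Hypothetical three-leaf tree and one alignment column containing a gap.
inner = (("a", 0.1), ("b", 0.1))
tree = ((inner, 0.2), ("c", 0.3))
column = {"a": "A", "b": "-", "c": "A"}
likelihood = column_likelihood(tree, column)
```

Each internal node is visited once per column, and each visit costs a fixed amount of work for a fixed alphabet, which is the property that lets a gap-aware model keep the efficiency of the peeling algorithm.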