Limits...
Probabilistic phylogenetic inference with insertions and deletions.

Rivas E, Eddy SR - PLoS Comput. Biol. (2008)

Bottom Line: A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time.However, the most widely used phylogenetic models only account for residue substitution events.We apply this model to phylogenetic tree inference by extending the program dnaml in phylip.

View Article: PubMed Central - PubMed

Affiliation: Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia, United States of America. rivase@janelia.hhmi.org

ABSTRACT
A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth-death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new "concordance test" benchmark on real ribosomal RNA alignments, we show that the extended program dnamlepsilon improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.

Show MeSH

Related in: MedlinePlus

Felsenstein's peeling algorithm extended to gaps.Graphical description of the extended Felsenstein algorithm to calculate the probability of an alignment given a phylogenetic tree using the generative model for gaps presented in this work, given by Equations 20 and 21. For a given column in a multiple alignment, recursion A corresponds to the probability of a tree up to node k with a residue at the node (Equation 20). Recursion B corresponds to the probability of a tree up to node k with a gap at the node (Equation. 21). The algorithm needs to consider only evolutionarily correct events, thus recursion A includes substitutions and deletions but no insertions, and recursion B has to include one and only one insertion. In the case of having a residue i at node k (recursion A), the A1 term corresponds to the original substitution-only Felsenstein algorithm, where q and s are residues, and ∑q and ∑s stand for the sum to all possible residue substitutions. The terms A2 and A3 represent a deletion occurring for the right and left child respectively (and a substitution for the other child node). Because no insertion can occur in this recursion, there are no evolutionary events happening for descendants for the child that suffers the deletion. Thus, for a A2 or A3 term to have a contribution all nodes down to the leaves for the child that suffers the deletion have to be gaps. (We represent with gray the situation of no evolutionary event happening, and with an empty gray triangle with gaps at the bottom the situation of gaps at all nodes including the leaves for a given subtree.) The A4 term corresponds to a deletion for both children nodes, and no evolutionary event from there on. A A4 term contributes only if with all the leaves under node k are gaps. In the case of having a gap at node k (recursion B), the insertion might occur for the left or right child, which is represented by graphs B1 and B2 respectively. The insertion might instead be delayed to a node under the left or right child, which is represented by graphs B3 and B4 respectively. In all four cases, the child node for which no insertion occurs does not have any evolutionary events, and has to include all gaps at the internal nodes and leaves. The substitution, deletion and insertion probabilities are given by the generative model as per Equations 22–24, respectively.
© Copyright Policy
Related In: Results  -  Collection


getmorefigures.php?uid=PMC2527138&req=5

pcbi-1000172-g001: Felsenstein's peeling algorithm extended to gaps.Graphical description of the extended Felsenstein algorithm to calculate the probability of an alignment given a phylogenetic tree using the generative model for gaps presented in this work, given by Equations 20 and 21. For a given column in a multiple alignment, recursion A corresponds to the probability of a tree up to node k with a residue at the node (Equation 20). Recursion B corresponds to the probability of a tree up to node k with a gap at the node (Equation. 21). The algorithm needs to consider only evolutionarily correct events, thus recursion A includes substitutions and deletions but no insertions, and recursion B has to include one and only one insertion. In the case of having a residue i at node k (recursion A), the A1 term corresponds to the original substitution-only Felsenstein algorithm, where q and s are residues, and ∑q and ∑s stand for the sum to all possible residue substitutions. The terms A2 and A3 represent a deletion occurring for the right and left child respectively (and a substitution for the other child node). Because no insertion can occur in this recursion, there are no evolutionary events happening for descendants for the child that suffers the deletion. Thus, for a A2 or A3 term to have a contribution all nodes down to the leaves for the child that suffers the deletion have to be gaps. (We represent with gray the situation of no evolutionary event happening, and with an empty gray triangle with gaps at the bottom the situation of gaps at all nodes including the leaves for a given subtree.) The A4 term corresponds to a deletion for both children nodes, and no evolutionary event from there on. A A4 term contributes only if with all the leaves under node k are gaps. In the case of having a gap at node k (recursion B), the insertion might occur for the left or right child, which is represented by graphs B1 and B2 respectively. The insertion might instead be delayed to a node under the left or right child, which is represented by graphs B3 and B4 respectively. In all four cases, the child node for which no insertion occurs does not have any evolutionary events, and has to include all gaps at the internal nodes and leaves. The substitution, deletion and insertion probabilities are given by the generative model as per Equations 22–24, respectively.

Mentions: If node k is not a leaf, for a residue i,(20)for a gap,(21)where and are the two daughters of node k, , and are the distances from node k to its left and right child respectively, and where the probabilities for the daughter nodes have already been calculated by the recursion. uk stands for the subset of leaves under node k for column u, and uk = – indicates that all leaves under node k are gaps for column u. The single-event conditional probabilities are dictated by the generative model in Equation 12 as,(22)(23)(24)for 1≤i,j≤K, where the functions γt, ξt and are given by the Markov model solutions (Equations 6, 7, and 9). Figure 1 shows a graphical interpretation of the Felsenstein recursions described in Equations 20 and 21.


Probabilistic phylogenetic inference with insertions and deletions.

Rivas E, Eddy SR - PLoS Comput. Biol. (2008)

Felsenstein's peeling algorithm extended to gaps.Graphical description of the extended Felsenstein algorithm to calculate the probability of an alignment given a phylogenetic tree using the generative model for gaps presented in this work, given by Equations 20 and 21. For a given column in a multiple alignment, recursion A corresponds to the probability of a tree up to node k with a residue at the node (Equation 20). Recursion B corresponds to the probability of a tree up to node k with a gap at the node (Equation. 21). The algorithm needs to consider only evolutionarily correct events, thus recursion A includes substitutions and deletions but no insertions, and recursion B has to include one and only one insertion. In the case of having a residue i at node k (recursion A), the A1 term corresponds to the original substitution-only Felsenstein algorithm, where q and s are residues, and ∑q and ∑s stand for the sum to all possible residue substitutions. The terms A2 and A3 represent a deletion occurring for the right and left child respectively (and a substitution for the other child node). Because no insertion can occur in this recursion, there are no evolutionary events happening for descendants for the child that suffers the deletion. Thus, for a A2 or A3 term to have a contribution all nodes down to the leaves for the child that suffers the deletion have to be gaps. (We represent with gray the situation of no evolutionary event happening, and with an empty gray triangle with gaps at the bottom the situation of gaps at all nodes including the leaves for a given subtree.) The A4 term corresponds to a deletion for both children nodes, and no evolutionary event from there on. A A4 term contributes only if with all the leaves under node k are gaps. In the case of having a gap at node k (recursion B), the insertion might occur for the left or right child, which is represented by graphs B1 and B2 respectively. The insertion might instead be delayed to a node under the left or right child, which is represented by graphs B3 and B4 respectively. In all four cases, the child node for which no insertion occurs does not have any evolutionary events, and has to include all gaps at the internal nodes and leaves. The substitution, deletion and insertion probabilities are given by the generative model as per Equations 22–24, respectively.
© Copyright Policy
Related In: Results  -  Collection

Show All Figures
getmorefigures.php?uid=PMC2527138&req=5

pcbi-1000172-g001: Felsenstein's peeling algorithm extended to gaps.Graphical description of the extended Felsenstein algorithm to calculate the probability of an alignment given a phylogenetic tree using the generative model for gaps presented in this work, given by Equations 20 and 21. For a given column in a multiple alignment, recursion A corresponds to the probability of a tree up to node k with a residue at the node (Equation 20). Recursion B corresponds to the probability of a tree up to node k with a gap at the node (Equation. 21). The algorithm needs to consider only evolutionarily correct events, thus recursion A includes substitutions and deletions but no insertions, and recursion B has to include one and only one insertion. In the case of having a residue i at node k (recursion A), the A1 term corresponds to the original substitution-only Felsenstein algorithm, where q and s are residues, and ∑q and ∑s stand for the sum to all possible residue substitutions. The terms A2 and A3 represent a deletion occurring for the right and left child respectively (and a substitution for the other child node). Because no insertion can occur in this recursion, there are no evolutionary events happening for descendants for the child that suffers the deletion. Thus, for a A2 or A3 term to have a contribution all nodes down to the leaves for the child that suffers the deletion have to be gaps. (We represent with gray the situation of no evolutionary event happening, and with an empty gray triangle with gaps at the bottom the situation of gaps at all nodes including the leaves for a given subtree.) The A4 term corresponds to a deletion for both children nodes, and no evolutionary event from there on. A A4 term contributes only if with all the leaves under node k are gaps. In the case of having a gap at node k (recursion B), the insertion might occur for the left or right child, which is represented by graphs B1 and B2 respectively. The insertion might instead be delayed to a node under the left or right child, which is represented by graphs B3 and B4 respectively. In all four cases, the child node for which no insertion occurs does not have any evolutionary events, and has to include all gaps at the internal nodes and leaves. The substitution, deletion and insertion probabilities are given by the generative model as per Equations 22–24, respectively.
Mentions: If node k is not a leaf, for a residue i,(20)for a gap,(21)where and are the two daughters of node k, , and are the distances from node k to its left and right child respectively, and where the probabilities for the daughter nodes have already been calculated by the recursion. uk stands for the subset of leaves under node k for column u, and uk = – indicates that all leaves under node k are gaps for column u. The single-event conditional probabilities are dictated by the generative model in Equation 12 as,(22)(23)(24)for 1≤i,j≤K, where the functions γt, ξt and are given by the Markov model solutions (Equations 6, 7, and 9). Figure 1 shows a graphical interpretation of the Felsenstein recursions described in Equations 20 and 21.

Bottom Line: A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time.However, the most widely used phylogenetic models only account for residue substitution events.We apply this model to phylogenetic tree inference by extending the program dnaml in phylip.

View Article: PubMed Central - PubMed

Affiliation: Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia, United States of America. rivase@janelia.hhmi.org

ABSTRACT
A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth-death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new "concordance test" benchmark on real ribosomal RNA alignments, we show that the extended program dnamlepsilon improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.

Show MeSH
Related in: MedlinePlus