Probabilistic phylogenetic inference with insertions and deletions.

Rivas E, Eddy SR - PLoS Comput. Biol. (2008)

Bottom Line: A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip.

Affiliation: Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia, United States of America. rivase@janelia.hhmi.org

ABSTRACT
A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth-death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new "concordance test" benchmark on real ribosomal RNA alignments, we show that the extended program dnamlε improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.
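A toy model can illustrate the abstract's central idea: a rate matrix augmented by the gap character, evaluated with the Felsenstein peeling algorithm. The sketch below is hypothetical and is not the published dnamlε model:

- `toy_gap_rate_matrix` is an invented F81-style 5 x 5 matrix. A residue becomes a gap at `del_rate`, and a gap becomes residue b at `ins_rate * pi[b]`.
- Trees are nested lists of `(child, branch_length)` pairs.
- The root state is drawn from the matrix's equilibrium distribution.

What it shows is that a gap at a leaf enters the likelihood as an observed state. Treating gaps as missing data would instead give that leaf an uninformative all-ones vector.

```python
import itertools
import numpy as np

# Toy alphabet (assumption): four nucleotides plus the gap as a fifth state.
STATES = "ACGT-"
IDX = {c: i for i, c in enumerate(STATES)}


def expm(A, order=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(A, ord=1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / 2.0**s                  # scaled so that ||B||_1 <= 1/2
    term = np.eye(A.shape[0])
    result = term.copy()
    for k in range(1, order + 1):   # sum_{k=0}^{order} B^k / k!
        term = term @ B / k
        result = result + term
    for _ in range(s):              # undo the scaling: exp(A) = exp(B)^(2^s)
        result = result @ result
    return result


def toy_gap_rate_matrix(pi, sub_rate=1.0, ins_rate=0.1, del_rate=0.1):
    """HYPOTHETICAL rate matrix, not the published model: F81-style substitutions,
    residue -> gap at del_rate, gap -> residue b at ins_rate * pi[b]."""
    Q = np.zeros((5, 5))
    for a in range(4):
        for b in range(4):
            if a != b:
                Q[a, b] = sub_rate * pi[b]
        Q[a, 4] = del_rate
    Q[4, :4] = ins_rate * np.asarray(pi)
    np.fill_diagonal(Q, -Q.sum(axis=1))  # each row of a rate matrix sums to zero
    return Q


def stationary(Q):
    """Equilibrium distribution: solve pi Q = 0 subject to sum(pi) = 1."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]


def partials(node, column, Q):
    """Peeling (post-order) pass returning L[x] = P(characters below node | state x).
    A leaf is a taxon name; an internal node is a list of (child, branch_length)."""
    if isinstance(node, str):
        v = np.zeros(len(STATES))
        v[IDX[column[node]]] = 1.0  # a gap is an observed state, not missing data
        return v
    v = np.ones(len(STATES))
    for child, t in node:
        v *= expm(Q * t) @ partials(child, column, Q)  # sum_y P_xy(t) * L_child[y]
    return v


def site_likelihood(tree, column, Q):
    """Likelihood of one alignment column, root drawn from the equilibrium."""
    return float(stationary(Q) @ partials(tree, column, Q))


def total_column_probability(tree, taxa, Q):
    """Sanity check: likelihoods of all 5^n possible columns sum to 1."""
    return sum(site_likelihood(tree, dict(zip(taxa, chars)), Q)
               for chars in itertools.product(STATES, repeat=len(taxa)))


pi = np.array([0.1, 0.2, 0.3, 0.4])
Q = toy_gap_rate_matrix(pi)
tree = [([("s1", 0.1), ("s2", 0.2)], 0.05), ("s3", 0.3)]
print(site_likelihood(tree, {"s1": "A", "s2": "-", "s3": "A"}, Q))
print(total_column_probability(tree, ["s1", "s2", "s3"], Q))  # 1.0 up to rounding
```

Each column still costs one pass over the tree, as in the substitution-only case, only with five states instead of four. The toy process also assigns probability to all-gap columns. These never occur in a real alignment, and a full model would presumably need to handle them.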

Figure 4. Comparison of dnaml versus dnamlε using the “tree concordance test” on ribosomal RNA alignments. The tree concordance test for SSU (left) and LSU (right) rRNA alignments is displayed as a function of the total fraction of gaps present in the alignment. We used five SSU and four LSU alignments described in Table 5. From each alignment, we randomly selected a large number of eight-taxon alignments (4,000 for the Archaea and Chloroplast alignments, and 10,000 for the Eukarya, Bacteria and Mitochondria alignments). The columns of each eight-taxon alignment were first shuffled, and the alignment was then split into two halves. The tree concordance test assesses the similarity between the two trees inferred from the two halves of the alignment. Three measures of tree similarity are displayed: a binary count of whether the two trees are topologically identical (TP), the Symmetric Difference Distance (SDD), and the normalized Branch Score Distance (nBSD). Results for all SSU (LSU) tests have been pooled. (A) Histogram of the total number of alignments and of the number of TPs for dnaml (magenta) and dnamlε (cyan), as a function of the total fraction of gaps in the alignment. (B, C, D) Fraction of TPs, SDD, and nBSD, respectively. Overall tree concordance for SSU rRNA is 27.9% (10,589/38,000) for dnamlε versus 16.9% (6,418/38,000) for dnaml. Overall tree concordance for LSU rRNA is 46.6% (13,048/28,000) for dnamlε versus 35.7% (10,002/28,000) for dnaml. (E) Comparison of running time: on eight-taxon alignments, dnamlε takes on average two to three times as long as dnaml.
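The bookkeeping of the concordance test described in the caption can be sketched as follows. This is a hypothetical illustration under several assumptions:

- Tree inference itself (dnaml or dnamlε) is not shown.
- Trees are written as nested lists of `(child, branch_length)` pairs.
- "Shuffled" is read as a random permutation of alignment columns.
- SDD is computed as the Robinson-Foulds symmetric difference of the two trees' bipartitions.
- The branch score distance is returned unnormalized, because the normalization behind nBSD is not given here. Some tools report the sum of squared differences rather than its square root.

```python
import math
import random


def shuffle_and_split(aln, seed=0):
    """Shuffle the columns of an alignment (taxon -> aligned string), cut it in half."""
    names = sorted(aln)
    cols = list(range(len(aln[names[0]])))
    random.Random(seed).shuffle(cols)
    half = len(cols) // 2

    def take(idx):
        # Same column order for every taxon, so the halves stay aligned.
        return {n: "".join(aln[n][i] for i in idx) for n in names}

    return take(cols[:half]), take(cols[half:])


def leaf_set(node):
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child, _ in node))


def split_lengths(tree):
    """Map each bipartition of the (unrooted) tree to its branch length. A split is
    stored as the side WITHOUT a fixed reference taxon, so the two edges at a
    bifurcating root collapse into one split and their lengths add."""
    taxa = leaf_set(tree)
    ref = min(taxa)
    out = {}

    def walk(node):
        if isinstance(node, str):
            return
        for child, t in node:
            clade = leaf_set(child)
            side = taxa - clade if ref in clade else clade
            out[side] = out.get(side, 0.0) + t
            walk(child)

    walk(tree)
    return out


def symmetric_difference_distance(t1, t2):
    """Number of splits found in one tree but not in the other."""
    return len(set(split_lengths(t1)) ^ set(split_lengths(t2)))


def branch_score_distance(t1, t2):
    """Square root of summed squared branch-length differences over all splits;
    a split absent from a tree counts as length 0 (unnormalized)."""
    s1, s2 = split_lengths(t1), split_lengths(t2)
    keys = s1.keys() | s2.keys()
    return math.sqrt(sum((s1.get(k, 0.0) - s2.get(k, 0.0)) ** 2 for k in keys))


def topologically_identical(t1, t2):
    """The binary 'TP' outcome: the two halves gave the same unrooted topology."""
    return symmetric_difference_distance(t1, t2) == 0


t1 = [([("a", 1.0), ("b", 1.0)], 0.5), ([("c", 1.0), ("d", 1.0)], 0.5)]
t2 = [([("a", 1.0), ("c", 1.0)], 0.5), ([("b", 1.0), ("d", 1.0)], 0.5)]
print(symmetric_difference_distance(t1, t2), branch_score_distance(t1, t2))
```

In the example, the two four-taxon trees differ only in their single internal split (ab|cd versus ac|bd). The SDD is therefore 2, and the branch score distance is the square root of 2.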

Mentions: Results are summarized in Figure 4. Overall, dnamlε achieves a tree concordance of 27.9% for SSU and 46.6% for LSU, whereas dnaml achieves 16.9% for SSU and 35.7% for LSU. The error estimate for all these results is about 0.5–0.6%, which indicates that the improvement obtained by dnamlε is significant. LSU alignments are longer than SSU alignments (4205 ± 1179 versus 1959 ± 579), which probably explains why both methods perform better on LSU. For alignments with few gaps, the two methods produce similar results, and the improvement of dnamlε over dnaml grows as the fraction of gaps in the alignments increases.
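As a quick arithmetic check, the percentages quoted above can be recomputed from the raw counts given in the Figure 4 caption. The snippet below also reports the absolute gain in percentage points. It only restates the published counts and makes no further statistical claim.

```python
# (concordant pairs, total pairs) copied from the Figure 4 caption.
COUNTS = {
    ("SSU", "dnamlε"): (10_589, 38_000),
    ("SSU", "dnaml"): (6_418, 38_000),
    ("LSU", "dnamlε"): (13_048, 28_000),
    ("LSU", "dnaml"): (10_002, 28_000),
}


def concordance_pct(concordant, total):
    """Percentage of split-alignment pairs whose two trees were identical."""
    return 100.0 * concordant / total


for rna in ("SSU", "LSU"):
    new = concordance_pct(*COUNTS[(rna, "dnamlε")])
    old = concordance_pct(*COUNTS[(rna, "dnaml")])
    print(f"{rna}: dnamlε {new:.1f}%, dnaml {old:.1f}%, gain {new - old:.1f} points")
```

This prints 27.9% versus 16.9% for SSU and 46.6% versus 35.7% for LSU, matching the text. The gains are about 11.0 and 10.9 percentage points, respectively.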