Limits...
Accurate detection of recombinant breakpoints in whole-genome alignments.

Westesson O, Holmes I - PLoS Comput. Biol. (2009)

Bottom Line: Using a combined algorithm for estimating tree structure and hidden Markov model parameters, our program detects changes in phylogenetic tree topology over a multiple sequence alignment.We show that we are not only able to detect recombinant regions of vastly different sizes but also the location of breakpoints with great accuracy.In all cases, we confirm the breakpoint predictions of previous studies, and in many cases we offer novel predictions.

View Article: PubMed Central - PubMed

Affiliation: Department of Bioengineering, University of California Berkeley, Berkeley, California, United States of America.

ABSTRACT
We propose a novel method for detecting sites of molecular recombination in multiple alignments. Our approach is a compromise between previous extremes of computationally prohibitive but mathematically rigorous methods and imprecise heuristic methods. Using a combined algorithm for estimating tree structure and hidden Markov model parameters, our program detects changes in phylogenetic tree topology over a multiple sequence alignment. We evaluate our method on benchmark datasets from previous studies on two recombinant pathogens, Neisseria and HIV-1, as well as simulated data. We show that we are not only able to detect recombinant regions of vastly different sizes but also the location of breakpoints with great accuracy. We show that our method does well inferring recombination breakpoints while at the same time maintaining practicality for larger datasets. In all cases, we confirm the breakpoint predictions of previous studies, and in many cases we offer novel predictions.

Show MeSH
Phylo-HMM training algorithm.Input Alignment ⇒ Model Selection/Parameter Estimation ⇒ Recombination Inference.
© Copyright Policy
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2651022&req=5

pcbi-1000318-g010: Phylo-HMM training algorithm.Input Alignment ⇒ Model Selection/Parameter Estimation ⇒ Recombination Inference.

Mentions: We use a hidden Markov model with tree topologies as hidden states and alignment columns as observed states. The usual method to train HMM parameters is by the specialization of the EM algorithm known as the Baum-Welch algorithm. Our transitions can be optimized in the usual way, but the emissions are more difficult since their likelihood is governed by the tree topologies in the hidden states, which are not easily optimized. For this problem, we employ an EM method for trees, developed by Friedman et al. [21], within our M-step for emission probabilities. Their phylogenetic inference algorithm is implemented with such improvements as simulated annealing and noise injection in SEMPHY, available from their website at http://compbio.cs.huji.ac.il/semphy/. We instead implemented our own ‘bare-bones’ version of this algorithm, without these improvements. To progressively assign columns to trees, at each M-step, we weight the expected counts of the tree-EM by the posterior counts of the phylo-HMM. Intuitively, this guides the tree-maximization by providing comparatively more information from regions which fit a particular tree. We give a high-level description of our method here and more detail is provided in Text S1. Figure 10 gives a graphical representation of the overall task-flow. After running our program on the alignment data to estimate parameters, a state path through the HMM is computed from the final matrix of posterior probabilities. We would like this path to represent a balance between being biologically reasonable and highly probable under our model. Thus, we propose maximizing the sum of posterior state probabilities subject to a length cutoff by the following method.


Accurate detection of recombinant breakpoints in whole-genome alignments.

Westesson O, Holmes I - PLoS Comput. Biol. (2009)

Phylo-HMM training algorithm.Input Alignment ⇒ Model Selection/Parameter Estimation ⇒ Recombination Inference.
© Copyright Policy
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2651022&req=5

pcbi-1000318-g010: Phylo-HMM training algorithm.Input Alignment ⇒ Model Selection/Parameter Estimation ⇒ Recombination Inference.
Mentions: We use a hidden Markov model with tree topologies as hidden states and alignment columns as observed states. The usual method to train HMM parameters is by the specialization of the EM algorithm known as the Baum-Welch algorithm. Our transitions can be optimized in the usual way, but the emissions are more difficult since their likelihood is governed by the tree topologies in the hidden states, which are not easily optimized. For this problem, we employ an EM method for trees, developed by Friedman et al. [21], within our M-step for emission probabilities. Their phylogenetic inference algorithm is implemented with such improvements as simulated annealing and noise injection in SEMPHY, available from their website at http://compbio.cs.huji.ac.il/semphy/. We instead implemented our own ‘bare-bones’ version of this algorithm, without these improvements. To progressively assign columns to trees, at each M-step, we weight the expected counts of the tree-EM by the posterior counts of the phylo-HMM. Intuitively, this guides the tree-maximization by providing comparatively more information from regions which fit a particular tree. We give a high-level description of our method here and more detail is provided in Text S1. Figure 10 gives a graphical representation of the overall task-flow. After running our program on the alignment data to estimate parameters, a state path through the HMM is computed from the final matrix of posterior probabilities. We would like this path to represent a balance between being biologically reasonable and highly probable under our model. Thus, we propose maximizing the sum of posterior state probabilities subject to a length cutoff by the following method.

Bottom Line: Using a combined algorithm for estimating tree structure and hidden Markov model parameters, our program detects changes in phylogenetic tree topology over a multiple sequence alignment.We show that we are not only able to detect recombinant regions of vastly different sizes but also the location of breakpoints with great accuracy.In all cases, we confirm the breakpoint predictions of previous studies, and in many cases we offer novel predictions.

View Article: PubMed Central - PubMed

Affiliation: Department of Bioengineering, University of California Berkeley, Berkeley, California, United States of America.

ABSTRACT
We propose a novel method for detecting sites of molecular recombination in multiple alignments. Our approach is a compromise between previous extremes of computationally prohibitive but mathematically rigorous methods and imprecise heuristic methods. Using a combined algorithm for estimating tree structure and hidden Markov model parameters, our program detects changes in phylogenetic tree topology over a multiple sequence alignment. We evaluate our method on benchmark datasets from previous studies on two recombinant pathogens, Neisseria and HIV-1, as well as simulated data. We show that we are not only able to detect recombinant regions of vastly different sizes but also the location of breakpoints with great accuracy. We show that our method does well inferring recombination breakpoints while at the same time maintaining practicality for larger datasets. In all cases, we confirm the breakpoint predictions of previous studies, and in many cases we offer novel predictions.

Show MeSH