Accurate detection of recombinant breakpoints in whole-genome alignments.

Westesson O, Holmes I - PLoS Comput. Biol. (2009)

Bottom Line: Using a combined algorithm for estimating tree structure and hidden Markov model parameters, our program detects changes in phylogenetic tree topology over a multiple sequence alignment. We show that we are not only able to detect recombinant regions of vastly different sizes but also the location of breakpoints with great accuracy. In all cases, we confirm the breakpoint predictions of previous studies, and in many cases we offer novel predictions.


Affiliation: Department of Bioengineering, University of California Berkeley, Berkeley, California, United States of America.

ABSTRACT
We propose a novel method for detecting sites of molecular recombination in multiple alignments. Our approach is a compromise between previous extremes of computationally prohibitive but mathematically rigorous methods and imprecise heuristic methods. Using a combined algorithm for estimating tree structure and hidden Markov model parameters, our program detects changes in phylogenetic tree topology over a multiple sequence alignment. We evaluate our method on benchmark datasets from previous studies on two recombinant pathogens, Neisseria and HIV-1, as well as simulated data. We show that we are not only able to detect recombinant regions of vastly different sizes but also the location of breakpoints with great accuracy. We show that our method does well inferring recombination breakpoints while at the same time maintaining practicality for larger datasets. In all cases, we confirm the breakpoint predictions of previous studies, and in many cases we offer novel predictions.
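As a rough illustration of the decoding step described in the abstract, the sketch below treats each candidate tree topology as a hidden state and finds the most probable state path over alignment columns with the Viterbi algorithm. It is not the authors' implementation: it assumes the per-column log-likelihoods under each topology are already computed, and it uses a single fixed probability of staying in the same state, whereas the program described above estimates both the trees and the hidden Markov model parameters. The names `viterbi_breakpoints`, `col_loglik`, and `stay_prob` are hypothetical.

```python
import math


def viterbi_breakpoints(col_loglik, stay_prob=0.99):
    """Most probable topology path over alignment columns (illustrative sketch).

    col_loglik[t][k] is the log-likelihood of column t under topology k
    (assumed precomputed; hypothetical input format).
    Returns (path, breakpoints), where breakpoints are the 0-based column
    indices at which the decoded topology differs from the previous column.
    """
    n_cols = len(col_loglik)
    n_states = len(col_loglik[0])
    if n_states < 2:
        raise ValueError("need at least two candidate topologies")

    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_states - 1))
    log_init = -math.log(n_states)  # uniform prior over topologies

    def trans(j, k):
        return log_stay if j == k else log_switch

    # score[k]: best log-probability of any path ending in state k
    score = [log_init + ll for ll in col_loglik[0]]
    backptrs = []
    for t in range(1, n_cols):
        new_score, ptr = [], []
        for k in range(n_states):
            best_prev = max(range(n_states), key=lambda j: score[j] + trans(j, k))
            new_score.append(score[best_prev] + trans(best_prev, k) + col_loglik[t][k])
            ptr.append(best_prev)
        backptrs.append(ptr)
        score = new_score

    # Trace back from the best final state
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()

    breakpoints = [t for t in range(1, n_cols) if path[t] != path[t - 1]]
    return path, breakpoints
```

With a high `stay_prob`, a single column that happens to favor another topology is not enough to trigger a switch, which is what lets a model of this kind separate short noisy stretches from sustained changes in topology.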


Figure 8 (pcbi-1000318-g008): Brazilian strain BREPM11871. Confirmation of breakpoints 1322, 2571, and 4782 (red dashed lines). We predict a region common to BREPM16704 at 9238–9361 (green). Also, the breakpoint previously estimated at 5462 (red) we propose to be at 5277 (green dashed line). In support of this, we provide bootstrapping values (1000 replicates) for the 3 different regions, indicated by horizontal colored lines above the plot. Our prediction (orange) carries the highest value, 99.9%, whereas the previous (blue) is only 85.1%, since it includes a region (purple) that strongly supports BREPM11871 clustering with subtype B, with value 98.2%. The small region at 985–1080 is difficult to confidently categorize, but its high posterior probability for clustering with F and its agreement with the other two strains lead us to suspect a recombination. Trees trained in hidden states are shown below the plot.

Mentions: In strain BREPM11871, all four breakpoints predicted by Filho et al. were found, as well as a new crossover region, common to BREPM16704, at 9238–9361 (shown in green in Figure 8). The break previously described at nt 5462 was predicted by our method to be at nt 5277. To determine the more likely crossover point, we performed 1000 bootstrapping trials on each of the following regions: 4782–5277 (our prediction), 4782–5462 (Filho et al.'s prediction), and 5277–5462 (the disputed region). We found that the 5277–5462 region strongly supported BREPM11871 clustering with subtype B, with 98.2% bootstrap support. Moreover, bootstrap support for query-F clustering was higher for our predicted region (99.9%) than for the previous prediction (85.1%). We conclude that our algorithm often outperforms previous methods in accurately determining recombination breakpoints.
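The region-wise bootstrap comparison above can be sketched as follows. This is a simplified illustration, not the procedure used in the paper: it resamples alignment columns with replacement within a region, but it replaces tree building with a much cruder criterion, namely whether the query has strictly fewer mismatches to one reference sequence than to any other. The function name `bootstrap_support`, its parameters, and the 0-based half-open column range are assumptions made for this example (the positions quoted above are nucleotide coordinates from the paper).

```python
import random


def bootstrap_support(query, references, start, end, target, n_reps=1000, seed=0):
    """Fraction of bootstrap replicates in which `query` is uniquely closest to `target`.

    query:      aligned query sequence (string)
    references: dict mapping reference name -> aligned sequence of equal length
    start, end: 0-based, half-open column range of the region to test
    target:     name of the reference the query is hypothesized to cluster with

    Closeness is a plain mismatch count, a stand-in for the phylogenetic
    clustering a real analysis would use.
    """
    rng = random.Random(seed)
    columns = list(range(start, end))
    hits = 0
    for _ in range(n_reps):
        # Resample the region's columns with replacement
        sample = [rng.choice(columns) for _ in columns]
        dists = {
            name: sum(query[c] != seq[c] for c in sample)
            for name, seq in references.items()
        }
        best = min(dists.values())
        winners = [name for name, d in dists.items() if d == best]
        if winners == [target]:  # ties do not count as support
            hits += 1
    return hits / n_reps
```

Running such a function separately on a proposed region, a previously proposed region, and the disputed stretch between them mirrors the three-way comparison reported above: if the disputed stretch supports a different grouping, including it lowers the support for the longer region.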

