Accurate detection of recombinant breakpoints in whole-genome alignments.

Westesson O, Holmes I - PLoS Comput. Biol. (2009)

Bottom Line: Using a combined algorithm for estimating tree structure and hidden Markov model parameters, our program detects changes in phylogenetic tree topology over a multiple sequence alignment. We show that we are not only able to detect recombinant regions of vastly different sizes but also the location of breakpoints with great accuracy. In all cases, we confirm the breakpoint predictions of previous studies, and in many cases we offer novel predictions.

Affiliation: Department of Bioengineering, University of California Berkeley, Berkeley, California, United States of America.

ABSTRACT
We propose a novel method for detecting sites of molecular recombination in multiple alignments. Our approach is a compromise between previous extremes of computationally prohibitive but mathematically rigorous methods and imprecise heuristic methods. Using a combined algorithm for estimating tree structure and hidden Markov model parameters, our program detects changes in phylogenetic tree topology over a multiple sequence alignment. We evaluate our method on benchmark datasets from previous studies on two recombinant pathogens, Neisseria and HIV-1, as well as simulated data. We show that we are not only able to detect recombinant regions of vastly different sizes but also the location of breakpoints with great accuracy. We show that our method does well inferring recombination breakpoints while at the same time maintaining practicality for larger datasets. In all cases, we confirm the breakpoint predictions of previous studies, and in many cases we offer novel predictions.
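To make the idea of detecting topology changes along an alignment concrete, the following is a minimal sketch of posterior decoding in a small HMM whose hidden states are candidate trees. It is not the authors' implementation. It assumes the per-column log-likelihoods under each tree have already been computed elsewhere, and it uses a fixed, illustrative `stay_prob` instead of the jointly estimated tree and HMM parameters described above. The function names `posterior_tree_probs` and `breakpoints` are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def posterior_tree_probs(col_loglik, stay_prob=0.95):
    """Forward-backward posteriors over tree states for each alignment column.

    col_loglik[t][k] is the log-likelihood of column t under tree k
    (assumed to be precomputed; at least two trees are required).
    stay_prob is an illustrative, fixed self-transition probability.
    """
    n_cols = len(col_loglik)
    n_states = len(col_loglik[0])
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_states - 1))
    log_init = -math.log(n_states)  # uniform start distribution

    def log_trans(i, j):
        return log_stay if i == j else log_switch

    # Forward pass: fwd[t][j] = log P(columns 0..t, state at t is j)
    fwd = [[0.0] * n_states for _ in range(n_cols)]
    for k in range(n_states):
        fwd[0][k] = log_init + col_loglik[0][k]
    for t in range(1, n_cols):
        for j in range(n_states):
            fwd[t][j] = col_loglik[t][j] + logsumexp(
                [fwd[t - 1][i] + log_trans(i, j) for i in range(n_states)]
            )

    # Backward pass: bwd[t][i] = log P(columns t+1.. | state at t is i)
    bwd = [[0.0] * n_states for _ in range(n_cols)]
    for t in range(n_cols - 2, -1, -1):
        for i in range(n_states):
            bwd[t][i] = logsumexp(
                [log_trans(i, j) + col_loglik[t + 1][j] + bwd[t + 1][j]
                 for j in range(n_states)]
            )

    log_total = logsumexp(fwd[-1])
    return [
        [math.exp(fwd[t][k] + bwd[t][k] - log_total) for k in range(n_states)]
        for t in range(n_cols)
    ]


def breakpoints(posteriors):
    """Columns at which the most probable tree differs from the previous column."""
    best = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
    return [t for t in range(1, len(best)) if best[t] != best[t - 1]]


# Toy input: 20 columns favouring tree 0, then 20 columns favouring tree 1.
toy = [[-1.0, -3.0]] * 20 + [[-3.0, -1.0]] * 20
print(breakpoints(posterior_tree_probs(toy)))
```

Posterior decoding is used here because it yields a per-column probability for each tree, which is the kind of quantity a posterior-probability plot displays. A Viterbi path would give only a single best segmentation.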

Figure 4 (pcbi-1000318-g004): The top figure shows our analysis of the CRF01_AE/B Malaysian HIV-1 strain with our recombination phylo-HMM. We recover 6 previously predicted recombination breakpoints (red) and predict new regions at 6415–6594 and 2360–2553 (green). The grey and black regions correspond to posterior probabilities of the trees shown in the lowest figure. Previous bootscanning analysis of the same data is shown in the middle figure [18]. Since this previous analysis involved removing gaps from the alignment, we provide approximate mappings from our predictions to theirs, shown as the red dashed lines between the two figures. Precise breakpoint locations were provided in [18] based on the consensus HXB2 strain, and we plot these in our figure as the vertical red lines. Note that the spike in their plot appears in our plot, around 6500, as a recombinant region. The trees in the lowest figure are those trained as hidden states in our HMM; the black state clearly shows the query strain clustering with CRF_AE, whereas the grey tree shows a closer relationship with subtype B, in accordance with the previous findings.

Mentions: Figure 4 depicts our results on a new Malaysian HIV strain previously analyzed by Lau et al. [18]. We recover all six of the breakpoints inferred by the original authors, who used a SimPlot/Bootscanning approach, and we also find two new breakpoints whose significance appears equal to that of those found previously. For comparison, Figure 4 also shows the bootscanning results that Lau et al. used to infer recombination breakpoints. Lau et al. provided precise breakpoint positions, and these are plotted in our diagram as red dashed lines. Since bootscanning typically removes gaps from multiple alignments before analysis, the breakpoint positions do not align very well with Lau et al.'s plot, and we provide a rough mapping between the plots. All six of their breakpoint predictions are well represented in our analysis. Note the ‘spike’ in likelihood at around nt 5800 in Lau et al.'s plot. This region registered as strongly recombinant in our analysis, depicted as the grey region at nt 6415–6594. Lau et al.'s characterization of the 1500 to 2000 region (2141 to 2856 in ours) is marked by some uncertainty in the optimal tree topology; their “% trees” line wavers and is never very close to 100%, in contrast to their inference for the region 3000 to 5500, where the line remains constant and close to 100%. This uncertainty suggests that there may be additional recombination points within that region, as is more conclusively shown in our diagram. We venture that the region between nt 2141 and 2856 can be further partitioned by two more breakpoints, at nt 2360 and 2553, shown in Figure 4.
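
The coordinate mismatch between the two plots can be illustrated with a small sketch. It assumes that "removing gaps" means discarding every alignment column in which any sequence has a gap; the text above does not specify the exact procedure, so this is only one plausible reading. The function name `gap_stripped_coordinates` and the toy alignment are hypothetical.

```python
def gap_stripped_coordinates(alignment, gap_chars="-."):
    """Map each column of a gapped alignment to its gap-stripped index.

    alignment is a list of equal-length aligned sequences (strings).
    Returns a list with one entry per original column: the 0-based index
    of that column after gap stripping, or None if the column is removed
    because at least one sequence has a gap there (assumed rule).
    """
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    mapping = []
    kept = 0  # number of columns kept so far
    for col in range(n_cols):
        if any(row[col] in gap_chars for row in alignment):
            mapping.append(None)
        else:
            mapping.append(kept)
            kept += 1
    return mapping


# Toy alignment: columns 1 and 2 each contain a gap and are dropped.
toy_alignment = ["AC-GT", "ACTGT", "A-TGT"]
print(gap_stripped_coordinates(toy_alignment))
```

Under this assumption, positions in the gapped alignment are always greater than or equal to the corresponding gap-stripped positions, and the offset grows with each removed column.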

