Declining transition/transversion ratios through time reveal limitations to the accuracy of nucleotide substitution models.

Duchêne S, Ho SY, Holmes EC - BMC Evol. Biol. (2015)

Bottom Line: In the majority of cases our estimates of ti/tv decrease with time, even under sophisticated time-reversible models of nucleotide substitution. In contrast, we did not find any temporal patterns in selection pressures or CG-content over these short time-frames. Our study shows that commonly used substitution models can underestimate the number of substitutions among closely related sequences, such that the time-scale of viral evolution and emergence may be systematically underestimated.


Affiliation: School of Biological Sciences, The University of Sydney, Sydney, NSW, 2006, Australia. sebastian.duchene@sydney.edu.au.

ABSTRACT

Background: Genetic analyses of DNA sequences make use of an increasingly complex set of nucleotide substitution models to estimate the divergence between gene sequences. However, there is currently no way to assess the validity of nucleotide substitution models over short time-scales and with limited mutational accumulation.

Results: We show that quantifying the decline in the ratio of transitions to transversions (ti/tv) over time provides an in-built measure of mutational saturation and hence of substitution model accuracy. We tested this through detailed phylogenetic analyses of 10 representative virus data sets comprising recently sampled and closely related sequences. In the majority of cases our estimates of ti/tv decrease with time, even under sophisticated time-reversible models of nucleotide substitution. This indicates that high levels of saturation are attained extremely rapidly in viruses, sometimes within decades. In contrast, we did not find any temporal patterns in selection pressures or CG-content over these short time-frames. To validate the temporal trend of ti/tv across a broader taxonomic range, we analyzed a set of 76 different viruses. Again, the estimate of ti/tv scaled negatively with evolutionary time, a trend that was more pronounced for rapidly-evolving RNA viruses than slowly-evolving DNA viruses.

Conclusions: Our study shows that commonly used substitution models can underestimate the number of substitutions among closely related sequences, such that the time-scale of viral evolution and emergence may be systematically underestimated. In turn, estimates of ti/tv provide an effective internal control of substitution model performance in viruses because of their high sensitivity to mutational saturation.
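As a point of reference for the quantity discussed in the abstract, the sketch below counts transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine, or the reverse) between two aligned sequences. It is only an illustration of the definition: the study estimates ti/tv by fitting substitution models, not by raw pairwise counting, and the function name `count_ti_tv` is hypothetical.

```python
# Minimal sketch: raw counts of transitions and transversions between two
# aligned DNA sequences. This is NOT the model-based estimate used in the
# study; the function name is a hypothetical choice for illustration.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ti_tv(seq_a, seq_b):
    """Return (transitions, transversions) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ti = tv = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        # Skip identical sites, gaps, and ambiguous bases.
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ti += 1  # A<->G or C<->T
        else:
            tv += 1  # any purine<->pyrimidine change
    return ti, tv


if __name__ == "__main__":
    ti, tv = count_ti_tv("AAGCT", "GACTA")
    print(ti, tv)
```

Raw counts of this kind ignore multiple substitutions at the same site, which is the effect that substitution models are meant to correct for.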



Figure 2: Estimates of ti/tv and of the shape parameter, α, of the Γ-distribution of among-site rate variation plotted against total tree length (subs/site) for simulated data. Each line corresponds to a phylogenetic tree and the points are the maximum likelihood estimates for data simulated on trees with different lengths (x-axis). For clarity, the points have been jittered along the x-axis.

Mentions: Our simulations of the behavior of α (a measure of the extent of among-site rate variation) illustrate the expectation of this parameter under different levels of saturation and its relationship with ti/tv. We found that increasing the tree length led to a decrease in the estimate of ti/tv and an increase in that of α (Figure 2). We use tree length as a proxy for divergence time, such that long trees represent deep evolutionary time-scales. The trend in the estimates of these parameters is not necessarily linear. In particular, the estimate of ti/tv appeared to decrease slowly over time for our simulations based on tree lengths of 10 or less. In a few simulations, the estimate of this parameter increased slightly before sharply decreasing. Similarly, the estimate of α appeared to increase slowly for data sets simulated on trees of length 5 or less. We also observed a large variation in the estimate of α for deep evolutionary time-scales. Although the relationship between these parameters is complex, these simulations validate our general prediction that failure to account for mutational saturation can lead to an increase in the estimate of α over time and a decline in that of ti/tv. These results also show that these parameters can be estimated accurately only for very recent evolutionary time-scales.
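The effect of saturation on an uncorrected ti/tv ratio can be sketched analytically. The example below assumes the Kimura two-parameter (K80) model, which is chosen here purely for illustration and is not stated in the text above as the simulation model. Under K80, the expected proportions of sites differing by a transition and by a transversion between two sequences have closed forms, and their ratio falls from the true ti/tv towards 0.5 as the distance grows. The function names and the parameterization by `true_ti_tv` and `distance` are hypothetical, and the curve describes raw observed differences, not the maximum likelihood estimates plotted in Figure 2.

```python
import math

# Minimal sketch, assuming the K80 (Kimura two-parameter) model.
# true_ti_tv is the ratio of the transition rate to the total transversion
# rate; distance is the expected number of substitutions per site.


def k80_difference_proportions(true_ti_tv, distance):
    """Expected proportions of transitional and transversional differences."""
    # With per-site transition rate a and transversion rate b (to each of
    # two bases), distance = (a + 2b) * t and true_ti_tv = a / (2b).
    bt = distance / (2.0 * (true_ti_tv + 1.0))
    at = 2.0 * true_ti_tv * bt
    p_ti = 0.25 + 0.25 * math.exp(-4.0 * bt) - 0.5 * math.exp(-2.0 * (at + bt))
    p_tv = 0.5 - 0.5 * math.exp(-4.0 * bt)
    return p_ti, p_tv


def observed_ti_tv(true_ti_tv, distance):
    """Ratio of observed transitional to transversional differences."""
    p_ti, p_tv = k80_difference_proportions(true_ti_tv, distance)
    return p_ti / p_tv


if __name__ == "__main__":
    for d in (0.01, 0.1, 0.5, 1.0, 2.0, 5.0):
        print(f"distance={d:<5} observed ti/tv={observed_ti_tv(4.0, d):.3f}")
```

With a true ratio of 4, the observed ratio stays close to 4 only at very small distances, which is consistent with the statement above that these parameters are recovered accurately only over very recent time-scales.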

