Declining transition/transversion ratios through time reveal limitations to the accuracy of nucleotide substitution models.

Duchêne S, Ho SY, Holmes EC - BMC Evol. Biol. (2015)

Bottom Line: In the majority of cases our estimates of ti/tv decrease with time, even under sophisticated time-reversible models of nucleotide substitution. In contrast, we did not find any temporal patterns in selection pressures or CG-content over these short time-frames. Our study shows that commonly used substitution models can underestimate the number of substitutions among closely related sequences, such that the time-scale of viral evolution and emergence may be systematically underestimated.


Affiliation: School of Biological Sciences, The University of Sydney, Sydney, NSW, 2006, Australia. sebastian.duchene@sydney.edu.au.

ABSTRACT

Background: Genetic analyses of DNA sequences make use of an increasingly complex set of nucleotide substitution models to estimate the divergence between gene sequences. However, there is currently no way to assess the validity of nucleotide substitution models over short time-scales and with limited mutational accumulation.

Results: We show that quantifying the decline in the ratio of transitions to transversions (ti/tv) over time provides an in-built measure of mutational saturation and hence of substitution model accuracy. We tested this through detailed phylogenetic analyses of 10 representative virus data sets comprising recently sampled and closely related sequences. In the majority of cases our estimates of ti/tv decrease with time, even under sophisticated time-reversible models of nucleotide substitution. This indicates that high levels of saturation are attained extremely rapidly in viruses, sometimes within decades. In contrast, we did not find any temporal patterns in selection pressures or CG-content over these short time-frames. To validate the temporal trend of ti/tv across a broader taxonomic range, we analyzed a set of 76 different viruses. Again, the estimate of ti/tv scaled negatively with evolutionary time, a trend that was more pronounced for rapidly evolving RNA viruses than for slowly evolving DNA viruses.

Conclusions: Our study shows that commonly used substitution models can underestimate the number of substitutions among closely related sequences, such that the time-scale of viral evolution and emergence may be systematically underestimated. In turn, estimates of ti/tv provide an effective internal control of substitution model performance in viruses because of their high sensitivity to mutational saturation.
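As a rough illustration of why ti/tv is sensitive to saturation, the sketch below does two things: it counts transitions and transversions between two aligned sequences, and it evaluates the expected observed (uncorrected) ti/tv under the Kimura two-parameter (K80) model as the time separating the sequences grows. This is not the authors' estimation procedure, which fits substitution models in a phylogenetic framework. The function names, the rate values, and the choice of K80 are assumptions made for illustration only.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ti_tv(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences.

    Hypothetical helper: sites that are identical, gapped, or ambiguous
    are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ti = tv = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue
        # A<->G and C<->T stay within a chemical class: transitions.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ti += 1
        else:
            tv += 1
    return ti, tv


def k80_expected_ratio(alpha, beta, t):
    """Expected observed ti/tv for two sequences separated by total time t.

    Assumes the K80 model: alpha is the transition rate and beta the rate
    of each of the two transversion types (per site, per unit time).
    """
    p = 0.25 + 0.25 * math.exp(-4 * beta * t) - 0.5 * math.exp(-2 * (alpha + beta) * t)
    q = 0.5 - 0.5 * math.exp(-4 * beta * t)
    return p / q


if __name__ == "__main__":
    print(count_ti_tv("ACGT", "GCTT"))
    # Illustrative rates only: transitions ten times faster than each transversion type.
    for t in (0.01, 0.1, 1.0, 10.0, 100.0):
        print(t, round(k80_expected_ratio(alpha=1.0, beta=0.1, t=t), 3))
```

Under these assumptions the observed ratio starts near alpha / (2 * beta) for very recent divergences and falls toward 0.5 as multiple substitutions at the same site erase the transition signal. A model that corrects fully for multiple hits should recover a constant ratio, so a declining estimate points to under-correction.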


Figure 1: Estimates of key parameters plotted against root-node age for 10 representative viruses. (A) ti/tv, (B) dN/dS, (C) shape parameter α of the Γ-distribution, and (D) CG-content. The symbols represent the different viruses: African swine fever virus (ASFV), Barley yellow dwarf virus (BYDV), GPCR gene of Capripoxvirus (CaPV), CP gene of Cereal yellow dwarf virus (CYDV), Dengue virus type 4 (DENV-4), Ebolavirus (EBOV), Hepatitis B virus (HBV), HIV-1, Rabies virus (RABV), and HIV-2 and some closely related SIV lineages (HIV-2 + SIV). Black symbols correspond to the complete data sets, while red symbols correspond to the reduced-age data sets in which we removed the most divergent lineages. Lines show the differences in the estimates between the complete and the reduced-age data sets, and do not represent regressions. Dashed lines correspond to estimates that are not considered to differ between the complete and reduced-age data sets, as assessed by estimating the parameters with random subsamples of the data (see Additional file 2: Table S2).

Mentions: The abbreviations correspond to those in Figure 1.

