The evolutionary rates of HCV estimated with subtype 1a and 1b sequences over the ORF length and in different genomic regions.

Yuan M, Lu T, Li C, Lu L - PLoS ONE (2013)

Bottom Line: Significantly lower rates were estimated for subtype 1b, and some of the rate distribution curves were truncated on one side, particularly under the exponential model. Estimating the HCV epidemic history therefore requires rate priors chosen to match the actual dataset, fitting its subtype, its genomic region and even its sequence length. By referencing these findings, future evolutionary analyses of HCV subtype 1a and 1b datasets may become more accurate and hence more useful for tracing their patterns.


Affiliation: Department of Pathology and Laboratory Medicine, Center for Viral Oncology, University of Kansas Medical Center, Kansas City, Kansas, United States of America.

ABSTRACT

Background: Considerable progress has been made in HCV evolutionary analysis since the release of the software BEAST. However, prior information, especially the prior evolutionary rate, which plays a critical role in BEAST analysis, is difficult to ascertain because of various uncertainties. Providing a proper prior for the HCV evolutionary rate is therefore of great importance.

Methods/results: A total of 176 full-length sequences of HCV subtype 1a and 144 of subtype 1b were assembled, taking into account a balanced distribution of sampling dates and an even dispersion in the phylogenetic trees. Based on HCV genomic organization and biological function, each dataset was partitioned into nine genomic regions and two routinely amplified regions. A uniform rate prior was applied in the BEAST analysis of each region and of the entire ORF. All posterior rates obtained for 1a were on the order of 10⁻³ substitutions/site/year and bell-shaped in distribution. Significantly lower rates were estimated for 1b, and some of the rate distribution curves were truncated on one side, particularly under the exponential model. This indicates that some of the 1b rate estimates are less accurate; they were therefore adjusted by including more sequences to improve the temporal structure.
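The contrast between "bell-shaped" and "one-sided truncated" posteriors can be made concrete with a crude check. The sketch below is a hypothetical heuristic, not the authors' procedure: it bins posterior rate samples over an assumed uniform prior range and reports whether the most populated bin sits against a prior bound. The function name, the U(0, 0.01) range and the synthetic samples are all illustrative assumptions.

```python
import random


def classify_rate_posterior(samples, lower, upper, n_bins=20):
    """Crude shape check for posterior rate samples drawn under a uniform prior.

    Hypothetical heuristic (not the authors' procedure): bin the samples over
    the prior range [lower, upper] and report whether the most populated bin
    sits against a prior bound (one-sided truncation) or inside the range
    (an interior, bell-shaped peak).
    """
    width = (upper - lower) / n_bins
    counts = [0] * n_bins
    for x in samples:
        if not lower <= x <= upper:
            continue  # a uniform prior gives zero density outside its bounds
        # min() keeps x == upper inside the last bin
        counts[min(int((x - lower) / width), n_bins - 1)] += 1
    mode_bin = max(range(n_bins), key=lambda i: counts[i])
    if mode_bin == 0:
        return "truncated at lower bound"
    if mode_bin == n_bins - 1:
        return "truncated at upper bound"
    return "interior peak"


# Synthetic samples on a hypothetical U(0, 0.01) prior (units: subs/site/year).
rng = random.Random(1)
bell_shaped = [rng.gauss(1.5e-3, 3e-4) for _ in range(5000)]
piled_at_zero = [abs(rng.gauss(0.0, 2e-4)) for _ in range(5000)]
print(classify_rate_posterior(bell_shaped, 0.0, 0.01))    # interior peak
print(classify_rate_posterior(piled_at_zero, 0.0, 0.01))  # truncated at lower bound
```

A histogram mode is a blunt instrument; in practice one would inspect the full marginal density (as the violin plots in this study do) rather than rely on a single bin.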

Conclusion: Evolutionary patterns differ among HCV subtypes and genomic regions. Estimating the HCV epidemic history therefore requires rate priors chosen to match the actual dataset, fitting its subtype, its genomic region and even its sequence length. By referencing these findings, future evolutionary analyses of HCV subtype 1a and 1b datasets may become more accurate and hence more useful for tracing their patterns.

Figure 3 (pone-0064698-g003). The median evolutionary rates and tMRCAs estimated in the nine genomic regions and over the entire ORF of the subtype 1a and 1b datasets. Panels A, B, and C show the median evolutionary rates; panels D, E, and F show the median tMRCAs. Blue columns represent the estimates for 1a and red columns the estimates for 1b. Dashed lines indicate the estimates for the entire ORF.

To demonstrate the influence of the uniform rate prior on the posterior rates estimated under the three models for the different datasets, we plotted the marginal posterior rate densities as violin plots (Figure 2). A violin plot combines a box plot with a rotated kernel density curve to display the probability density of a given parameter [25], [27]. In this study, all of the 1a marginal posterior rate density curves are bell-shaped, except for the ORF under the strict model. As Table S1 also shows, the three models estimated very similar median rates for a given dataset. However, their 95% confidence intervals differ considerably, being widest under the exponential model and narrowest under the strict model. The Core region exhibited the lowest median rates (9.04×10⁻⁴, 8.43×10⁻⁴ and 7.78×10⁻⁴ under the exponential, lognormal and strict models, respectively), while P7 exhibited the highest (2.15×10⁻³, 2.04×10⁻³ and 1.94×10⁻³). The P7 region has therefore evolved more than twice as fast as Core (a ratio of about 2.4 to 2.5 under each of the three models). The ranking of median rates, P7>E2>E1>NS2>NS4>NS5A>NS3>NS5B>Core, was consistent across the three models. In addition, the three models gave consistent median rates for the ORF (1.56×10⁻³ exponential, 1.53×10⁻³ lognormal and 1.55×10⁻³ strict), which are closest to those of the E1 region under the same model (1.45×10⁻³, 1.47×10⁻³ and 1.43×10⁻³, respectively). The rate heterogeneity among sites (α), however, varies among datasets. A high α value suggests weak mutational “hot spots” [28]: under a gamma model of among-site rate variation with mean rate 1, the variance of site rates is 1/α, so a larger α means more homogeneous rates across sites. Compared with the values obtained by root-to-tip regression, the MCMC procedure estimated higher and more accurate rates. In contrast to the median rates, the median tMRCAs were largely similar across genomic regions, particularly under the strict model (Figure 3).
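The passage compares MCMC rates with root-to-tip regression without describing the latter. A minimal sketch of the usual idea, assuming ordinary least squares of root-to-tip distance on sampling date (tools such as TempEst, formerly Path-O-Gen, perform this on a rooted tree): the slope approximates the rate and the x-intercept approximates the tMRCA. The function name and the noise-free synthetic tips are hypothetical.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    dates: sampling years of the tips; distances: root-to-tip distances in
    substitutions/site, e.g. read from a rooted tree with branch lengths in
    those units. The slope is a rough rate (substitutions/site/year) and the
    x-intercept, where the fitted distance reaches zero, is a rough tMRCA.
    """
    n = len(dates)
    if n < 2 or len(distances) != n:
        raise ValueError("need at least two paired (date, distance) values")
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    if sxx == 0:
        raise ValueError("sampling dates must not all be identical")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    tmrca = -intercept / rate  # date at which the fitted distance is zero
    return rate, tmrca


# Hypothetical, noise-free tips: a clock of 1.5e-3 subs/site/year since 1920.
years = [1990.0, 1995.0, 2000.0, 2005.0, 2010.0]
dists = [1.5e-3 * (y - 1920.0) for y in years]
rate, tmrca = root_to_tip_regression(years, dists)
print(f"rate = {rate:.2e}, tMRCA = {tmrca:.1f}")  # rate = 1.50e-03, tMRCA = 1920.0
```

Unlike the Bayesian MCMC approach, this regression treats tips as independent points even though they share ancestry, which is one reason it serves mainly as a quick check of temporal signal rather than as a final rate estimate.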
In theory, evolutionary rates may vary among regions, but the tMRCAs estimated from them should be identical: although each estimate is based on a different genomic region, all refer to the same common ancestor. The degree to which tMRCAs differ among regions can therefore be used to evaluate the robustness of the MCMC process. Bayes factor comparison showed that the exponential model outperformed the other two models in all nine genomic regions, whereas the lognormal model was best for the ORF dataset (Table S1).
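Both checks in this paragraph reduce to simple arithmetic once the BEAST runs are done. A minimal sketch, assuming natural-log marginal likelihoods are available for each model (from whichever estimator was used, e.g. harmonic mean or path sampling; this section does not say which) and per-region median tMRCAs in calendar years. The function names and all input numbers are hypothetical, for illustration only.

```python
import math
import statistics


def log10_bayes_factors(log_marginal_likelihoods):
    """Rank models by marginal likelihood and return log10 Bayes factors.

    log_marginal_likelihoods: model name -> natural-log marginal likelihood.
    The log Bayes factor of model A over model B is ln P(D|A) - ln P(D|B);
    dividing by ln(10) converts it to log10 units.
    """
    best = max(log_marginal_likelihoods, key=log_marginal_likelihoods.get)
    top = log_marginal_likelihoods[best]
    factors = {
        model: (top - lml) / math.log(10)
        for model, lml in log_marginal_likelihoods.items()
        if model != best
    }
    return best, factors


def tmrca_spread(median_tmrca_by_region):
    """Range and median of per-region median tMRCAs (in calendar years).

    A small range suggests the regions agree on the age of the shared
    ancestor, which is the robustness check described in the text.
    """
    values = list(median_tmrca_by_region.values())
    return max(values) - min(values), statistics.median(values)


# Hypothetical inputs; real values would come from the BEAST runs.
best, bf = log10_bayes_factors({"strict": -1000.0, "lognormal": -990.0, "exponential": -985.0})
print(best, {m: round(v, 2) for m, v in bf.items()})  # exponential {'strict': 6.51, 'lognormal': 2.17}
spread, centre = tmrca_spread({"Core": 1950.0, "E1": 1948.0, "NS5B": 1953.0})
print(spread, centre)  # 5.0 1950.0
```

Reporting Bayes factors relative to the best model keeps every value non-negative, so larger numbers simply mean stronger support for the winning model.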

