Genetic distance for a general non-stationary Markov substitution process.

Kaehler BD, Yap VB, Zhang R, Huttley GA - Syst. Biol. (2014)

Bottom Line: Our measure of genetic distance reduces to the standard formulation if the data in question are consistent with the stationarity assumption. The magnitude of the distance bias is proportional to departure from stationarity, which we demonstrate to be associated with longer edge lengths. The marked improvement in consistency between the general nonstationary Markov model and sequence alignments leads us to conclude that analyses of evolutionary rates and phylogenies will be substantively improved by application of this model.

Affiliation: John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2600, Australia.

Figure 4: The genetic distance error increases with JSD. Genetic distance is the expected number of substitutions, estimated under the General model and under two variants of the GTR model. Scatter plots show the empirical relationship between JSD and the genetic distance error of each GTR variant relative to the General model. In every case, both GTR variants tend overwhelmingly toward overestimation. Solid lines show quantile regressions for the 25%, 50%, and 75% quantiles. All General model fits shown satisfy the goodness-of-fit (G statistic) p-value criterion. a) 3906 alignments of human, mouse, and opossum protein-coding genes. b) 4557 alignments of triads of 16S ribosomal RNA.

Mentions: We expected that the discrepancy between the genetic distance estimated under either GTR variant and the distance estimated under the General model would increase with increasing departure from compositional homogeneity. We measured this departure using JSD, a distance measure between the nucleotide frequency distributions. For alignments defined as being consistent with the General model (i.e., those satisfying the G statistic goodness-of-fit criterion), we computed the genetic distance error as each GTR variant's estimate minus the General model's estimate. For each alignment we selected the pair of species with maximum JSD and calculated the genetic distance error between those species. The results are plotted in Figure 4 as a scatter plot with quantile regression lines at the quartiles. In all cases, the genetic distance error is overwhelmingly positive and appears to increase linearly with JSD. The genetic distance error differs between the two GTR variants primarily in that the second exhibits larger positive skew, with the conditional interquartile range being at least ∼2.1 times larger for the second variant than for the first in all cases. Additionally, the median regression is steeper for the second variant than for the first in both cases. We summarize the slopes and intercepts of the median regressions across data sets and models in Table 2. The variation of slopes between data sets is not surprising. Only the third codon position was sampled for the exonic data, in an effort to sample closer to a neutral evolutionary process (Table 2). All of the positions in the microbial data set were used, so some are likely to be affected by natural selection. The difference between the slopes may reflect these underlying differences in the generating processes.
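The pair-selection step described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes JSD is the two-distribution Jensen-Shannon divergence with base-2 logarithms and equal weights, computed on the A, C, G, T frequencies of each aligned sequence. The function names (`nucleotide_freqs`, `jsd`, `max_jsd_pair`, `distance_error`) and the toy sequences are hypothetical, and the distance estimates themselves are taken as given inputs because fitting the substitution models is outside the scope of the sketch.

```python
from itertools import combinations
from math import log2


def nucleotide_freqs(seq, alphabet="ACGT"):
    """Relative frequencies of A, C, G, T; gaps and ambiguity codes are ignored."""
    counts = [seq.upper().count(base) for base in alphabet]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no canonical nucleotides")
    return [c / total for c in counts]


def entropy(p):
    """Shannon entropy in bits (base 2 is an assumption of this sketch)."""
    return -sum(x * log2(x) for x in p if x > 0)


def jsd(p, q):
    """Jensen-Shannon divergence between two frequency vectors, equal weights."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


def max_jsd_pair(seqs):
    """Return (name_a, name_b, jsd) for the species pair with the largest JSD."""
    freqs = {name: nucleotide_freqs(s) for name, s in seqs.items()}
    return max(
        ((a, b, jsd(freqs[a], freqs[b])) for a, b in combinations(sorted(freqs), 2)),
        key=lambda item: item[2],
    )


def distance_error(d_gtr, d_general):
    """Genetic distance error: GTR-variant estimate minus General-model estimate."""
    return d_gtr - d_general


# Toy, made-up sequences used only to exercise the functions.
toy_alignment = {
    "human": "AAAACCCC",
    "mouse": "AAAACCCG",
    "opossum": "GGGGTTTT",
}
pair = max_jsd_pair(toy_alignment)
```

For a triad, `max_jsd_pair` compares all three species pairs and returns the most compositionally divergent one; `distance_error` would then be applied to the distance estimates for that pair under each GTR variant and the General model.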

