Genetic distance for a general non-stationary Markov substitution process.
Bottom Line: Our measure of genetic distance reduces to the standard formulation if the data in question are consistent with the stationarity assumption. The magnitude of the distance bias is proportional to departure from stationarity, which we demonstrate to be associated with longer edge lengths. The marked improvement in consistency between the general nonstationary Markov model and sequence alignments leads us to conclude that analyses of evolutionary rates and phylogenies will be substantively improved by application of this model.
Affiliation: John Curtin School of Medical Research, Australian National University, Canberra, ACT, 2600, Australia; and.
Mentions: We expected that discrepancy between or and would increase with increasing departure from compositional homogeneity. We measured this departure using JSD, a distance measure between the nucleotide frequency distributions. For alignments defined as being consistent with the General model (i.e., G statistic ), we computed the genetic distance error as and . For each alignment we selected the pair of species with maximum JSD, and calculated the genetic distance error between those species. The results are plotted in Figure 4 as a scatter plot with quartile regression lines. In all cases, the genetic distance error is overwhelmingly positive and appears to increase linearly with JSD. The genetic distance error differs between GTR and GTR primarily in that the latter exhibits larger positive skew, with the conditional interquartile range being at least ∼2.1 times larger for GTR than GTR in all cases. Additionally, the median regression is steeper for GTR than for GTR in both cases. We summarize the slopes and intercepts of the median regressions across data sets and models in Table 2. The variation of slopes between data sets is not surprising. Only the third codon position was sampled for the exonic data, in an effort to sample closer to a neutral evolutionary process (Table 2). All of the positions in the microbial data set were used, so some are likely to be affected by natural selection. The difference between the slopes may reflect these underlying differences in the generating processes.
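As an illustration of the compositional measure described above, the following is a minimal sketch that assumes JSD denotes the Jensen-Shannon divergence (base-2 logarithm, equal weights) between the nucleotide frequency distributions of two sequences. The function names and the dictionary-based alignment format are hypothetical and chosen only for this example; they are not taken from the authors' software, and the handling of gaps and ambiguous characters (simply ignored here) is likewise an assumption.

```python
from collections import Counter
from itertools import combinations
from math import log2

NUCLEOTIDES = "ACGT"


def nucleotide_freqs(seq):
    """Return the A, C, G, T frequencies of a sequence (other characters ignored)."""
    counts = Counter(base for base in seq.upper() if base in NUCLEOTIDES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return [counts[base] / total for base in NUCLEOTIDES]


def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * log2(x) for x in p if x > 0)


def jsd(p, q):
    """Jensen-Shannon divergence: entropy of the mixture minus the mean entropy."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(mixture) - (entropy(p) + entropy(q)) / 2


def max_jsd_pair(alignment):
    """Return the pair of sequence names whose compositions differ most.

    `alignment` is assumed to be a dict mapping a name to its sequence string.
    """
    freqs = {name: nucleotide_freqs(seq) for name, seq in alignment.items()}
    return max(
        combinations(sorted(freqs), 2),
        key=lambda pair: jsd(freqs[pair[0]], freqs[pair[1]]),
    )
```

With base-2 logarithms the value lies between 0 (identical compositions) and 1 (compositions with no nucleotide in common), so it can be used directly as the horizontal axis of a scatter plot against a distance error.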