A framework for inferring fitness landscapes of patient-derived viruses using quasispecies theory.
Bottom Line: The sampler can overcome situations where no maximum-likelihood estimator exists, and it can adaptively learn the posterior distribution of highly correlated fitness landscapes without prior knowledge of their shape. We tested our approach on simulated data and applied it to clinical human immunodeficiency virus 1 samples to estimate their fitness landscapes in vivo. The posterior fitness distributions allowed us to differentiate viral haplotypes from one another, to determine neutral haplotype networks (in which no haplotype is credibly more or less fit than any other), and to detect epistasis in fitness landscapes.
Affiliation: Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland; Swiss Institute of Bioinformatics, Basel 4058, Switzerland.
Mentions: We generated the fitness landscape with parameters η = 10⁻² and v = 5 × 10⁻⁷. This value of v is one-fourth of the corresponding v in the previous LK simulations. We chose it so that the average selective advantages in the fitness landscape are not much larger than the mutation rate at its lower bound of 10⁻⁶; otherwise the coupling between haplotypes is too weak and no diversity remains at equilibrium. We iterated over 100 log-uniformly spaced values of μ in the interval [10⁻⁶, 10⁻³]. For each value of μ, we calculated the equilibrium distribution given our fitness landscape and the mutation matrix (Equation 4). For each equilibrium distribution, we simulated a read coverage of N = 100,000 by drawing from a multinomial distribution with p = g(f), and then applied our sampler with μ fixed at 3 × 10⁻⁵. We ran a total of N_trials = 43.2 × 10⁶ trials across 144 chains with a thinning interval of 100, giving 432,000 samples per run for every μ. We calculated the mean marginal fitnesses over the last 100,000 samples and determined Kendall's rank correlation coefficient τ_Kendall with respect to the initially fixed true fitness landscape, for both the QuasiFit-based estimator and the naive count-based estimator. Figure 7 shows τ_Kendall against the actual mutation rate for the back-inference and for the naive estimator.
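The simulation protocol described in the Mentions paragraph can be sketched in Python for the naive count-based estimator only. This is a minimal sketch under stated assumptions, not QuasiFit's implementation. Equation 4 is not reproduced here, so the equilibrium is assumed to be the Perron eigenvector of W = Q·diag(f) from Eigen's quasispecies model, with uniform per-site mutation on binary haplotypes. The fitness landscape is a hypothetical random one rather than the η, v landscape used in the study, and the naive estimator is assumed to rank haplotypes by read count. The function names (`mutation_matrix`, `equilibrium`, `kendall_tau_b`, `naive_tau_curve`) are hypothetical, and the MCMC sampler itself is not shown.

```python
import numpy as np


def mutation_matrix(L: int, mu: float) -> np.ndarray:
    """ASSUMPTION: uniform per-site mutation on binary haplotypes of length L.

    Q[i, j] is the probability that parent j yields offspring i:
    mu**d * (1 - mu)**(L - d), with d the Hamming distance. Columns sum to 1.
    """
    n = 2 ** L
    d = np.array([[bin(i ^ j).count("1") for j in range(n)] for i in range(n)])
    return mu ** d * (1.0 - mu) ** (L - d)


def equilibrium(fitness: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """ASSUMPTION: Eigen's model; equilibrium = Perron eigenvector of Q @ diag(f)."""
    W = Q * fitness[np.newaxis, :]          # scales column j by f_j
    eigvals, eigvecs = np.linalg.eig(W)
    v = np.abs(np.real(eigvecs[:, np.argmax(np.real(eigvals))]))
    return v / v.sum()                      # normalize to a probability vector


def kendall_tau_b(x, y) -> float:
    """Kendall's tau-b rank correlation (tie-corrected), O(n^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    concordant = discordant = tie_x = tie_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            sx = np.sign(x[i] - x[j])
            sy = np.sign(y[i] - y[j])
            if sx == 0 and sy == 0:
                continue                    # tied in both: ignored by tau-b
            if sx == 0:
                tie_x += 1
            elif sy == 0:
                tie_y += 1
            elif sx == sy:
                concordant += 1
            else:
                discordant += 1
    denom = np.sqrt((concordant + discordant + tie_x) * (concordant + discordant + tie_y))
    return float((concordant - discordant) / denom) if denom > 0 else float("nan")


def naive_tau_curve(fitness, L, mus, n_reads, rng) -> np.ndarray:
    """For each mutation rate: compute the equilibrium, draw n_reads from a
    multinomial, and score the naive (count-ranking) estimator against the
    true fitness with Kendall's tau-b."""
    taus = []
    for mu in mus:
        p = equilibrium(fitness, mutation_matrix(L, mu))
        counts = rng.multinomial(n_reads, p)
        taus.append(kendall_tau_b(fitness, counts))
    return np.array(taus)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L = 4                                    # hypothetical: 16 binary haplotypes
    fitness = rng.uniform(0.9, 1.1, 2 ** L)  # hypothetical landscape, not the study's
    mus = np.logspace(-6, -3, 100)           # 100 log-uniform values in [1e-6, 1e-3]
    taus = naive_tau_curve(fitness, L, mus, n_reads=100_000, rng=rng)
    print(mus[[0, -1]], taus[[0, -1]])
```

Kendall's τ is taken in its tau-b form because read counts are often tied (for example, many haplotypes with zero reads). As a check on the sampling arithmetic in the text, 43.2 × 10⁶ trials thinned by a factor of 100 give 43.2 × 10⁶ / 100 = 432,000 retained samples.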