Inferring stabilizing mutations from protein phylogenies: application to influenza hemagglutinin.
Bottom Line:
We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (DeltaDeltaG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin.
View Article:
PubMed Central - PubMed
Affiliation: Division of Biology, California Institute of Technology, Pasadena, California, USA. jesse.bloom@gmail.com
ABSTRACT
One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (DeltaDeltaG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution.
Mentions: With this mean-field approximation, the issue becomes determining the average distribution of stabilities in an evolving population of proteins. This problem has been treated previously by simulations [29] and mathematically through matrix [31] and diffusion [32] equation approaches. The average distribution of stabilities turns out to depend on the degree of polymorphism in the population: highly polymorphic populations (those in which the product of the population size and the per sequence per generation mutation rate is much greater than one) evolve to greater average stabilities than populations that are mostly monomorphic (those in which this product is much less than one) [31],[67],[68]. Here we will consider only the case where the population is mostly monomorphic, so that all proteins tend to have converged to the same stability before a new mutation occurs (as is the case for the proteins shown in Figure 1). This choice is dictated by the fact that we are unclear how to incorporate the secondary selection for mutational robustness that occurs in highly polymorphic populations [31],[67],[68]. We acknowledge that some of the proteins that we analyze later in this paper (particularly influenza hemagglutinin) may actually evolve in populations that are highly polymorphic, and suggest that a mathematical treatment recognizing this fact is an area for future research. Given our choice to consider only the case where the population is mostly monomorphic, we will adopt the mathematical formalism described in [31] for the mostly monomorphic limit (the more compact diffusion-equation approach of Shakhnovich and coworkers [32] cannot be used since it does not apply in this limit). Following [31], we discretize the continuous variable of extra protein stability into small bins of fixed width, and assign each protein to the bin whose range contains its extra stability. Some large integer gives an upper limit on the number of stability bins, chosen so that the bins cover the extra stabilities of all proteins in the evolving population.
Note that all folded proteins fall into one of these bins, since proteins that do not meet the stability threshold fail to fold under the stability threshold model. Reference [31] finds that the distribution of average protein stabilities is well approximated by an exponential (see the middle panels of Figure 2 of this reference, or alternatively Figure 2A of [29]), such that the probability that a protein in the evolving population has extra stability falling in a given bin decays exponentially with the bin index (Equation 3), with a constant describing the steepness of the exponential. Figure 2 shows this distribution of protein stabilities graphically. Note that this exact mathematical form for the distribution is not proven in [31]; rather, all numerical solutions give distributions that resemble this form. Other mathematical forms could be chosen for the distribution without altering the mathematical analysis that follows, although they might affect the actual numerical values that are ultimately inferred for the stability effects (DeltaDeltaG values). In particular, in highly polymorphic populations, the distribution of stabilities is peaked at a value slightly below the stability threshold (see right panels of Figure 2 of [31], Figure 2 of [32], or Figure 2B of [29]) rather than being an exponential. However, any distribution in which highly stable proteins are rare and marginally stable proteins are common should lead to qualitatively similar inferred values, since the subsequent analysis only employs the cumulative distribution function of the stabilities in a rather coarse manner. Given the definition in Equation 3, the exact numerical value of the steepness constant simply sets a scale for the inferred values (in conjunction with the bin size, it determines their units). As is described later in this paper, in our actual computational implementation, we chose a value for the steepness constant that placed the magnitude of the inferred values in the same dynamic range as the informative priors.
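The binning and the exponential stability distribution described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: extra stability is treated as a non-negative margin beyond the folding threshold (the paper's own sign convention may differ), bin i covers the half-open range from (i - 1) times the bin width up to i times the bin width, and the bin probabilities are taken proportional to exp(-alpha * i) and normalized over the finite set of bins. The names `stability_bin`, `bin_probabilities`, `cumulative_probabilities`, `alpha`, `bin_width`, and `n_bins` are hypothetical.

```python
import math
from itertools import accumulate


def stability_bin(extra_stability, bin_width, n_bins):
    """Return the 1-based stability bin of a protein, or None if it does not fold.

    Assumption (not from the source): extra_stability is a non-negative
    margin beyond the folding threshold, and bin i covers the half-open
    range [(i - 1) * bin_width, i * bin_width).
    """
    if extra_stability < 0:
        # Below the threshold: the protein is unfolded under the
        # stability threshold model and belongs to no bin.
        return None
    index = int(extra_stability // bin_width) + 1
    if index > n_bins:
        # The upper limit on the number of bins is meant to be large
        # enough that this never happens for the evolving population.
        raise ValueError("n_bins is too small to cover this extra stability")
    return index


def bin_probabilities(alpha, n_bins):
    """Exponentially decaying bin probabilities, normalized over bins 1..n_bins.

    Assumed form: p_i proportional to exp(-alpha * i), so marginally
    stable proteins (small i) are common and highly stable ones are rare.
    """
    weights = [math.exp(-alpha * i) for i in range(1, n_bins + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def cumulative_probabilities(probabilities):
    """Running sum of the bin probabilities (the discrete cumulative distribution)."""
    return list(accumulate(probabilities))
```

Under these assumptions, the ratio of the probabilities of adjacent bins is exp(alpha) regardless of the number of bins, which is one way to see that the steepness constant, together with the bin width, only sets the scale on which stabilities are measured.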