Inferring stabilizing mutations from protein phylogenies: application to influenza hemagglutinin.

Bloom JD, Glassman MJ - PLoS Comput. Biol. (2009)

Bottom Line: We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (DeltaDeltaG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin.


Affiliation: Division of Biology, California Institute of Technology, Pasadena, California, USA. jesse.bloom@gmail.com

ABSTRACT
One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (DeltaDeltaG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution.


Figure 2: Stability distributions and fixation probabilities. The left panel shows the probability that a protein in an evolving population will have a given extra stability, as given by Equation 3. The right panel shows the probability that a mutation causing a given stability change will be neutral, as given by Equation 4. The units of stability are arbitrary; for concreteness, here we give them units of kcal/mol.

With this mean-field approximation, the issue becomes determining the average distribution of stabilities in an evolving population of proteins. This problem has been treated previously by simulations [29] and mathematically through matrix [31] and diffusion-equation [32] approaches. The average distribution of stabilities turns out to depend on the degree of polymorphism in the population: highly polymorphic populations (those for which the product of the population size and the per-sequence, per-generation mutation rate is much greater than one) evolve to greater average stabilities than populations that are mostly monomorphic (those for which this product is much less than one) [31],[67],[68]. Here we will consider only the case where the population is mostly monomorphic, so that all proteins tend to have converged to the same stability before a new mutation occurs (as is the case for the proteins shown in Figure 1). This choice is dictated by the fact that it is unclear to us how to incorporate the secondary selection for mutational robustness that occurs in highly polymorphic populations [31],[67],[68]. We acknowledge that some of the proteins that we analyze later in this paper (particularly influenza hemagglutinin) may actually evolve in populations that are highly polymorphic, and suggest that a mathematical treatment recognizing this fact is an area for future research. Given our choice to consider only the mostly monomorphic case, we will adopt the mathematical formalism described in [31] for the limit in which this product is much less than one (the more compact diffusion-equation approach of Shakhnovich and coworkers [32] cannot be used, since it does not apply in this limit). Following [31], we discretize the continuous variable of extra protein stability into small bins of fixed width, and assign a protein to the bin whose range contains its extra stability. Some large integer gives an upper limit on the number of stability bins, chosen so that every protein in the evolving population falls into one of them.
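The discretization described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `stability_bin`, the parameter names `bin_width` and `n_bins`, the 1-based half-open bin convention, and the treatment of negative extra stability as "unfolded" are all assumptions made for the example.

```python
import math


def stability_bin(extra_stability, bin_width, n_bins):
    """Return the 1-based index of the stability bin for a protein.

    Assumed convention (hypothetical, for illustration only): bin i covers
    the half-open range [bin_width * (i - 1), bin_width * i).
    Returns None when the extra stability is negative, i.e. the protein
    falls short of the stability threshold and is treated as unfolded.
    """
    if extra_stability < 0:
        return None
    index = math.floor(extra_stability / bin_width) + 1
    # The text takes n_bins large enough that no protein exceeds the top bin;
    # clamping here is only a safeguard added for this sketch.
    return min(index, n_bins)


if __name__ == "__main__":
    for value in (-0.3, 0.0, 1.2, 7.9):
        print(value, stability_bin(value, bin_width=0.5, n_bins=10))
```

Smaller bin widths approximate the continuous stability variable more closely, at the cost of more bins to track.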
Note that all folded proteins fall into one of these bins, since proteins whose stability falls short of the threshold fail to fold under the stability threshold model. Reference [31] finds that the distribution of average protein stabilities is well approximated by an exponential (see the middle panels of Figure 2 of that reference, or alternatively Figure 2A of [29]), so that the probability that a protein in the evolving population has extra stability falling in a given bin decays exponentially with increasing stability (Equation 3), with a constant describing the steepness of the exponential. Figure 2 shows this distribution of protein stabilities graphically. Note that this exact mathematical form for the distribution is not proven in [31]; rather, all numerical solutions give distributions that resemble this form. Other mathematical forms could be chosen for the distribution without altering the mathematical analysis that follows, although they might affect the numerical values that are ultimately inferred for the DeltaDeltaG values. In particular, in highly polymorphic populations, the distribution of stabilities is peaked at a value slightly below the stability threshold (see the right panels of Figure 2 of [31], Figure 2 of [32], or Figure 2B of [29]) rather than being an exponential. However, any distribution in which highly stable proteins are rare and marginally stable proteins are common should lead to qualitatively similar inferred DeltaDeltaG values, since the subsequent analysis employs the cumulative distribution function of the stabilities only in a rather coarse manner. Given the definition of the distribution in Equation 3, the exact numerical value of the steepness constant simply sets a scale for the DeltaDeltaG values (in conjunction with the bin size, it determines their units). As is described later in this paper, in our actual computational implementation we chose a value for this constant that placed the magnitude of the inferred DeltaDeltaG values in the same dynamic range as the informative priors.
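To make the role of the exponential distribution and its cumulative form concrete, the following Python sketch builds a normalized exponential over discrete stability bins and uses it to compute the chance that a destabilizing mutation leaves a protein folded. It is a sketch under stated assumptions, not a transcription of Equations 3 and 4: the names `alpha`, `bin_width`, `n_bins`, and `bins_lost` are hypothetical, as is the specific normalization, and stabilizing mutations are simply treated as always neutral.

```python
import math


def stability_distribution(alpha, bin_width, n_bins):
    """Normalized exponential probabilities over bins 1..n_bins.

    Assumed form (illustrative): weight of bin i is exp(-alpha * bin_width * i),
    so marginally stable proteins are common and highly stable ones are rare.
    Element 0 of the returned list corresponds to bin 1.
    """
    weights = [math.exp(-alpha * bin_width * i) for i in range(1, n_bins + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def prob_neutral(p, bins_lost):
    """Probability that a protein drawn from p still folds after a mutation
    that moves it down by `bins_lost` bins (an assumed neutrality rule).

    A protein in bin i stays folded if i - bins_lost >= 1, so the result is
    the upper tail of the distribution starting at bin bins_lost + 1.
    """
    if bins_lost <= 0:
        return 1.0  # stabilizing or stability-neutral: assumed always tolerated
    return sum(p[bins_lost:])


if __name__ == "__main__":
    p = stability_distribution(alpha=1.0, bin_width=0.5, n_bins=50)
    for k in (0, 1, 2, 4, 8):
        print(k, prob_neutral(p, k))
```

Because only the tail sums of the distribution enter `prob_neutral`, any distribution with a similar tail shape gives similar results, which mirrors the point made in the text about the coarse use of the cumulative distribution function.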

