Inferring stabilizing mutations from protein phylogenies: application to influenza hemagglutinin.

Bloom JD, Glassman MJ - PLoS Comput. Biol. (2009)

Bottom Line: We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (DeltaDeltaG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin.


Affiliation: Division of Biology, California Institute of Technology, Pasadena, California, USA. jesse.bloom@gmail.com

ABSTRACT
One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (DeltaDeltaG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution.


Figure 4 (pcbi-1000349-g004): Prior distributions over the ΔΔG values. The “regularizing priors” are peaked at a moderately destabilizing value to capture the general knowledge that most mutations are destabilizing. The “hydrophobic priors” capture the knowledge that mutations that cause large changes in hydrophobicity are often more destabilizing. These priors are peaked at a value equal to the absolute value of the difference in amino acid hydrophobicity (as defined by the widely used Kyte-Doolittle scale [81]). For example, the prior for a mutation from hydrophobic valine (V) to similarly hydrophobic leucine (L) is peaked near zero, while that for a mutation from valine to charged lysine (K) is peaked at a much more destabilizing value. The “informative priors” are peaked at the ΔΔG values predicted by the state-of-the-art physicochemically based program CUPSAT [8], and so are designed to leverage extensive pre-existing knowledge about ΔΔG values. All the priors are fairly loose to make the ΔΔG values responsive to their effect on the likelihood. The priors also help regularize [80] the ΔΔG predictions by biasing them towards a reasonable range.

Mentions: One of the strengths of our approach is that it allows for the use of informative priors over the ΔΔG values. These priors can serve two purposes. One purpose is simply to prevent overfitting by regularizing [80] the ΔΔG values, biasing them towards a central, reasonable range. A second purpose is to actively incorporate some of the substantial existing knowledge about how protein structure and amino-acid character influence ΔΔG values. One piece of this knowledge is simply the general fact that most mutations to proteins are destabilizing. It is also known that mutations that cause large changes in the hydrophobicity of amino acids are often more destabilizing. At a more detailed level, there are a number of physicochemical modeling programs that attempt to make quantitative predictions of ΔΔG values from protein structural information [1]–[8]. We tested phylogenetic inference with priors incorporating information at all three of these levels, as shown in Figure 4. At the most basic level, we used “regularizing priors” that simply biased all the ΔΔG values towards the generally observed range of mildly to moderately destabilizing. A second set of “hydrophobic priors” was based on the idea that mutations that cause large changes in amino acid hydrophobicity will tend to be more destabilizing. For these priors, the prior estimate for each ΔΔG value was equal to the absolute value of the difference in the hydrophobicities of the wildtype and mutant amino acids, as given by the widely used Kyte-Doolittle hydrophobicity scale [81]. These hydrophobic priors therefore predicted that mutations that caused large changes in hydrophobicity would be highly destabilizing, while those that led to small changes in hydrophobicity would have little effect on stability. A third set of “informative priors” was designed to leverage the full available knowledge about the effects of mutations on stability.
This knowledge is most completely encapsulated in various physicochemically based prediction programs [1]–[8], which use a wide range of structural and biophysical information to make quantitative predictions for individual mutations. We chose one of these programs, CUPSAT [8], to predict ΔΔG values for all single amino-acid mutations from the protein crystal structures. We chose CUPSAT because it has a publicly available webserver (http://cupsat.tu-bs.de) and has reported benchmarks that equal or exceed those of other prediction programs [8]. The prior estimate for each mutation was then the ΔΔG value predicted by CUPSAT, after rescaling the predictions as described below. For all three sets of priors, the prior for mutating a residue from one amino acid to another was a beta distribution probability density function peaked at the prior estimate for that mutation. The beta distribution functions were defined so that the sum of the alpha and beta parameters equaled three, with the functions going to zero at the upper and lower limits of the allowed range for the ΔΔG values. These prior functions are therefore broad, and loosely bias the ΔΔG values toward the prior estimates. Examples of the priors are shown in Figure 4. The overall prior probability for the set of ΔΔG values was defined to be the product of the prior probabilities for the individual ΔΔG values.
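The prior construction described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the beta distribution is linearly stretched onto the allowed range, so that with alpha + beta = 3 and a rescaled peak m in (0, 1) the parameters are alpha = 1 + m and beta = 2 - m (the mode of a beta distribution is (alpha - 1)/(alpha + beta - 2)). The range limits LOWER and UPPER, the function names, and the choice to use the hydrophobicity difference directly as the peak position are all hypothetical, since this excerpt gives neither the allowed range nor the sign convention.

```python
import math

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical limits of the allowed range; the excerpt does not state them.
LOWER, UPPER = -5.0, 10.0


def hydrophobic_prior_estimate(wildtype, mutant):
    """Absolute Kyte-Doolittle hydrophobicity difference for a mutation."""
    return abs(KYTE_DOOLITTLE[wildtype] - KYTE_DOOLITTLE[mutant])


def beta_prior_density(x, peak, lower=LOWER, upper=UPPER):
    """Density at x of a beta distribution with alpha + beta = 3,
    stretched onto [lower, upper] and with its mode at `peak`."""
    if not lower < peak < upper:
        raise ValueError("peak must lie strictly inside the allowed range")
    width = upper - lower
    t = (x - lower) / width  # x rescaled to the unit interval
    if not 0.0 < t < 1.0:
        return 0.0  # the density vanishes at and beyond both limits
    m = (peak - lower) / width  # mode rescaled to (0, 1)
    alpha, beta = 1.0 + m, 2.0 - m  # sum is 3, mode is alpha - 1 = m
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    log_pdf = log_norm + (alpha - 1.0) * math.log(t) + (beta - 1.0) * math.log(1.0 - t)
    return math.exp(log_pdf) / width  # divide by width for the change of variable


def log_joint_prior(values, peaks, lower=LOWER, upper=UPPER):
    """Log of the product of the individual prior densities."""
    total = 0.0
    for x, peak in zip(values, peaks):
        density = beta_prior_density(x, peak, lower, upper)
        if density == 0.0:
            return float("-inf")
        total += math.log(density)
    return total


if __name__ == "__main__":
    # Valine to leucine (similar hydrophobicity) versus valine to lysine.
    for wt, mut in [("V", "L"), ("V", "K")]:
        peak = hydrophobic_prior_estimate(wt, mut)
        print(wt, mut, round(peak, 1), beta_prior_density(peak, peak))
```

Because both parameters stay between 1 and 2, the density is unimodal, goes to zero at both limits, and remains broad, which matches the description of priors that only loosely bias the values toward the prior estimates. Summing log densities in log_joint_prior corresponds to taking the product of the individual priors.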

