Inferring stabilizing mutations from protein phylogenies: application to influenza hemagglutinin.

Bloom JD, Glassman MJ - PLoS Comput. Biol. (2009)

Bottom Line: We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin.

Affiliation: Division of Biology, California Institute of Technology, Pasadena, California, USA. jesse.bloom@gmail.com

ABSTRACT
One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution.

Figure 4. Prior distributions over the ΔΔG values. The “regularizing priors” are peaked at a moderately destabilizing ΔΔG value to capture the general knowledge that most mutations are destabilizing. The “hydrophobic priors” capture the knowledge that mutations that cause large changes in hydrophobicity are often more destabilizing. These priors are peaked at a value equal to the absolute value of the difference in amino acid hydrophobicity (as defined by the widely used Kyte-Doolittle scale [81]). For example, the prior for a mutation from hydrophobic valine (V) to similarly hydrophobic leucine (L) is peaked near zero, while that for a mutation from valine to charged lysine (K) is peaked at a much more destabilizing value. The “informative priors” are peaked at the ΔΔG values predicted by the state-of-the-art physicochemically based program CUPSAT [8], and so are designed to leverage extensive pre-existing knowledge about ΔΔG values. All the priors are fairly loose so that the ΔΔG values remain responsive to their effect on the likelihood. The priors also help regularize [80] the ΔΔG predictions by biasing them towards a reasonable range.
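As a minimal illustration of how the hydrophobic prior estimate is obtained, the Python sketch below computes |h(wildtype) − h(mutant)| on the standard Kyte-Doolittle hydropathy scale. The dictionary and function names are hypothetical, and the sketch uses the raw scale values. Whether the original analysis rescaled these differences or clipped them to the allowed ΔΔG range is not stated here.

```python
# Kyte-Doolittle hydropathy values, keyed by one-letter amino acid code.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_prior_estimate(wildtype: str, mutant: str) -> float:
    """Peak location of a (hypothetical) hydrophobic prior for wildtype -> mutant.

    Returns |h(wildtype) - h(mutant)| on the raw Kyte-Doolittle scale; larger
    values correspond to a more destabilizing prior estimate of ddG.
    No rescaling or clipping is applied (an assumption of this sketch).
    """
    return abs(KYTE_DOOLITTLE[wildtype] - KYTE_DOOLITTLE[mutant])


if __name__ == "__main__":
    print(round(hydrophobic_prior_estimate("V", "L"), 2))  # 0.4
    print(round(hydrophobic_prior_estimate("V", "K"), 2))  # 8.1
```

The V→L and V→K cases reproduce the contrast described in the caption: a hydropathy difference of 0.4 versus 8.1 units.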

Mentions: One of the strengths of our approach is that it allows for the use of informative priors over the ΔΔG values. These priors can serve two purposes. One purpose is simply to prevent overfitting by regularizing [80] the ΔΔG values, biasing them towards a central, reasonable range. A second purpose is to actively incorporate some of the substantial existing knowledge about how protein structure and amino-acid character influence ΔΔG values. One piece of this knowledge is simply the general fact that most mutations to proteins are destabilizing, and so have ΔΔG > 0. It is also known that mutations that cause large changes in the hydrophobicity of amino acids are often more destabilizing. At a more detailed level, there are a number of physicochemical modeling programs that attempt to make quantitative predictions of ΔΔG values from protein structural information [1]–[8]. We tested phylogenetic inference with priors incorporating information at all three of these levels, as shown in Figure 4. At the most basic level, we used “regularizing priors” that simply biased all the ΔΔG values towards the generally observed range of mildly to moderately destabilizing. A second set of “hydrophobic” priors was based on the idea that mutations that cause large changes in amino acid hydrophobicity will tend to be more destabilizing. For these priors, the prior estimate for each ΔΔG value was equal to the absolute value of the difference in the hydrophobicities of the wildtype and mutant amino acids, as given by the widely used Kyte-Doolittle hydrophobicity scale [81]. These hydrophobic priors therefore predicted that mutations that caused large changes in hydrophobicity would be highly destabilizing (ΔΔG ≫ 0), while those that led to small changes in hydrophobicity would have little effect on stability (ΔΔG ≈ 0). A third set of “informative priors” was designed to leverage the full available knowledge about the effects of mutations on stability.
This knowledge is most completely encapsulated in various physicochemically based prediction programs [1]–[8], which utilize a wide range of structural and biophysical information to make quantitative predictions for individual mutations. We chose one of these programs, CUPSAT [8], to predict ΔΔG values for all single amino-acid mutations from the protein crystal structures. We chose CUPSAT because it has a publicly available webserver (http://cupsat.tu-bs.de) and has reported benchmarks that equal or exceed those of other prediction programs [8]. The prior estimate for each mutation was then the ΔΔG value predicted by CUPSAT, after rescaling the predictions as described below. For all three sets of priors, the prior for each mutation was a beta distribution probability density function peaked at the prior estimate for that mutation. The beta distribution functions were defined so that the sum of the alpha and beta parameters equaled three, with the functions going to zero at the upper and lower limits of the allowed range for the ΔΔG values. These prior functions are therefore broad, and only loosely bias the ΔΔG values toward the prior estimates. Examples of the priors are shown in Figure 4. The overall prior probability for the set of ΔΔG values was defined to be the product of the prior probabilities for the individual ΔΔG values, $P(\{\Delta\Delta G\}) = \prod_i P(\Delta\Delta G_i)$, where the product runs over all mutations $i$.
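A minimal Python sketch of such a prior follows. It assumes the ΔΔG values are confined to a range [lo, hi]. The actual bounds and the CUPSAT rescaling are not given in this excerpt, so lo and hi are placeholders, and all function names are hypothetical. The mode of a Beta(α, β) density is (α − 1)/(α + β − 2). Fixing α + β = 3 therefore makes the mode equal to α − 1. A prior estimate rescaled to m in (0, 1) then gives α = 1 + m and β = 2 − m. Both parameters exceed 1 whenever m lies strictly inside (0, 1), and this is what makes the density fall to zero at both limits. The product of independent priors is computed as a sum of log densities.

```python
import math
from typing import Sequence, Tuple


def beta_prior_params(estimate: float, lo: float, hi: float) -> Tuple[float, float]:
    """Return (alpha, beta) with alpha + beta = 3 and the mode at `estimate`.

    The estimate is rescaled to m in (0, 1); with alpha + beta = 3 the beta
    mode (alpha - 1) / (alpha + beta - 2) reduces to alpha - 1, so alpha = 1 + m.
    """
    if not lo < estimate < hi:
        raise ValueError("prior estimate must lie strictly inside (lo, hi)")
    m = (estimate - lo) / (hi - lo)
    return 1.0 + m, 2.0 - m


def beta_prior_pdf(ddg: float, estimate: float, lo: float, hi: float) -> float:
    """Beta prior density over a single ddG value on [lo, hi]; zero outside."""
    if not lo <= ddg <= hi:
        return 0.0
    a, b = beta_prior_params(estimate, lo, hi)
    x = (ddg - lo) / (hi - lo)
    # 1 / B(a, b) = Gamma(a + b) / (Gamma(a) * Gamma(b))
    inv_beta_fn = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    # Dividing by (hi - lo) makes the density integrate to 1 over [lo, hi].
    return inv_beta_fn * x ** (a - 1) * (1 - x) ** (b - 1) / (hi - lo)


def log_joint_prior(ddgs: Sequence[float], estimates: Sequence[float],
                    lo: float, hi: float) -> float:
    """Log of the product of independent per-mutation priors (a sum of logs)."""
    if len(ddgs) != len(estimates):
        raise ValueError("need one prior estimate per ddG value")
    total = 0.0
    for ddg, est in zip(ddgs, estimates):
        p = beta_prior_pdf(ddg, est, lo, hi)
        if p == 0.0:
            return -math.inf  # a value at or outside the limits has zero prior
        total += math.log(p)
    return total


if __name__ == "__main__":
    # Placeholder bounds for illustration only; not values from the paper.
    LO, HI = -2.0, 8.0
    print(beta_prior_params(1.0, LO, HI))
    print(log_joint_prior([1.0, 3.0], [1.0, 2.5], LO, HI))
```

Working on the log scale keeps the total finite when many mutations are scored together. A value at either limit, or outside [lo, hi], gets a log prior of −∞ because the density is zero there.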

