Inferring stabilizing mutations from protein phylogenies: application to influenza hemagglutinin.
Bottom Line:
We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin.
View Article:
PubMed Central - PubMed
Affiliation: Division of Biology, California Institute of Technology, Pasadena, California, USA. jesse.bloom@gmail.com
ABSTRACT
One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution.
One of the strengths of our approach is that it allows for the use of informative priors over the ΔΔG values. These priors can serve two purposes. One purpose is simply to prevent overfitting by regularizing [80] the ΔΔG values, biasing them towards a central reasonable range. A second purpose is to actively incorporate some of the substantial existing knowledge about how protein structure and amino-acid character influence ΔΔG values. One piece of this knowledge is simply the general fact that most mutations to proteins are destabilizing, and so have ΔΔG > 0. It is also known that mutations that cause large changes in the hydrophobicity of amino acids are often more destabilizing. At a more detailed level, there are a number of physicochemical modeling programs that attempt to make quantitative predictions of ΔΔG values from protein structural information [1]–[8]. We tested phylogenetic inference with priors incorporating information at all three of these levels, as shown in Figure 4. At the most basic level, we used “regularizing priors” that simply biased all the ΔΔG values towards the generally observed range of mildly to moderately destabilizing. A second set of “hydrophobic” priors was based on the idea that mutations that cause large changes in amino acid hydrophobicity will tend to be more destabilizing. For these priors, the prior estimate for each ΔΔG value was equal to the absolute value of the difference in the hydrophobicities of the wildtype and mutant amino acids, as given by the widely used Kyte-Doolittle hydrophobicity scale [81]. These hydrophobic priors therefore predicted that mutations that caused large changes in hydrophobicity would be highly destabilizing (large positive ΔΔG), while those that led to small changes in hydrophobicity would have little effect on stability (ΔΔG near zero). A third set of “informative priors” was designed to leverage the full available knowledge about the effects of mutations on stability.
This knowledge is most completely encapsulated in various physicochemically-based prediction programs [1]–[8], which utilize a wide range of structural and biophysical information to make quantitative predictions for individual mutations. We chose one of these programs, CUPSAT [8], to predict ΔΔG values for all single amino-acid mutations from the protein crystal structures. We chose the CUPSAT program because it has a publicly available webserver (http://cupsat.tu-bs.de) and has reported benchmarks that equal or exceed those of other prediction programs [8]. The prior estimate for each mutation was then the ΔΔG value predicted by CUPSAT, after rescaling the predictions as described below. For all three sets of priors, the prior for mutating a residue from one amino acid to another was a beta distribution probability density function peaked at the prior estimate for that mutation. The beta distribution functions were defined so that the sum of the alpha and beta parameters equaled three, and with the functions going to zero at the upper and lower limits of the allowed range for the ΔΔG values. These prior functions are therefore broad, and loosely bias the ΔΔG values toward the prior estimates. Examples of the priors are shown in Figure 4. The overall prior probability for the set of ΔΔG values was defined to be the product of the prior probabilities for the individual ΔΔG values.
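The construction of the hydrophobic priors can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, and the allowed ΔΔG range (`DDG_MIN`, `DDG_MAX`) is an assumed placeholder, since the excerpt does not state the limits. The sketch uses the raw Kyte-Doolittle difference as the prior estimate and does not reproduce any rescaling. With the sum of alpha and beta fixed at three, a beta density on the unit interval has its mode at alpha minus one, so choosing alpha as one plus the rescaled prior estimate places the peak at that estimate.

```python
import math

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# ASSUMPTION: placeholder limits of the allowed ddG range (kcal/mol).
# The excerpt does not give the limits actually used.
DDG_MIN, DDG_MAX = -5.0, 10.0


def hydrophobic_prior_estimate(wildtype, mutant):
    """Prior ddG estimate: absolute Kyte-Doolittle hydropathy difference."""
    return abs(KYTE_DOOLITTLE[wildtype] - KYTE_DOOLITTLE[mutant])


def beta_parameters(prior_estimate):
    """Return (alpha, beta) with alpha + beta = 3 and mode at the estimate."""
    m = (prior_estimate - DDG_MIN) / (DDG_MAX - DDG_MIN)
    if not 0.0 < m < 1.0:
        raise ValueError("prior estimate must lie strictly inside the range")
    # Mode of Beta(a, b) is (a - 1) / (a + b - 2) = a - 1 when a + b = 3.
    return 1.0 + m, 2.0 - m


def log_prior(ddg, prior_estimate):
    """Log density of the beta prior for one mutation, on the ddG scale."""
    alpha, beta = beta_parameters(prior_estimate)
    x = (ddg - DDG_MIN) / (DDG_MAX - DDG_MIN)
    if not 0.0 < x < 1.0:
        return -math.inf  # density is zero at and beyond the range limits
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return ((alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x)
            - log_norm - math.log(DDG_MAX - DDG_MIN))


def total_log_prior(ddg_values, prior_estimates):
    """Log of the product of per-mutation priors (a sum of log densities)."""
    return sum(log_prior(ddg_values[k], prior_estimates[k]) for k in ddg_values)


if __name__ == "__main__":
    estimate = hydrophobic_prior_estimate("I", "R")
    print("prior estimate for I -> R:", estimate)
    print("beta parameters:", beta_parameters(estimate))
    print("log prior at the estimate:", log_prior(estimate, estimate))
```

Working in log space turns the product over mutations into a sum, which avoids numerical underflow when many ΔΔG values are combined. Because both beta parameters stay between one and two, each density is broad and vanishes at both ends of the range, matching the qualitative description of the priors in the text.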