Inferring stabilizing mutations from protein phylogenies: application to influenza hemagglutinin.
Bottom Line:
We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin.
View Article:
PubMed Central - PubMed
Affiliation: Division of Biology, California Institute of Technology, Pasadena, California, USA. jesse.bloom@gmail.com
ABSTRACT
One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution.
One of the strengths of our approach is that it allows for the use of informative priors over the ΔΔG values. These priors can serve two purposes. One purpose is simply to prevent overfitting by regularizing [80] the ΔΔG values, biasing them towards a reasonable central range. A second purpose is to actively incorporate some of the substantial existing knowledge about how protein structure and amino-acid character influence ΔΔG values. One piece of this knowledge is simply the general fact that most mutations to proteins are destabilizing, and so have ΔΔG > 0. It is also known that mutations that cause large changes in the hydrophobicity of amino acids are often more destabilizing. At a more detailed level, there are a number of physicochemical modeling programs that attempt to make quantitative predictions of ΔΔG values from protein structural information [1]–[8]. We tested phylogenetic inference with priors incorporating information at all three of these levels, as shown in Figure 4. At the most basic level, we used “regularizing priors” that simply biased all the ΔΔG values towards the generally observed range of mildly to moderately destabilizing. A second set of “hydrophobic” priors was based on the idea that mutations that cause large changes in amino acid hydrophobicity will tend to be more destabilizing. For these priors, the prior estimate for each ΔΔG value was equal to the absolute value of the difference in the hydrophobicities of the wildtype and mutant amino acids, as given by the widely used Kyte-Doolittle hydrophobicity scale [81]. These hydrophobic priors therefore predicted that mutations that caused large changes in hydrophobicity would be highly destabilizing (large positive ΔΔG), while those that led to small changes in hydrophobicity would have little effect on stability (ΔΔG near zero). A third set of “informative priors” was designed to leverage the full available knowledge about the effects of mutations on stability.
This knowledge is most completely encapsulated in various physicochemically based prediction programs [1]–[8], which utilize a wide range of structural and biophysical information to make quantitative predictions for individual mutations. We chose one of these programs, CUPSAT [8], to predict ΔΔG values for all single amino-acid mutations from the protein crystal structures. We chose the CUPSAT program because it has a publicly available webserver (http://cupsat.tu-bs.de) and has reported benchmarks that equal or exceed those of other prediction programs [8]. The prior estimate for each mutation was then the ΔΔG value predicted by CUPSAT, after rescaling the predictions as described below. For all three sets of priors, the prior for each mutation of a residue from one amino acid to another was a beta distribution probability density function peaked at the prior estimate for that mutation. The beta distribution functions were defined so that the sum of the alpha and beta parameters equaled three, and with the functions going to zero at the upper and lower limits of the allowed range for the ΔΔG values. These prior functions are therefore broad, and loosely bias the ΔΔG values toward the prior estimates. Examples of the priors are shown in Figure 4. The overall prior probability for the set of ΔΔG values was defined to be the product of the prior probabilities for the individual ΔΔG values.
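As a rough illustration of how such priors could be constructed, the following Python sketch computes a hydrophobicity-based prior estimate from the Kyte-Doolittle scale and builds a beta-distribution prior whose alpha and beta parameters sum to three and whose peak sits at the prior estimate. It is a minimal sketch and not the authors' implementation: the function names, the range limits `lo` and `hi`, and the linear mapping of the allowed range onto [0, 1] are all assumptions made here for illustration, since the excerpt does not state the allowed range or the rescaling. The sketch relies on the standard fact that a beta density with both parameters greater than one has its mode at (alpha - 1) / (alpha + beta - 2) and vanishes at both ends of its range.

```python
import math

# Kyte-Doolittle hydropathy index for the 20 amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_prior_estimate(wildtype, mutant):
    """Prior estimate: absolute hydrophobicity difference between residues."""
    return abs(KYTE_DOOLITTLE[wildtype] - KYTE_DOOLITTLE[mutant])


def beta_params(estimate, lo, hi, total=3.0):
    """Return (alpha, beta) with alpha + beta = total and mode at `estimate`.

    `lo` and `hi` are hypothetical limits of the allowed ddG range.
    The estimate must lie strictly inside the range so that both
    parameters exceed one and the density vanishes at both limits.
    """
    if not lo < estimate < hi:
        raise ValueError("prior estimate must lie strictly inside (lo, hi)")
    mode = (estimate - lo) / (hi - lo)      # mode on the unit interval
    alpha = 1.0 + mode * (total - 2.0)      # from mode = (a - 1) / (a + b - 2)
    return alpha, total - alpha


def log_beta_prior(ddg, estimate, lo, hi):
    """Log density of the rescaled beta prior evaluated at `ddg`."""
    alpha, beta = beta_params(estimate, lo, hi)
    x = (ddg - lo) / (hi - lo)
    if not 0.0 < x < 1.0:
        return -math.inf                    # zero density outside the range
    log_norm = (math.lgamma(alpha + beta)
                - math.lgamma(alpha) - math.lgamma(beta))
    # The final term is the Jacobian of the linear map onto [0, 1].
    return (log_norm + (alpha - 1.0) * math.log(x)
            + (beta - 1.0) * math.log(1.0 - x) - math.log(hi - lo))


def log_joint_prior(ddgs, estimates, lo, hi):
    """Log of the product of the individual priors (a sum of log densities)."""
    return sum(log_beta_prior(d, e, lo, hi) for d, e in zip(ddgs, estimates))
```

For example, with an assumed range of -5 to 10, `beta_params(2.5, -5.0, 10.0)` gives alpha and beta both equal to 1.5, a broad density centered in the range. Working with log densities turns the product over mutations into a sum, which avoids numerical underflow when many mutations are considered.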