Limits...
Predicting the tolerated sequences for proteins and protein interfaces using RosettaBackrub flexible backbone design.

Smith CA, Kortemme T - PLoS ONE (2011)

Bottom Line: Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space.The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA.Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.

View Article: PubMed Central - PubMed

Affiliation: Graduate Program in Biological and Medical Informatics, University of California San Francisco, San Francisco, California, United States of America.

ABSTRACT
Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.

Show MeSH

Related in: MedlinePlus

Prediction of tolerated sequences for GB1 fold stability.Frequently observed amino acids in phage display are enriched in the GB1 prediction. A. The structure (PDB code 1FCC) of Streptococcal GB1 (blue) is shown bound to the Fc domain of human IgG (green). The core and peripheral residues that were randomized in phage display are shown with sticks and transparent spheres. The side chain atoms (starting at C-beta) of these amino acids are at least 7 Å away from any atom of the Fc domain, making residues selected at these positions unlikely to interact directly with the Fc domain. B. Amino acids are ranked individually for each sequence position by computationally predicted frequency (using the Boltzmann factor kT  = 0.23, as described in the main text). Wild type residues, which were used in protein ensemble generation, are shown in red. The dashed line indicates a typical cutoff of picking the top 5 amino acid choices at each position. C. Sequence logos (LOLA, University of Toronto) are shown for predictions with two different Boltzmann factors. The relative degree of specificity (in terms of bits of information, y-axis) shows good correspondence between prediction and phage display. Increasing the Boltzmann factor lowers the overall specificity and brings the absolute frequencies closer to phage display.
© Copyright Policy
Related In: Results  -  Collection


getmorefigures.php?uid=PMC3138746&req=5

pone-0020451-g002: Prediction of tolerated sequences for GB1 fold stability.Frequently observed amino acids in phage display are enriched in the GB1 prediction. A. The structure (PDB code 1FCC) of Streptococcal GB1 (blue) is shown bound to the Fc domain of human IgG (green). The core and peripheral residues that were randomized in phage display are shown with sticks and transparent spheres. The side chain atoms (starting at C-beta) of these amino acids are at least 7 Å away from any atom of the Fc domain, making residues selected at these positions unlikely to interact directly with the Fc domain. B. Amino acids are ranked individually for each sequence position by computationally predicted frequency (using the Boltzmann factor kT  = 0.23, as described in the main text). Wild type residues, which were used in protein ensemble generation, are shown in red. The dashed line indicates a typical cutoff of picking the top 5 amino acid choices at each position. C. Sequence logos (LOLA, University of Toronto) are shown for predictions with two different Boltzmann factors. The relative degree of specificity (in terms of bits of information, y-axis) shows good correspondence between prediction and phage display. Increasing the Boltzmann factor lowers the overall specificity and brings the absolute frequencies closer to phage display.

Mentions: Raw sequencing data (Andrea G. Cochran, personal communication) from round three of phage display of the Streptococcus GB1 domain using the human IgG Fc domain as bait [9] included 185 sequences. Sequences were excluded that contained ambiguous reads, early stop codons, and mutations at sites other than those explicitly varied, leaving 171 total sequences and 167 unique sequences. For the hGH/hGHR example, phage display frequencies were taken from Figure 2 of the authors' publication [14]. Erbin PDZ frequencies were used as previously described [35].


Predicting the tolerated sequences for proteins and protein interfaces using RosettaBackrub flexible backbone design.

Smith CA, Kortemme T - PLoS ONE (2011)

Prediction of tolerated sequences for GB1 fold stability.Frequently observed amino acids in phage display are enriched in the GB1 prediction. A. The structure (PDB code 1FCC) of Streptococcal GB1 (blue) is shown bound to the Fc domain of human IgG (green). The core and peripheral residues that were randomized in phage display are shown with sticks and transparent spheres. The side chain atoms (starting at C-beta) of these amino acids are at least 7 Å away from any atom of the Fc domain, making residues selected at these positions unlikely to interact directly with the Fc domain. B. Amino acids are ranked individually for each sequence position by computationally predicted frequency (using the Boltzmann factor kT  = 0.23, as described in the main text). Wild type residues, which were used in protein ensemble generation, are shown in red. The dashed line indicates a typical cutoff of picking the top 5 amino acid choices at each position. C. Sequence logos (LOLA, University of Toronto) are shown for predictions with two different Boltzmann factors. The relative degree of specificity (in terms of bits of information, y-axis) shows good correspondence between prediction and phage display. Increasing the Boltzmann factor lowers the overall specificity and brings the absolute frequencies closer to phage display.
© Copyright Policy
Related In: Results  -  Collection

Show All Figures
getmorefigures.php?uid=PMC3138746&req=5

pone-0020451-g002: Prediction of tolerated sequences for GB1 fold stability.Frequently observed amino acids in phage display are enriched in the GB1 prediction. A. The structure (PDB code 1FCC) of Streptococcal GB1 (blue) is shown bound to the Fc domain of human IgG (green). The core and peripheral residues that were randomized in phage display are shown with sticks and transparent spheres. The side chain atoms (starting at C-beta) of these amino acids are at least 7 Å away from any atom of the Fc domain, making residues selected at these positions unlikely to interact directly with the Fc domain. B. Amino acids are ranked individually for each sequence position by computationally predicted frequency (using the Boltzmann factor kT  = 0.23, as described in the main text). Wild type residues, which were used in protein ensemble generation, are shown in red. The dashed line indicates a typical cutoff of picking the top 5 amino acid choices at each position. C. Sequence logos (LOLA, University of Toronto) are shown for predictions with two different Boltzmann factors. The relative degree of specificity (in terms of bits of information, y-axis) shows good correspondence between prediction and phage display. Increasing the Boltzmann factor lowers the overall specificity and brings the absolute frequencies closer to phage display.
Mentions: Raw sequencing data (Andrea G. Cochran, personal communication) from round three of phage display of the Streptococcus GB1 domain using the human IgG Fc domain as bait [9] included 185 sequences. Sequences were excluded that contained ambiguous reads, early stop codons, and mutations at sites other than those explicitly varied, leaving 171 total sequences and 167 unique sequences. For the hGH/hGHR example, phage display frequencies were taken from Figure 2 of the authors' publication [14]. Erbin PDZ frequencies were used as previously described [35].

Bottom Line: Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space.The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA.Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.

View Article: PubMed Central - PubMed

Affiliation: Graduate Program in Biological and Medical Informatics, University of California San Francisco, San Francisco, California, United States of America.

ABSTRACT
Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.

Show MeSH
Related in: MedlinePlus