Predicting the tolerated sequences for proteins and protein interfaces using RosettaBackrub flexible backbone design.

Smith CA, Kortemme T - PLoS ONE (2011)

Bottom Line: Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.


Affiliation: Graduate Program in Biological and Medical Informatics, University of California San Francisco, San Francisco, California, United States of America.

ABSTRACT
Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.
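The merging step described in the abstract can be illustrated with a minimal Python sketch. It assumes plain, unweighted pooling of the low-energy sequences found for each backbone; this excerpt does not spell out the actual merging scheme used by the protocol, so the function name `merge_tolerated_sequences` and the pooling rule are hypothetical and are not part of Rosetta.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def merge_tolerated_sequences(sequences_per_backbone):
    """Pool sequences from every backbone into per-position frequencies.

    sequences_per_backbone: list of lists of equal-length strings, one inner
    list per ensemble member (assumed input format, for illustration only).
    Returns one dict per designed position mapping amino acid -> frequency.
    """
    # Assumption: every sequence counts equally, regardless of its energy
    # or of which backbone it came from.
    pooled = [seq for seqs in sequences_per_backbone for seq in seqs]
    if not pooled:
        raise ValueError("no sequences supplied")
    length = len(pooled[0])
    if any(len(seq) != length for seq in pooled):
        raise ValueError("all sequences must have the same length")

    profile = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in pooled)
        profile.append({aa: counts[aa] / len(pooled) for aa in AMINO_ACIDS})
    return profile
```

Called with the sequences from two backbones, for example `merge_tolerated_sequences([["AK", "AR"], ["GK", "AK"]])`, the function returns one frequency table per designed position. An energy-weighted pooling would be a natural variation, but the weights would have to come from the protocol itself.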


Figure 3: hGH/hGHR interface tolerance prediction. The generalized Rosetta 3 protocol described here was applied to rank human growth hormone (hGH) amino acids by computationally predicted frequency. The residue positions shown and their ordering are taken from previously published results using the Rosetta 2 protocol (Humphris & Kortemme, Table 2 [23]). Wild-type residues, which were used in protein ensemble generation, are shown in red. For each position, an average of 59% of the amino acids observed in phage display (≥10% experimental frequency) are predicted within the top five computationally ranked amino acids (above dashed line). Overall performance was similar to previous results of the Rosetta 2 protocol. Amino acids (other than wild-type) included in the computationally selected library from the Rosetta 2 protocol are indicated with a star. If the same number of amino acids at each position is used as defined in the computational library in [23], Table 2, the Rosetta 3 protocol misses two frequently observed amino acids included by Rosetta 2 (V67 and L176). Conversely, the Rosetta 2 protocol misses three frequently observed amino acids included by Rosetta 3 (S21, A21, and E22). Both protocols share similar false positive predictions. However, the Rosetta 3 histidine reference energy reweighting (see Methods) eliminates 6 out of 8 histidine false positives (H*).
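The recovery statistic quoted in the caption (the share of experimentally observed amino acids that fall within the top five predicted ones, averaged over positions) could be computed along the following lines. This is a sketch under stated assumptions: the function names, the dictionary input format, and the alphabetical tie-breaking rule are all hypothetical, and the authors' own analysis scripts may differ.

```python
def top_n_recovery(predicted_freqs, experimental_freqs, n=5, threshold=0.10):
    """Fraction of experimentally observed amino acids ranked in the top n.

    predicted_freqs / experimental_freqs: dicts mapping amino acid -> frequency
    at one position. An amino acid counts as observed when its experimental
    frequency is at least `threshold` (10% in the caption).
    Returns None when nothing passes the threshold.
    """
    observed = {aa for aa, f in experimental_freqs.items() if f >= threshold}
    if not observed:
        return None
    # Rank by predicted frequency, highest first. Ties are broken
    # alphabetically, which is an arbitrary choice made for this sketch.
    ranked = sorted(predicted_freqs, key=lambda aa: (-predicted_freqs[aa], aa))
    top = set(ranked[:n])
    return len(observed & top) / len(observed)


def mean_recovery(positions, n=5, threshold=0.10):
    """Average top-n recovery over (predicted, experimental) position pairs."""
    values = [top_n_recovery(p, e, n, threshold) for p, e in positions]
    values = [v for v in values if v is not None]
    if not values:
        raise ValueError("no position has an observed amino acid")
    return sum(values) / len(values)
```

Positions where no amino acid reaches the experimental threshold are skipped rather than counted as zero; whether the published 59% figure treats such positions the same way is not stated in this excerpt.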

Mentions: 16 designed hGH amino acid positions as defined in [23] and shown in Figure 3.

