Predicting the tolerated sequences for proteins and protein interfaces using RosettaBackrub flexible backbone design.

Smith CA, Kortemme T - PLoS ONE (2011)

Bottom Line: Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.


Affiliation: Graduate Program in Biological and Medical Informatics, University of California San Francisco, San Francisco, California, United States of America.

ABSTRACT
Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.
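The merging step can be illustrated with a minimal sketch. It assumes that each backbone in the ensemble has already produced a list of low-energy sequences over the same designed positions, and that all sequences are pooled with equal weight; the abstract does not state how the protocol weights sequences, so the equal weighting and the function names `position_frequencies` and `merge_ensemble` are illustrative assumptions, not the published implementation.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequencies(sequences):
    """Return per-position amino acid frequencies for equal-length sequences."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must cover the same designed positions")
    profile = []
    for i in range(length):
        counts = Counter(seq[i] for seq in sequences)
        total = sum(counts.values())
        profile.append({aa: counts.get(aa, 0) / total for aa in AMINO_ACIDS})
    return profile


def merge_ensemble(per_backbone_sequences):
    """Pool sequences from every backbone (equal weights, an assumption)
    and compute a single profile over the designed positions."""
    pooled = [seq for seqs in per_backbone_sequences for seq in seqs]
    return position_frequencies(pooled)


if __name__ == "__main__":
    # Two hypothetical backbones, two designed positions each.
    profile = merge_ensemble([["AC", "AD"], ["AC", "GC"]])
    for index, column in enumerate(profile):
        tolerated = {aa: f for aa, f in column.items() if f > 0}
        print(index, tolerated)
```

Each entry of the returned list is one column of the sequence profile, so the tolerated amino acid types at a designed position are those with a non-zero (or above-threshold) frequency.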


Scheme for predicting the tolerated sequences for a protein fold or interaction. The input is at least one protein structure from the protein structure databank (2QMT in the example). Rosetta first creates an ensemble of backbone conformations using the backrub method [31], then predicts sequences consistent with each conformation in the ensemble, scoring each trial sequence–structure combination using the Rosetta score12, and finally combines the sequences into a predicted sequence profile. This approach ignores potential covariation between side chains. To speed up calculations, the scoring function is split into one-body terms describing the intrinsic energy of a particular residue conformation, and two-body terms between residues; these residue-residue interaction terms are assumed to be pairwise additive. One- and two-body terms are pre-calculated and stored in an interaction graph [42] such that optimization of sequence–structure combinations for entire proteins only takes seconds using look-up tables of interaction energies. For the interaction graph, vectors of residue self-energies (one body) are stored on the vertices (green circles) and matrices of residue interaction energies (two body) are stored on the edges (thick black lines). Computed interaction energies within proteins, between proteins, or between groups of residues can be reweighted to generate custom fitness functions for specific applications. This flexibility in scoring residue groups allows modeling of separate requirements, such as those to maintain residues required in an interaction interface with a binding partner. Group and group interaction reweighting is typically only done for protein-protein interactions. (For the monomeric GB1 domain shown here, no reweighting was applied.)
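The pairwise-additive lookup described in the caption can be sketched as follows. This is a toy illustration rather than Rosetta's interaction graph: the dictionary layout, the function name `total_energy`, and the choice to scale one-body terms by the within-group weight are all assumptions made for the example.

```python
def total_energy(assignment, one_body, two_body, group_of=None, weights=None):
    """Sum precomputed one- and two-body energies for one assignment.

    assignment: dict position -> state (e.g. an amino acid or rotamer label)
    one_body:   dict position -> {state: self energy}            (vertices)
    two_body:   dict (pos_i, pos_j) -> {(state_i, state_j): energy} (edges)
    group_of:   optional dict position -> group name (e.g. a chain)
    weights:    optional dict frozenset({group_a, group_b}) -> weight;
                pairs that are not listed default to 1.0
    """
    group_of = group_of or {}
    weights = weights or {}

    def weight(pos_i, pos_j):
        key = frozenset((group_of.get(pos_i), group_of.get(pos_j)))
        return weights.get(key, 1.0)

    energy = 0.0
    # Vertex terms: scaled by the within-group weight (an assumption).
    for pos, state in assignment.items():
        energy += weight(pos, pos) * one_body[pos][state]
    # Edge terms: a table lookup per residue pair, scaled by the group-pair weight.
    for (pos_i, pos_j), table in two_body.items():
        energy += weight(pos_i, pos_j) * table[(assignment[pos_i], assignment[pos_j])]
    return energy


if __name__ == "__main__":
    one_body = {1: {"A": 1.0, "G": 2.0}, 2: {"A": 0.5, "G": 0.0}}
    two_body = {(1, 2): {("A", "A"): -1.0, ("A", "G"): 0.25,
                         ("G", "A"): 0.0, ("G", "G"): 3.0}}
    choice = {1: "A", 2: "G"}
    print(total_energy(choice, one_body, two_body))
    # Double the weight of interactions between hypothetical groups X and Y.
    groups = {1: "X", 2: "Y"}
    print(total_energy(choice, one_body, two_body, groups,
                       {frozenset(("X", "Y")): 2.0}))
```

Because every term is a table lookup, evaluating a new trial sequence costs one addition per vertex and per edge, which is why the precomputation makes the search fast; changing the weights alters the fitness function without recomputing any energies.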

The protocol and methods described here (Figure 1) aim to identify the amino acid types that can be tolerated at a given set of positions while still preserving protein fold stability and function (most commonly represented as binding). There are two general stages of the protocol: (1) creation of a set of protein backbone conformations (ensemble generation), and (2) prediction of sequences consistent with the ensemble conformations. The input to the protocol is at least one protein structure in PDB format and a definition of residue positions.

Three sets of sequence positions can be defined. The first set includes those positions that are mutated prior to ensemble generation in stage (1) and often remain the same for all subsequent simulations. These positions will be referred to as the “premutated” positions. Definition of premutated positions is optional. If no positions are chosen, the input sequence will be used for ensemble generation. The second, most important set comprises the positions that can vary their amino acid type in stage (2); these have to be defined by the user and will be referred to as the “designed” positions. For each designed position, a set of considered amino acid types can be defined, as described in the “Detailed Workflow” section below. The final set includes those positions whose conformations (but not amino acid types) change during sequence scoring in stage (2). This set will be referred to as the “repacked” positions and is often a superset of the “premutated” positions. These positions can be determined by the user or automatically chosen by the protocol.

The predicted tolerated amino acid types at the designed positions will depend on how many other positions are allowed to vary simultaneously (for example, allowing residues in a surrounding shell to be repacked may help to accommodate different amino acid choices at designed positions). For all of the results reported here, as well as in a previous study [35], residues chosen for repacking included all those with a C-alpha atom within 10 Å of the C-alpha atom of a designed position. This is the current default if repacked positions are chosen automatically by the protocol. Smaller sets of repacked positions can be used to restrict sequence diversity and simulate more conservative changes closer to the starting sequence and conformation, or to reduce the computational time required for the algorithm.
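The default 10 Å shell can be sketched in a few lines. This is a minimal illustration and not the protocol's own code: the function name `repack_shell`, the coordinate dictionary, the inclusive cutoff comparison, and the decision to leave the designed positions out of the returned list are assumptions made for the example.

```python
import math


def repack_shell(ca_coords, designed, cutoff=10.0):
    """Return residues whose C-alpha lies within `cutoff` angstroms of the
    C-alpha of any designed position.

    ca_coords: dict residue id -> (x, y, z) C-alpha coordinates
    designed:  iterable of residue ids that are being designed
    The designed positions themselves are excluded (an assumption), since
    they are handled separately from the repacked set.
    """
    designed = set(designed)
    shell = set()
    for residue, xyz in ca_coords.items():
        if residue in designed:
            continue
        if any(math.dist(xyz, ca_coords[d]) <= cutoff for d in designed):
            shell.add(residue)
    return sorted(shell)


if __name__ == "__main__":
    # Hypothetical coordinates for five residues; residue 1 is designed.
    coords = {
        1: (0.0, 0.0, 0.0),
        2: (3.0, 4.0, 0.0),    # 5 angstroms away
        3: (0.0, 0.0, 10.0),   # exactly at the cutoff
        4: (0.0, 0.0, 10.5),   # just outside
        5: (20.0, 0.0, 0.0),   # far away
    }
    print(repack_shell(coords, designed=[1]))
```

Passing a smaller `cutoff` gives the more conservative, faster variant mentioned above, because fewer neighbouring side chains are allowed to change conformation.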

