Computational protein design: validation and possible relevance as a tool for homology searching and fold recognition.

Schmidt Am Busch M, Sedano A, Simonson T - PLoS ONE (2010)

Bottom Line: The results confirm and generalize our earlier study of SH2 and SH3 domains. For some families, designed sequences can be a useful complement to experimental ones for homologue searching. However, improved tools are needed to extract more information from the designed profiles before the method can be of general use.


Affiliation: Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France.

ABSTRACT

Background: Protein fold recognition usually relies on a statistical model of each fold; each model is constructed from an ensemble of natural sequences belonging to that fold. A complementary strategy may be to employ sequence ensembles produced by computational protein design. Designed sequences can be more diverse than natural sequences, possibly avoiding some limitations of experimental databases.

Methodology/principal findings: We explore this strategy for four SCOP families: Small Kunitz-type inhibitors (SKIs), Interleukin-8 chemokines, PDZ domains, and large Caspase catalytic subunits, represented by 43 structures. An automated procedure is used to redesign the 43 proteins. We use the experimental backbones as fixed templates in the folded state and a molecular mechanics model to compute the interaction energies between sidechain and backbone groups. Calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is used to scan the sequence and conformational space, yielding 200,000-300,000 sequences per backbone template. The results confirm and generalize our earlier study of SH2 and SH3 domains. The designed sequences resemble moderately distant natural homologues of the initial templates; e.g., the SUPERFAMILY profile Hidden-Markov Model library recognizes 85% of the low-energy sequences as native-like. Conversely, Position Specific Scoring Matrices derived from the designed sequences can be used to detect natural homologues within the SwissProt database: 60% of known PDZ domains are detected, and around 90% of known SKIs and chemokines. Energy components and inter-residue correlations are analyzed, and ways to improve the method are discussed.

Conclusions/significance: For some families, designed sequences can be a useful complement to experimental ones for homologue searching. However, improved tools are needed to extract more information from the designed profiles before the method can be of general use.
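
The abstract describes deriving Position Specific Scoring Matrices from the designed sequences. The following is only a minimal sketch of that general idea, not the authors' procedure: it assumes the designed sequences all have the same length (as they would for a fixed backbone template), contain only the 20 standard amino acids, and are scored against a uniform background. The function names, the pseudocount, and the toy sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def build_pssm(sequences, pseudocount=1.0, background=None):
    """Build a log-odds PSSM from equal-length sequences.

    Assumption: sequences are already aligned position by position and
    use only the 20 standard one-letter codes.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    if background is None:
        # Assumption: uniform background frequencies (1/20 each).
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

    total = len(sequences) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sequences)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background[aa])
        pssm.append(column)
    return pssm


def score_sequence(pssm, seq):
    """Sum the per-position log-odds scores of a candidate sequence."""
    if len(seq) != len(pssm):
        raise ValueError("sequence length must match the PSSM length")
    return sum(column[aa] for column, aa in zip(pssm, seq))


# Toy usage with made-up three-residue "designed" sequences.
designed = ["ACD", "ACE", "ACD"]
pssm = build_pssm(designed)
print(score_sequence(pssm, "ACD") > score_sequence(pssm, "WWW"))
```

A real homologue search would use an established tool and database rather than this toy scorer; the sketch only shows how position-wise residue frequencies become log-odds scores.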

Fig. 5: Mean identity score vs. the folding free energy (top) and its components (middle, bottom), for seven proteins. Results are for the 8,000 lowest-energy designed sequences, which are compared to their corresponding native template. The size of each symbol indicates the number of sequences with the corresponding energy (energies binned in 10 kcal/mol windows). Negative energies indicate stable folding of the designed sequences.

Mentions: The enhanced stability of the designed sequences prompted us to compare sequence “quality” to protein stability. Specifically, Fig. 5 shows the sequence identity of the 8,000 lowest-energy designed sequences (relative to the native template) as a function of the computed folding free energy. In five out of six cases, the identity scores of the designed sequences improve as the folding free energy improves (becomes more negative); i.e., the lowest-energy designed sequences have the best identity score. The SH3 graphs are clearly separated from the others, with a more negative slope. The best SH3 sequences are 100 kcal/mol below the highest value. For the two SH2 proteins, the curves are flatter, but there is still a slight increase in the identity scores as the folding free energy improves. For the 2FE5 PDZ domain, the identity scores of the designed sequences also increase as the folding free energy improves. Only for 1QAU does the identity score fail to improve with the folding free energy; it actually gets worse for the most stable sequences. This provides support for using the folding free energy as a selection criterion, despite the overstabilization seen in Fig. 4.
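
The analysis above (mean identity to the native template, with energies binned in 10 kcal/mol windows) can be illustrated with a short sketch. It assumes that each design is available as a (sequence, folding free energy in kcal/mol) pair and uses plain fractional identity as a stand-in for the identity score, which may differ from the score used in the paper. The function names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def identity(seq, native):
    """Fraction of positions where seq matches the native template."""
    if len(seq) != len(native):
        raise ValueError("sequences must have the same length")
    return sum(a == b for a, b in zip(seq, native)) / len(native)


def mean_identity_by_energy(designs, native, bin_width=10.0):
    """Group designs into energy windows and average their identity.

    designs: iterable of (sequence, energy) pairs; energy in kcal/mol.
    Returns {bin_lower_edge: (mean_identity, number_of_sequences)},
    ordered from the most negative bin upward.
    """
    bins = defaultdict(list)
    for seq, energy in designs:
        # Lower edge of the window that contains this energy.
        lower = math.floor(energy / bin_width) * bin_width
        bins[lower].append(identity(seq, native))
    return {
        lower: (sum(values) / len(values), len(values))
        for lower, values in sorted(bins.items())
    }


# Toy usage with made-up sequences and energies.
native = "ACDE"
designs = [("ACDE", -105.0), ("ACDF", -101.0), ("WWWW", -95.0)]
for lower, (mean_id, n) in mean_identity_by_energy(designs, native).items():
    print(f"[{lower:.0f}, {lower + 10:.0f}) kcal/mol: mean identity {mean_id:.3f}, n={n}")
```

The per-bin count corresponds to the symbol size described in the figure caption; plotting the mean identity against the bin edges would give one curve per protein.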

