On simplified global nonlinear function for fitness landscape: a case study of inverse protein folding.

Xu Y, Hu C, Dai Y, Liang J - PLoS ONE (2014)

Affiliation: Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, United States of America.

ABSTRACT
The construction of fitness landscapes has broad implications for understanding molecular evolution, cellular epigenetic states, and protein structures. We studied the problem of constructing the fitness landscape of inverse protein folding, or protein design, with the aim of generating amino acid sequences that would fold into an a priori determined structural fold, which would enable the engineering of novel or enhanced biochemistry. For this task, an effective fitness function should allow identification of correct sequences that would fold into the desired structure. In this study, we showed that a nonlinear fitness function for protein design can be constructed using a rectangular kernel with a basis set of proteins and decoys chosen a priori. The full landscape for a large number of protein folds can be captured using only 480 native proteins and 3,200 non-protein decoys via a finite Newton method. A blind test of a simplified version of the fitness function for sequence design was carried out to discriminate simultaneously 428 native sequences not homologous to any training proteins from 11 million challenging protein-like decoys. This simplified function correctly classified 408 native sequences (20 misclassifications, a 95% correct rate), outperforming several statistical linear scoring functions and an optimized linear function. Our results further suggested that for the task of global sequence design of the 428 selected proteins, the search space of protein shape and sequence can be effectively parametrized with a carefully chosen basis set of only about 3,680 proteins and decoys (the 480 native proteins plus the 3,200 decoys), and we showed in addition that the overall landscape is not overly sensitive to the specific choice of this set. Our results can be generalized to construct other types of fitness landscapes.
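The abstract names two ingredients: a rectangular kernel built against an a priori basis set, and a finite Newton solver. The sketch below shows one way these can fit together, under stated assumptions. Each sequence-structure pair is assumed to be already encoded as a numeric feature vector (the encoding is not described here), a Gaussian kernel stands in for whatever kernel the authors used, and training minimizes a squared-hinge (L2) SVM objective with a generalized Newton step plus Armijo backtracking, in the style of Mangasarian's finite Newton method. The kernel is rectangular because it compares all m training examples with only k basis vectors, so the model has k + 1 parameters rather than m + 1. The names `rbf_kernel`, `train_finite_newton`, and `fitness`, and the parameters `nu` and `gamma_k`, are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma_k=0.1):
    """Gaussian kernel K[i, j] = exp(-gamma_k * ||A[i] - B[j]||^2).

    Assumption: a Gaussian kernel is used for illustration only. With m rows
    in A and k basis rows in B (k << m) the result is m x k, i.e. rectangular.
    """
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma_k * np.maximum(sq, 0.0))


def train_finite_newton(X, y, basis, nu=1.0, gamma_k=0.1, tol=1e-8, max_iter=50):
    """Minimize (nu/2)||(e - D(K u - e b))_+||^2 + (1/2)(||u||^2 + b^2).

    X: m x d training features, y: labels in {+1 (native-like), -1 (decoy)},
    basis: k x d vectors chosen a priori. Returns (u, b).
    """
    K = rbf_kernel(X, basis, gamma_k)                        # m x k rectangular kernel
    Z = y[:, None] * np.hstack([K, -np.ones((len(y), 1))])  # D [K, -e]
    w = np.zeros(Z.shape[1])                                 # w = [u; b]

    def objective(v):
        r = np.maximum(1.0 - Z @ v, 0.0)
        return 0.5 * nu * r @ r + 0.5 * v @ v

    for _ in range(max_iter):
        r = 1.0 - Z @ w
        active = r > 0                      # examples inside or beyond the margin
        grad = -nu * Z[active].T @ r[active] + w
        if np.linalg.norm(grad) < tol:
            break
        # Generalized Hessian: identity plus nu * Z_a^T Z_a over active rows.
        H = nu * Z[active].T @ Z[active] + np.eye(len(w))
        step = np.linalg.solve(H, grad)
        # Armijo backtracking guarantees a sufficient decrease at each step.
        t, f0 = 1.0, objective(w)
        while objective(w - t * step) > f0 - 1e-4 * t * (grad @ step) and t > 1e-10:
            t *= 0.5
        w = w - t * step
    return w[:-1], w[-1]


def fitness(X, basis, u, b, gamma_k=0.1):
    """Nonlinear fitness f(x) = K(x, basis) u - b; positive means native-like here."""
    return rbf_kernel(np.atleast_2d(X), basis, gamma_k) @ u - b
```

In use, `basis` would hold the a priori chosen natives and decoys (a subset of the training rows works for experimentation), and a new sequence-structure pair would be classified by the sign of `fitness`. The sign convention (+1 for natives) is an assumption of this sketch.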

pone-0104403-g001: Decoy generation by gapless threading. Sequence decoys can be generated by threading the sequence of a larger protein onto the structure of an unrelated smaller protein.

We followed Maiorov and Crippen [51] and used gapless threading to generate a large number of decoys for a simplified test of protein design. We threaded the sequence of a larger protein through the structure of a smaller protein, obtaining sequence decoys by mounting a fragment of the larger protein's native sequence onto the full structure of the smaller protein. This yielded a set of sequence decoys for each native protein (Fig 1). Because all native contacts of the smaller structure are retained, such sequence decoys are quite challenging. This differs from the folding decoys generated by gapless threading [32].
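The gapless threading step above amounts to a sliding window. The sketch below assumes sequences are plain one-letter amino acid strings and represents the smaller structure only by its length, since its contact map stays fixed and is not modeled here; the helper names are hypothetical. A sequence of length n threaded onto a structure of length m < n yields n - m + 1 gapless alignments, each giving one sequence decoy.

```python
from typing import List


def gapless_threading_decoys(long_seq: str, struct_len: int) -> List[str]:
    """Return every contiguous fragment of long_seq of length struct_len.

    Assumption: sequences are one-letter strings. Fragment position i is
    mounted on residue i of the smaller structure, so that structure's
    native contacts are kept unchanged and only the sequence differs.
    """
    n, m = len(long_seq), struct_len
    if m <= 0 or m > n:
        raise ValueError("structure must be non-empty and no longer than the sequence")
    # Slide a window of width m without gaps or insertions: n - m + 1 offsets.
    return [long_seq[i:i + m] for i in range(n - m + 1)]


def decoys_for_structure(struct_len: int, pool: List[str]) -> List[str]:
    """Collect sequence decoys for one native structure from a pool of sequences.

    Only strictly larger proteins are threaded, following the text above.
    """
    decoys: List[str] = []
    for seq in pool:
        if len(seq) > struct_len:
            decoys.extend(gapless_threading_decoys(seq, struct_len))
    return decoys
```

For example, `gapless_threading_decoys("ACDEFGH", 5)` gives the three fragments `"ACDEF"`, `"CDEFG"`, and `"DEFGH"`. Summed over many larger proteins, this quickly produces a large number of decoys per structure.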