Sequence-based prediction of protein crystallization, purification and production propensity.

Mizianty MJ, Kurgan L - Bioinformatics (2011)

Bottom Line: We propose a novel approach that alleviates drawbacks of the existing methods by using a recent dataset and improved protocol to annotate progress along the crystallization process, by predicting the success of the entire process and steps which result in the failed attempts, and by utilizing a compact and comprehensive set of sequence-derived inputs to generate accurate predictions. Our method significantly outperforms alignment-based predictions and several modern crystallization propensity predictors. Receiver operating characteristic (ROC) curves show that PPCpred is particularly useful for users who desire high true positive (TP) rates, i.e. a low rate of mispredictions for solvable chains.


Affiliation: Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada.

ABSTRACT

Motivation: X-ray crystallography-based protein structure determination, which accounts for the majority of solved structures, is characterized by relatively low success rates. One solution is to build tools which support selection of targets that are more likely to crystallize. Several in silico methods that predict the propensity of diffraction-quality crystallization from protein chains were developed. We show that the quality of their predictions drops when applied to more recent crystallization trials, which calls for new solutions. We propose a novel approach that alleviates drawbacks of the existing methods by using a recent dataset and improved protocol to annotate progress along the crystallization process, by predicting the success of the entire process and steps which result in the failed attempts, and by utilizing a compact and comprehensive set of sequence-derived inputs to generate accurate predictions.

Results: The proposed PPCpred (predictor of protein Production, Purification and Crystallization) predicts the propensity for production of diffraction-quality crystals, production of crystals, purification and production of the protein material. PPCpred utilizes a comprehensive set of inputs based on energy and hydrophobicity indices, composition of certain amino acid types, predicted disorder, secondary structure and solvent accessibility, and content of certain buried and exposed residues. Our method significantly outperforms alignment-based predictions and several modern crystallization propensity predictors. Receiver operating characteristic (ROC) curves show that PPCpred is particularly useful for users who desire high true positive (TP) rates, i.e. a low rate of mispredictions for solvable chains. Our model reveals several intuitive factors that influence the success of individual steps and the entire crystallization process, including the content of Cys, buried His and Ser, hydrophobic/hydrophilic segments and the number of predicted disordered segments.

Availability: http://biomine.ece.ualberta.ca/PPCpred/.

Contact: lkurgan@ece.ualberta.ca.

Figure 1: The overall architecture of the proposed PPCpred method.

Mentions: The prediction is performed in two steps: (i) the input sequences are converted into a set of numerical features that describe relevant characteristics of the protein chain; and (ii) the feature values are fed into four predictive models that output the predicted propensity for material production, purification, crystallization and diffraction-quality crystallization, respectively. We use a support vector machine (SVM), which was previously shown to provide high-quality predictions in this area (Kandaswamy et al., 2010; Smialowski et al., 2006), to implement these four models, and their outputs are aggregated to provide a four-class prediction. The architecture of the PPCpred method is shown in Figure 1.
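
The two-step flow described above can be illustrated with a minimal Python sketch. It is not the PPCpred implementation: the four features, the use of the Kyte-Doolittle hydropathy scale, the linear scoring functions standing in for trained SVMs, their weights, and the "highest score wins" aggregation rule are all assumptions chosen only to make the data flow concrete. The names `sequence_to_features`, `make_linear_model`, `MODELS` and `predict` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Kyte-Doolittle hydropathy scale, used here as an ASSUMED stand-in for the
# energy and hydrophobicity indices mentioned in the abstract.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

Model = Callable[[List[float]], float]


def sequence_to_features(seq: str) -> List[float]:
    """Step (i): convert a protein chain into numerical features.

    Illustrative features only: Cys, His and Ser fractions plus the mean
    hydropathy of the chain. Unknown residue codes contribute 0 hydropathy.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    cys = seq.count("C") / n
    his = seq.count("H") / n
    ser = seq.count("S") / n
    hydropathy = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq) / n
    return [cys, his, ser, hydropathy]


def make_linear_model(weights: List[float], bias: float) -> Model:
    """Build a linear scoring function; a placeholder for a trained SVM."""
    def score(features: List[float]) -> float:
        return bias + sum(w * x for w, x in zip(weights, features))
    return score


# Four models, one per propensity named in the text. The weights are
# HYPOTHETICAL placeholders, not values learned from any dataset.
MODELS: Dict[str, Model] = {
    "material_production": make_linear_model([0.5, 0.0, 0.0, 0.1], 0.0),
    "purification": make_linear_model([0.0, 0.5, 0.0, -0.1], 0.0),
    "crystallization": make_linear_model([0.0, 0.0, 0.5, 0.0], 0.0),
    "diffraction_quality": make_linear_model([-0.5, -0.2, -0.2, 0.0], 0.1),
}


def predict(seq: str, models: Dict[str, Model] = MODELS) -> Tuple[str, Dict[str, float]]:
    """Step (ii): score the features with every model and aggregate.

    Aggregation rule (ASSUMED): report the label with the highest score.
    """
    features = sequence_to_features(seq)
    scores = {name: model(features) for name, model in models.items()}
    label = max(scores, key=lambda name: scores[name])
    return label, scores


if __name__ == "__main__":
    label, scores = predict("MKTAYIAKQRQISFVKSHFSRQ")
    print(label, scores)
```

Keeping the feature extraction separate from the four scoring functions mirrors the two steps in the text, so any of the placeholder models could be swapped for a trained classifier without touching the feature code.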

