Sequence-based prediction of protein crystallization, purification and production propensity.

Mizianty MJ, Kurgan L - Bioinformatics (2011)

Bottom Line: We propose a novel approach that alleviates drawbacks of the existing methods by using a recent dataset and improved protocol to annotate progress along the crystallization process, by predicting the success of the entire process and steps which result in the failed attempts, and by utilizing a compact and comprehensive set of sequence-derived inputs to generate accurate predictions. Our method significantly outperforms alignment-based predictions and several modern crystallization propensity predictors. Receiver operating characteristic (ROC) curves show that PPCpred is particularly useful for users who desire high true positive (TP) rates, i.e. low rate of mispredictions for solvable chains.


Affiliation: Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada.

ABSTRACT

Motivation: X-ray crystallography-based protein structure determination, which accounts for the majority of solved structures, is characterized by relatively low success rates. One solution is to build tools which support selection of targets that are more likely to crystallize. Several in silico methods that predict propensity of diffraction-quality crystallization from protein chains were developed. We show that the quality of their predictions drops when applied to more recent crystallization trials, which calls for new solutions. We propose a novel approach that alleviates drawbacks of the existing methods by using a recent dataset and improved protocol to annotate progress along the crystallization process, by predicting the success of the entire process and steps which result in the failed attempts, and by utilizing a compact and comprehensive set of sequence-derived inputs to generate accurate predictions.

Results: The proposed PPCpred (predictor of protein Production, Purification and Crystallization) predicts propensity for production of diffraction-quality crystals, production of crystals, purification and production of the protein material. PPCpred utilizes a comprehensive set of inputs based on energy and hydrophobicity indices, composition of certain amino acid types, predicted disorder, secondary structure and solvent accessibility, and content of certain buried and exposed residues. Our method significantly outperforms alignment-based predictions and several modern crystallization propensity predictors. Receiver operating characteristic (ROC) curves show that PPCpred is particularly useful for users who desire high true positive (TP) rates, i.e. low rate of mispredictions for solvable chains. Our model reveals several intuitive factors that influence the success of individual steps and the entire crystallization process, including the content of Cys, buried His and Ser, hydrophobic/hydrophilic segments and the number of predicted disordered segments.

Availability: http://biomine.ece.ualberta.ca/PPCpred/.

Contact: lkurgan@ece.ualberta.ca.


Figure 3: Scatter plots of three pairs of features used by PPCpred: features used for the prediction of crystallization (A), of diffraction-quality crystallization (B) and of purification (C). The size of each marker denotes the number of trials and its color denotes their outcome: green for successful and black for failed trials.

Mentions: Figure 3 shows scatter plots of three pairs of features that were selected for the prediction of crystallization, diffraction-quality crystallization and purification, respectively. The two features used to predict crystallization, GOLD730101_min_10 and WERD780103_min_5 (Fig. 3A), are based on the minimal average values of the hydrophobicity (Goldsack and Chalifoux, 1973) and energy (specifically the energy of transfer in water of an isolated residue from a non-regular structure to the helical conformation) (Wertz and Scheraga, 1978) indices in sliding windows of sizes 10 and 5, respectively. This means that sequence segments with low hydrophobicity and low transfer energy values are characteristic of chains that are difficult to crystallize. Importantly, combining these two features allows for improved separation between the successful and unsuccessful crystallization trials, i.e. trials for a given range of values of one index are further separated by the values of the other index. The diffraction-quality crystallization is impacted by the DIS_SEG and AA_bur_S features (Fig. 3B), which quantify the number of predicted disordered segments and the content of predicted buried Ser, respectively. The content of Ser was shown to be important for the prediction of crystallization propensity in Overton et al. (2008) and Kurgan et al. (2009), but these studies investigated the overall Ser content, while we show that the (predicted) buried Ser provides strong discriminatory power. Similarly, while the content of the predicted disordered residues was used in several related studies (Slabinski et al., 2007b; Price et al., 2009), our analysis reveals the strong influence of the number of disordered segments. The plot shows that chains with a larger number of disordered segments and a larger number of buried Ser are more difficult to crystallize. Finally, Figure 3C shows that chains with a larger amount of buried Ser (AA_bur_S feature) and high hydrophobicity in a long sliding window (GOLD730101_max_20 feature) are more challenging to purify.
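The two kinds of features discussed above, an extreme window-averaged index value (as in GOLD730101_min_10 or GOLD730101_max_20) and a count of predicted disordered segments (as in DIS_SEG), can be illustrated with a minimal Python sketch. It is not the PPCpred implementation. Several things are assumptions made for illustration: the Kyte-Doolittle hydropathy scale stands in for the GOLD730101 and WERD780103 AAindex entries (whose values are not reproduced here), the function names are hypothetical, the disorder prediction is assumed to arrive as a per-residue string of 'D' (disordered) and 'O' (ordered), and the `min_length` cutoff is a made-up parameter, since the excerpt does not say how a segment is delimited.

```python
from itertools import groupby
from typing import Dict, List

# Kyte-Doolittle hydropathy values, used ONLY as a stand-in scale.
# The features in the text use the AAindex entries GOLD730101 and WERD780103.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_averages(sequence: str, index: Dict[str, float], window: int) -> List[float]:
    """Return the average index value of every full-length sliding window."""
    if window <= 0:
        raise ValueError("window must be positive")
    values = [index[aa] for aa in sequence]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


def min_window_average(sequence: str, index: Dict[str, float], window: int) -> float:
    """Smallest window average (the '<index>_min_<window>' style of feature)."""
    return min(window_averages(sequence, index, window))


def max_window_average(sequence: str, index: Dict[str, float], window: int) -> float:
    """Largest window average (the '<index>_max_<window>' style of feature)."""
    return max(window_averages(sequence, index, window))


def count_disordered_segments(disorder: str, min_length: int = 1) -> int:
    """Count runs of consecutive 'D' residues that are at least min_length long.

    The 'D'/'O' encoding and the min_length cutoff are assumptions.
    """
    return sum(
        1
        for state, run in groupby(disorder)
        if state == "D" and len(list(run)) >= min_length
    )
```

For example, `min_window_average(chain, KYTE_DOOLITTLE, 10)` gives the lowest average hydropathy over any 10-residue stretch of `chain`, and `count_disordered_segments("OODDDOODOD")` counts three separate disordered runs. A sequence shorter than the window has no full window, so the min and max helpers raise `ValueError` in that case.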

