Prodepth: predict residue depth by support vector regression approach from protein sequences only.

Song J, Tan H, Mahmood K, Law RH, Buckle AM, Webb GI, Akutsu T, Whisstock JC - PLoS ONE (2009)

Bottom Line: The results suggest that residue depth could be reliably predicted solely from protein primary sequences: local sequence environments are the major determinants, while global sequence features could influence the prediction performance marginally. We also discuss the potential implications of this new structural parameter in the field of protein structure prediction and homology modeling. This method might prove to be a powerful tool for sequence analysis.

Affiliation: Department of Biochemistry and Molecular Biology, Monash University, Clayton, Melbourne, Victoria, Australia. Jiangning.Song@med.monash.edu.au

ABSTRACT
Residue depth (RD) is a solvent exposure measure that complements the information provided by conventional accessible surface area (ASA) and describes the extent to which a residue is buried within the protein structure. Previous studies have established that RD is correlated with several protein properties, such as protein stability, residue conservation and amino acid type. Accurate prediction of RD has many potentially important applications in structural bioinformatics, for example, identifying functionally important residues, residues in the folding nucleus, or enzyme active sites from sequence information. In this work, we introduce an efficient approach that uses support vector regression to quantify the relationship between RD and protein sequence. We systematically investigated eight different sequence encoding schemes, covering both local and global sequence characteristics, and examined the prediction performance of each. For an objective evaluation of our approach, we used 5-fold cross-validation to assess prediction accuracy and showed that the best overall performance, a correlation coefficient (CC) of 0.71 between the observed and predicted RD values and a root mean square error (RMSE) of 1.74, was achieved after incorporating the relevant multiple sequence features. The results suggest that residue depth could be reliably predicted solely from protein primary sequences: local sequence environments are the major determinants, while global sequence features have only a marginal influence on prediction performance. We highlight two examples to illustrate the applicability of this approach. We also discuss the potential implications of this new structural parameter for protein structure prediction and homology modeling. This method might prove to be a powerful tool for sequence analysis.
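The two evaluation measures quoted above can be computed as in the following minimal Python sketch. It assumes that CC means the Pearson correlation coefficient and that the 5-fold split is made over whole proteins rather than individual residues; the abstract does not specify either point. The function names (`pearson_cc`, `rmse`, `five_fold_splits`) are hypothetical and not part of Prodepth, and the support vector regression step itself is omitted.

```python
import math
import random


def pearson_cc(observed, predicted):
    """Pearson correlation coefficient between observed and predicted RD values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted RD values."""
    n = len(observed)
    if n != len(predicted) or n == 0:
        raise ValueError("need two non-empty, equal-length sequences")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def five_fold_splits(protein_ids, k=5, seed=0):
    """Yield (train_ids, test_ids) pairs for k-fold cross-validation.

    Assumption (not stated in the source): folds are formed from whole
    proteins, so all residues of one chain fall into the same fold.
    """
    ids = list(protein_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]  # round-robin assignment to k folds
    for i in range(k):
        train = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        yield train, folds[i]
```

With the toy values `[1, 2, 3]` (observed) and `[2, 3, 4]` (predicted), both `pearson_cc` and `rmse` return 1.0: the prediction is perfectly correlated but offset by one unit, which is why the two measures are reported together. Splitting by protein, as assumed here, keeps residues of the same chain from appearing in both the training and test sets.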

Figure 8. The predicted and observed residue depth profiles for the antifungal chitosanase (PDB code: 1chk, chain A), and the structural mapping of the predicted RD profiles. In Figure 8A, the blue solid line represents the observed RD values and the red dashed line represents the predicted RD values. In Figure 8B, sequence regions are colored by their mean absolute error (MAE) on a scale from red to blue, where red corresponds to the best-predicted regions and blue to the worst-predicted regions. The active site residues (E22, D40 and T45) are shown as orange sticks, and the functionally important residues involved in chitosan substrate binding (D57, E197 and E201) are shown as dark green sticks [52], [53]. The structural images were prepared using the program PyMOL [82]. For ease of visualization, the structural figures are shown in stereo.
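The caption does not say how the colored regions or the color scale were defined. The sketch below assumes a per-residue MAE averaged over a hypothetical sliding window (`half_window`, default 2, truncated at the chain ends) and a linear red-to-blue scale normalized between the lowest and highest MAE; both choices, and the function names, are illustrative only.

```python
def windowed_mae(observed, predicted, half_window=2):
    """MAE over a window centred on each residue (window truncated at chain ends).

    `half_window` is a hypothetical parameter, not taken from the source.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    errors = [abs(o - p) for o, p in zip(observed, predicted)]
    n = len(errors)
    result = []
    for i in range(n):
        window = errors[max(0, i - half_window):min(n, i + half_window + 1)]
        result.append(sum(window) / len(window))
    return result


def mae_to_rgb(mae_values):
    """Map MAE linearly to RGB: red (lowest MAE, best) to blue (highest MAE, worst)."""
    lo, hi = min(mae_values), max(mae_values)
    span = hi - lo
    colors = []
    for m in mae_values:
        t = 0.0 if span == 0 else (m - lo) / span  # 0 = best, 1 = worst
        colors.append((1.0 - t, 0.0, t))
    return colors
```

The resulting RGB triples (components between 0 and 1) could, for instance, be registered as named colors with PyMOL's `set_color` command and applied per residue; that step is not shown.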

Mentions: We illustrated the performance of the Prodepth predictor with two examples, showing their predicted RD profiles and the mapping of the MAE values onto their three-dimensional structures in Figures 7 and 8. The first example is the Escherichia coli peptidyl-tRNA hydrolase (PDB code: 2pth, chain A) [51], which is well predicted, with a CC of 0.89 and an RMSE of 0.93. For most regions of this protein, the predicted and observed RD values agree well, although several individual residue positions, such as 6, 61, 91, 132 and 134, are poorly predicted (blue), as can be seen in Figure 7A. Interestingly, many of these residues map to the hydrophobic core (Figure S3); however, it is unclear from either a sequence or a structural perspective why these regions are poorly predicted.
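Poorly predicted positions such as those listed above can be located by comparing per-residue absolute errors against a cutoff. In this minimal sketch, `threshold` is a hypothetical parameter (the text does not state how "poorly predicted" was defined), and positions are numbered sequentially from `start`, which may differ from the PDB residue numbering.

```python
def poorly_predicted_positions(observed, predicted, threshold, start=1):
    """Return (position, absolute_error) for residues whose error exceeds threshold.

    `threshold` is a hypothetical cutoff; positions are numbered sequentially
    from `start` and may not match PDB residue numbers.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    flagged = []
    for offset, (obs, pred) in enumerate(zip(observed, predicted)):
        error = abs(obs - pred)
        if error > threshold:
            flagged.append((start + offset, error))
    return flagged
```

For example, with observed depths `[3.0, 4.0, 5.0]`, predicted depths `[3.1, 6.0, 5.0]` and a threshold of 1.0, the function returns `[(2, 2.0)]`.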