Sequence based residue depth prediction using evolutionary information and predicted secondary structure.

Zhang H, Zhang T, Chen K, Shen S, Ruan J, Kurgan L - BMC Bioinformatics (2008)

Bottom Line: When compared with the solvent accessibility, the depth allows studying deep-level structures and functional sites, and formation of the protein folding nucleus. We found that the distance based indices are harder to predict due to the more complex nature of the corresponding depth profiles. The predicted depth can be used to provide improved prediction of both buried and exposed residues.


Affiliation: College of Mathematical Science and LPMC, Nankai University, Tianjin, PR China. zerohua@gmail.com

ABSTRACT

Background: Residue depth allows determining how deeply a given residue is buried, in contrast to the solvent accessibility that differentiates between buried and solvent-exposed residues. When compared with the solvent accessibility, the depth allows studying deep-level structures and functional sites, and formation of the protein folding nucleus. Accurate prediction of residue depth would provide valuable information for fold recognition, prediction of functional sites, and protein design.

Results: A new method, RDPred, for real-value depth prediction from protein sequence is proposed. RDPred combines information extracted from the sequence, PSI-BLAST scoring matrices, and secondary structure predicted with PSIPRED. Three-fold/ten-fold cross validation tests performed on three independent, low-identity datasets show that the distance based depth (computed using MSMS) predicted by RDPred has correlations of 0.67/0.67, 0.66/0.67, and 0.64/0.65 with the actual depth, mean absolute errors of 0.56/0.56, 0.61/0.60, and 0.58/0.57, and mean relative errors of 17.0%/16.9%, 18.2%/18.1%, and 17.7%/17.6%, respectively. The mean absolute and the mean relative errors are shown to be statistically significantly better when compared with a method recently proposed by Yuan and Wang [Proteins 2008; 70:509-516]. The results show that three-fold cross validation underestimates the variability of the prediction quality when compared with ten-fold cross validation. We also show that hydrophilic and flexible residues are predicted more accurately than hydrophobic and rigid residues. Similarly, the charged residues, which include Lys, Glu, Asp, and Arg, are the most accurately predicted. Our analysis reveals that evolutionary information encoded using PSSM correlates more strongly with the depth for hydrophilic amino acids (AAs) and aliphatic AAs than for hydrophobic AAs and aromatic AAs. We further show that the secondary structure of coils and strands is useful in depth prediction, in contrast to helices, which have a relatively uniform distribution over the protein depth. Application of the predicted residue depth to prediction of buried/exposed residues shows consistent improvements in detection rates of both buried and exposed residues when compared with the competing method. Finally, we contrasted the prediction performance among distance based (MSMS and DPX) and volume based (SADIC) depth definitions. We found that the distance based indices are harder to predict due to the more complex nature of the corresponding depth profiles.

Conclusion: The proposed method, RDPred, provides statistically significantly better predictions of residue depth when compared with the competing method. The predicted depth can be used to provide improved prediction of both buried and exposed residues. The prediction of exposed residues has implications in characterization/prediction of interactions with ligands and other proteins, while the prediction of buried residues could be used in the context of folding predictions and simulations.
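The three quality measures quoted in the Results (correlation with the actual depth, mean absolute error, and mean relative error) can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions below are assumptions: the correlation is taken to be the Pearson correlation coefficient, and the relative error is taken to be the absolute error divided by the actual depth. The function names and the depth values are hypothetical and are not data from the paper.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of |predicted - actual| over all residues."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def mean_relative_error(actual, predicted):
    """Assumed definition: |predicted - actual| / actual, averaged over residues."""
    return sum(abs(p - a) / a for a, p in zip(actual, predicted)) / len(actual)


def pearson_correlation(actual, predicted):
    """Pearson correlation coefficient between actual and predicted depths."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical residue depths for one short sequence (illustration only).
actual_depth = [2.0, 4.0, 5.0, 8.0]
predicted_depth = [2.5, 3.5, 5.0, 7.0]

mae_value = mean_absolute_error(actual_depth, predicted_depth)
mre_value = mean_relative_error(actual_depth, predicted_depth)
pcc_value = pearson_correlation(actual_depth, predicted_depth)

print(f"MAE = {mae_value:.3f}, MRE = {mre_value:.1%}, PCC = {pcc_value:.3f}")
```

Under the assumed definition, the relative error is undefined for an actual depth of zero; a real evaluation would need to state how such residues are handled.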



Figure 5: The comparison between RDPred and the predictions based on PSSM and PS features (simulation of the YW method). Each point denotes the prediction for one sequence in the YW923 dataset. Panel (A) compares MAE values, while panel (B) compares PCC values; the x-axis shows results for the method based on PSSM and PS features; the y-axis shows results for the RDPred method.

Mentions: We also performed a detailed comparison between RDPred and the YW method, based on the predictions derived using three-fold cross validation for individual sequences in the YW923 dataset. Since the individual predictions were not available for the YW method, we simulated them by using PSSM and PS features with our SVR model (they used these exact features and SVR to perform predictions). These results allow evaluating the value added by performing feature selection, adding SS (features based on predicted secondary structure) and PI (position and information per position) features, and performing the SVR parameterization. Figure 5 shows the relation between the MAE and PCC values of RDPred and of the simulation of the YW method. In the case of MAE (Figure 5A), RDPred provides lower errors for the majority of the predicted sequences: for 821 out of 923 proteins the RDPred predictions are below the diagonal, which denotes points where both methods obtain equal errors. Similarly, for 646 out of 923 sequences RDPred gives higher PCC values; in this case the points are located above the diagonal (Figure 5B). Furthermore, a paired t-test was applied to investigate the statistical significance of these differences. The paired t-test, performed at the 95% confidence level (significance level of 0.05), compared pairs of MAE values (and pairs of PCC values) for the same sequences predicted by RDPred and by the simulation of the YW method; in both cases, i.e., MAE and PCC, RDPred provided statistically significantly better predictions. The corresponding P-values were smaller than 0.0001 for both PCC and MAE, and the t-values were equal to 12.7 for PCC and 34.1 for MAE.
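The per-sequence comparison described above (counting points on one side of the diagonal and running a paired t-test) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `paired_t_statistic` and `count_wins` are hypothetical, the MAE values are made up and are not data from the paper, and only the t statistic and its degrees of freedom are computed (obtaining a P-value would additionally require the Student t distribution).

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for the paired differences a[i] - b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


def count_wins(baseline, candidate, lower_is_better):
    """Count sequences for which the candidate beats the baseline.

    With lower_is_better=True (e.g. MAE) a win is a point below the diagonal;
    with lower_is_better=False (e.g. PCC) a win is a point above it.
    """
    if lower_is_better:
        return sum(c < b for b, c in zip(baseline, candidate))
    return sum(c > b for b, c in zip(baseline, candidate))


# Hypothetical per-sequence MAE values (illustration only, not from the paper).
baseline_mae = [0.62, 0.58, 0.71, 0.66, 0.60]  # simulated baseline method
candidate_mae = [0.57, 0.55, 0.69, 0.67, 0.54]  # method under evaluation

t_value, dof = paired_t_statistic(baseline_mae, candidate_mae)
wins = count_wins(baseline_mae, candidate_mae, lower_is_better=True)

print(f"candidate has lower MAE on {wins} of {len(baseline_mae)} sequences")
print(f"paired t = {t_value:.3f} with {dof} degrees of freedom")
```

A paired test is used rather than an unpaired one because both methods are evaluated on the same sequences, so the per-sequence differences are the quantity of interest.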

