SUMOhydro: a novel method for the prediction of sumoylation sites based on hydrophobic properties.

Chen YZ, Chen Z, Gong YA, Ying G - PLoS ONE (2012)

Bottom Line: With the assistance of a support vector machine, the proposed method was trained and tested using a stringent non-redundant sumoylation dataset. In a leave-one-out cross-validation, the proposed method yielded an excellent performance with a correlation coefficient, specificity, sensitivity and accuracy equal to 0.690, 98.6%, 71.1% and 97.5%, respectively. In addition, SUMOhydro has been benchmarked against previously described predictors based on an independent dataset, thereby suggesting that the introduction of hydrophobicity as an additional parameter could assist in the prediction of sumoylation sites.


Affiliation: Laboratory of Cancer Cell Biology, Tianjin Medical University Cancer Institute and Hospital, Tianjin, China.

ABSTRACT
Sumoylation is one of the most essential mechanisms of reversible protein post-translational modifications and is a crucial biochemical process in the regulation of a variety of important biological functions. Sumoylation is also closely involved in various human diseases. The accurate computational identification of sumoylation sites in protein sequences aids in experimental design and mechanistic research in cellular biology. In this study, we introduced amino acid hydrophobicity as a parameter into a traditional binary encoding scheme and developed a novel sumoylation site prediction tool termed SUMOhydro. With the assistance of a support vector machine, the proposed method was trained and tested using a stringent non-redundant sumoylation dataset. In a leave-one-out cross-validation, the proposed method yielded an excellent performance with a correlation coefficient, specificity, sensitivity and accuracy equal to 0.690, 98.6%, 71.1% and 97.5%, respectively. In addition, SUMOhydro has been benchmarked against previously described predictors based on an independent dataset, thereby suggesting that the introduction of hydrophobicity as an additional parameter could assist in the prediction of sumoylation sites. Currently, SUMOhydro is freely accessible at http://protein.cau.edu.cn/others/SUMOhydro/.
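The idea of adding hydrophobicity to a binary encoding can be illustrated with a minimal sketch. The details below are assumptions made for illustration and are not taken from the paper: the window flank size, the padding symbol `X`, the use of the Kyte-Doolittle hydropathy scale, and the function names `windows_around_lysines` and `encode_window` are all hypothetical choices. Each residue in a window around a candidate lysine is encoded as a 20-bit one-hot vector plus one hydrophobicity value.

```python
# Illustrative sketch only: the scale, window size and names are assumptions,
# not the encoding actually used by SUMOhydro.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index (one of several published scales).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def windows_around_lysines(sequence, flank=3, pad="X"):
    """Yield (position, window) for every lysine (K) in the sequence.

    The sequence is padded so that lysines near the termini still
    produce windows of length 2 * flank + 1.
    """
    padded = pad * flank + sequence + pad * flank
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Residue i of the original sits at index i + flank in `padded`.
            yield i, padded[i:i + 2 * flank + 1]


def encode_window(window):
    """Encode a window as one-hot bits plus one hydrophobicity value per residue.

    Unknown or padding residues get an all-zero one-hot vector and a
    hydrophobicity of 0.0 (an arbitrary choice for this sketch).
    """
    features = []
    for residue in window:
        features.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
        features.append(KYTE_DOOLITTLE.get(residue, 0.0))
    return features


if __name__ == "__main__":
    for position, window in windows_around_lysines("MLKKE", flank=1):
        vector = encode_window(window)
        print(position, window, len(vector))
```

The resulting feature vectors could then be passed to any support vector machine implementation; the kernel and parameters used by the authors are not stated in this summary.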


pone-0039195-g002: ROC curves of different encoding SVM models using a 10-fold cross-validation.

Mentions: Because there are always more non-sumoylation sites than sumoylation sites, we repeated the training/testing procedures 5 times, each time with a different random selection of negative samples. When the numbers of positive and negative data points differ, the Matthews correlation coefficient (MCC) is more suitable than accuracy alone for assessing overall prediction performance. To test the stability of the hydrophobic encoding combined with the binary encoding, which was termed “hydrobinary encoding” in this study, we used two strategies on the same dataset: a 10-fold cross-validation and a leave-one-out cross-validation. The prediction performances are shown in Tables 1 and 2, with MCC values as high as 0.682 and 0.690. Because the dataset is highly imbalanced and the MCC can be affected by the tradeoff between sensitivity and specificity, the ROC curves for each strategy were plotted, and the corresponding AUC values were calculated (see Figures 2 and 3). The SUMOhydro web server is currently built on the full dataset to facilitate research by the scientific community and is freely available at http://protein.cau.edu.cn/others/SUMOhydro/.
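The reason the MCC is preferred on imbalanced data can be seen from a short sketch that computes sensitivity, specificity, accuracy and the MCC from confusion-matrix counts. The function name `classification_metrics` and the counts used in the example are made up for illustration; they are not results from the paper.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, accuracy and MCC from raw counts.

    Ratios with an empty denominator are reported as 0.0, a common
    convention that is assumed here for simplicity.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical imbalanced case: 10 positives, 990 negatives, and a
    # classifier that predicts "negative" for every site.
    all_negative = classification_metrics(tp=0, fp=0, tn=990, fn=10)
    print(all_negative)
```

In this hypothetical case the classifier finds no sumoylation sites at all, yet its accuracy is 0.99, while its MCC is 0.0. This is why accuracy alone can be misleading when negatives greatly outnumber positives.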

