Prioritizing functional phosphorylation sites based on multiple feature integration.

Xiao Q, Miao B, Bi J, Wang Z, Li Y - Sci Rep (2016)

Bottom Line: In recent years, many phosphorylation sites have been identified as a result of advances in mass-spectrometric techniques. We found significant differences in the distribution of evolutionary conservation, kinase association, disorder score, and secondary structure between known functional and background phosphorylation datasets. We built four different types of classifiers based on the most representative features and found that their performances were similar.


Affiliation: Key Lab of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, P. R. China.

ABSTRACT
Protein phosphorylation is an important type of post-translational modification that is involved in a variety of biological activities. Most phosphorylation events occur on serine, threonine and tyrosine residues in eukaryotes. In recent years, many phosphorylation sites have been identified as a result of advances in mass-spectrometric techniques. However, a large percentage of phosphorylation sites may be non-functional. Systematically prioritizing functional sites from a large number of phosphorylation sites will be increasingly important for the study of their biological roles. This study focused on exploring the intrinsic features of functional phosphorylation sites to predict whether a phosphosite is likely to be functional. We found significant differences in the distribution of evolutionary conservation, kinase association, disorder score, and secondary structure between known functional and background phosphorylation datasets. We built four different types of classifiers based on the most representative features and found that their performances were similar. We also prioritized 213,837 human phosphorylation sites from a variety of phosphorylation databases, which will be helpful for subsequent functional studies. All predicted results are available for query and download on our website (Predict Functional Phosphosites, PFP, http://pfp.biosino.org/).



Different model performance on the independent test set. (A) ROC curves for the different models; the models yield comparable results. (B) Sensitivity, specificity, accuracy, precision, F-measure and area under the ROC curve for five models. Sn: sensitivity; Sp: specificity; ACC: accuracy; PPV: precision; F1: F-measure; ROC: area under the ROC curve.
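The metrics listed in panel B are standard confusion-matrix quantities. Below is a minimal pure-Python sketch of how they could be computed from true labels (assumed coding: 1 = functional, 0 = background) and model scores. The 0.5 decision threshold, the function names and the tie handling are illustrative assumptions, not the authors' settings. The AUC is computed as the probability that a randomly chosen positive site scores higher than a randomly chosen negative one (the Mann-Whitney formulation), with ties counted as one half.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    sn: float   # sensitivity = TP / (TP + FN)
    sp: float   # specificity = TN / (TN + FP)
    acc: float  # accuracy = (TP + TN) / all
    ppv: float  # precision = TP / (TP + FP)
    f1: float   # harmonic mean of precision and sensitivity
    auc: float  # area under the ROC curve


def _safe_div(a: float, b: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return a / b if b else 0.0


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> Metrics:
    """Compute Sn, Sp, ACC, PPV, F1 at an assumed threshold, plus threshold-free AUC."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    sn = _safe_div(tp, tp + fn)
    sp = _safe_div(tn, tn + fp)
    acc = _safe_div(tp + tn, tp + tn + fp + fn)
    ppv = _safe_div(tp, tp + fp)
    f1 = _safe_div(2 * ppv * sn, ppv + sn)
    return Metrics(sn, sp, acc, ppv, f1, roc_auc(labels, scores))
```

For example, with labels [1, 1, 1, 0, 0, 0] and scores [0.9, 0.8, 0.4, 0.6, 0.2, 0.1], every threshold-based metric equals 2/3 and the AUC is 8/9. The pairwise AUC loop is quadratic in the number of examples; for large test sets a rank-based computation gives the same value more cheaply.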

Mentions: We trained four types of machine learning models: Bayesian network (BayesNet), logistic regression, random forest and multilayer perceptron. We split the full dataset into a training set (90%) and an independent test set (10%). Using 10-fold cross-validation on the training set (see Methods), we compared the performance of the four models (Supplementary Figure 4); the constructed models performed similarly and robustly. Figure 4A shows the ROC curves of the different models on the independent test set. The models yield comparable results, suggesting that the prediction does not depend strongly on the choice of model. Fig. 4B compares the detailed performance metrics on the independent test set (sensitivity, specificity, precision, F-measure, accuracy and the area under the ROC curve) for the four models. Performance on the test set is similar to the cross-validation performance on the training set, suggesting that the models do not overfit.
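The evaluation protocol described above (a 90%/10% hold-out split, followed by 10-fold cross-validation on the training portion) can be sketched over sample indices as follows. The shuffling, the fixed seeds, the round-robin fold assignment and the function names are illustrative assumptions. The source does not say whether the split or the folds were stratified by class, and the four classifiers themselves are left out, so model fitting appears only as a hypothetical placeholder comment.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def split_train_test(n: int, test_fraction: float = 0.1,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices 0..n-1 and hold out a fraction as an independent test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    # After shuffling, the first n_test indices form the test set; the rest are training.
    return idx[n_test:], idx[:n_test]


def k_fold(indices: Sequence[int], k: int = 10,
           seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists for k-fold cross-validation."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    # Round-robin assignment gives folds whose sizes differ by at most one.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


if __name__ == "__main__":
    train_idx, test_idx = split_train_test(1000)
    for fold_no, (tr, va) in enumerate(k_fold(train_idx), start=1):
        # Hypothetical step: fit each classifier on `tr` and score it on `va`.
        print(fold_no, len(tr), len(va))
    # The held-out `test_idx` would be scored only once, after model selection.
```

Keeping the test indices out of every cross-validation fold is what makes the final comparison between cross-validation and test-set performance a meaningful check for overfitting.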

