Prioritizing functional phosphorylation sites based on multiple feature integration.

Xiao Q, Miao B, Bi J, Wang Z, Li Y - Sci Rep (2016)

Bottom Line: In recent years, many phosphorylation sites have been identified as a result of advances in mass-spectrometric techniques. We found significant differences in the distribution of evolutionary conservation, kinase association, disorder score, and secondary structure between known functional and background phosphorylation datasets. We built four different types of classifiers based on the most representative features and found that their performances were similar.


Affiliation: Key Lab of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, P. R. China.

ABSTRACT
Protein phosphorylation is an important type of post-translational modification that is involved in a variety of biological activities. Most phosphorylation events occur on serine, threonine and tyrosine residues in eukaryotes. In recent years, many phosphorylation sites have been identified as a result of advances in mass-spectrometric techniques. However, a large percentage of phosphorylation sites may be non-functional. Systematically prioritizing functional sites from a large number of phosphorylation sites will be increasingly important for the study of their biological roles. This study focused on exploring the intrinsic features of functional phosphorylation sites to predict whether a phosphosite is likely to be functional. We found significant differences in the distribution of evolutionary conservation, kinase association, disorder score, and secondary structure between known functional and background phosphorylation datasets. We built four different types of classifiers based on the most representative features and found that their performances were similar. We also prioritized 213,837 human phosphorylation sites from a variety of phosphorylation databases, which will be helpful for subsequent functional studies. All predicted results are available for query and download on our website (Predict Functional Phosphosites, PFP, http://pfp.biosino.org/).



Figure 2: The distribution of structural features in the positive and negative datasets. (A) Disorder score distribution (p-value < 2.2e-16, Wilcoxon rank sum test). (B) Secondary structure distribution (p-value = 2.29e-6, chi-squared test).

Mentions: To evaluate the predictive ability of each feature, we compared their distributions in the positive and negative datasets. In both datasets, the majority of phosphosites are located in disordered regions (Fig. 2A). However, a larger share of sites in the positive dataset have relatively low disorder scores, indicating that positive sites are located in ordered regions more often than negative sites. The score distributions in the two datasets are significantly different (p-value < 2.2e-16, Wilcoxon rank sum test). This result is consistent with our expectation that background phosphosites occur more frequently in disordered regions, where more off-target events could arise.
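
To illustrate the kind of comparison described above, the following is a minimal sketch of a two-sided Wilcoxon rank sum test written with the Python standard library only. It is not the authors' code: the function names (`average_ranks`, `rank_sum_test`), the use of a normal approximation without continuity correction, and the synthetic "disorder scores" drawn from beta distributions are all assumptions made for illustration. The paper does not state which software computed its p-values.

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks (ties get their average rank) and the tie group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def rank_sum_test(x, y):
    """Two-sided rank sum test via the normal approximation (assumed variant).

    Returns (U statistic for x, z score, p-value). Assumes the pooled sample
    is not constant, otherwise the variance would be zero.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie-corrected variance of U.
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, z, p


if __name__ == "__main__":
    # Hypothetical data: both groups skew toward high disorder scores,
    # but the "positive" group is shifted toward lower values.
    random.seed(0)
    positive = [random.betavariate(4, 3) for _ in range(300)]
    negative = [random.betavariate(5, 2) for _ in range(300)]
    u, z, p = rank_sum_test(positive, negative)
    print(f"U = {u:.1f}, z = {z:.2f}, p = {p:.3g}")
```

A negative z score here means the first sample tends to rank lower than the second, which corresponds to the pattern reported for Fig. 2A (functional sites having lower disorder scores). For real analyses, an established statistics package would be the safer choice than this hand-rolled version.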

