Prioritizing functional phosphorylation sites based on multiple feature integration.

Xiao Q, Miao B, Bi J, Wang Z, Li Y - Sci Rep (2016)

Bottom Line: In recent years, many phosphorylation sites have been identified as a result of advances in mass-spectrometric techniques. We found significant differences in the distribution of evolutionary conservation, kinase association, disorder score, and secondary structure between known functional and background phosphorylation datasets. We built four different types of classifiers based on the most representative features and found that their performances were similar.


Affiliation: Key Lab of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, P. R. China.

ABSTRACT
Protein phosphorylation is an important type of post-translational modification that is involved in a variety of biological activities. Most phosphorylation events occur on serine, threonine and tyrosine residues in eukaryotes. In recent years, many phosphorylation sites have been identified as a result of advances in mass-spectrometric techniques. However, a large percentage of phosphorylation sites may be non-functional. Systematically prioritizing functional sites from a large number of phosphorylation sites will be increasingly important for the study of their biological roles. This study focused on exploring the intrinsic features of functional phosphorylation sites to predict whether a phosphosite is likely to be functional. We found significant differences in the distribution of evolutionary conservation, kinase association, disorder score, and secondary structure between known functional and background phosphorylation datasets. We built four different types of classifiers based on the most representative features and found that their performances were similar. We also prioritized 213,837 human phosphorylation sites from a variety of phosphorylation databases, which will be helpful for subsequent functional studies. All predicted results are available for query and download on our website (Predict Functional Phosphosites, PFP, http://pfp.biosino.org/).



Figure 3: NetworkIN score distribution. Violin plot for NetworkIN score with log-scaled x-axis. The figure shows the significant difference between positive and negative sites (p-value < 2.2e-16, Wilcoxon rank sum test).
© Copyright Policy - open-access

Mentions: Due to the off-target activity of kinases, we hypothesized that randomly phosphorylated sites tended to have weaker associations with kinases [13]. If a phosphosite is functional, it is more likely to match the kinase motif perfectly. Therefore, measuring the association between kinases and phosphosites may help identify those sites with critical functions. NetworkIN is a tool that integrates both protein-protein interaction and phylogenetic tree information to predict kinase recognition, and it returns a NetworkIN score for each site in a probabilistic manner [40]. A larger NetworkIN score indicates a higher possibility of kinase recognition. We compared the distribution of NetworkIN scores in the two datasets (see Methods). As shown in Fig. 3, positive sites have significantly higher NetworkIN scores than negative sites (p-value < 2.2e-16, Wilcoxon rank sum test). This result indicates that kinase association is a good feature to distinguish positive sites from negative ones.
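The comparison described above can be illustrated with a minimal sketch of a two-sided Wilcoxon rank sum test, written in plain Python. This is not the authors' code: the function names `average_ranks` and `rank_sum_test` are hypothetical, the p-value uses the normal approximation with a tie correction and no continuity correction, and the two score lists are synthetic log-normal draws that only stand in for NetworkIN scores of positive and negative sites.

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank sum test (normal approximation, tie-corrected).

    Returns (U statistic of x, z score, p-value). Assumes the pooled
    sample is not constant, otherwise the variance would be zero.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(c ** 3 - c for c in counts.values())
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u1, z, p


# Synthetic stand-ins for the two datasets (NOT data from the paper).
random.seed(0)
positive = [random.lognormvariate(1.0, 1.0) for _ in range(500)]
negative = [random.lognormvariate(0.0, 1.0) for _ in range(500)]

u, z, p = rank_sum_test(positive, negative)
print(f"U = {u:.1f}, z = {z:.2f}, p = {p:.3g}")
```

A positive z score here means the first sample tends to rank higher than the second, which is the direction reported for positive versus negative sites. For real analyses, an established implementation such as R's `wilcox.test` or SciPy's `scipy.stats.mannwhitneyu` would be the safer choice.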

