Prioritizing functional phosphorylation sites based on multiple feature integration.

Xiao Q, Miao B, Bi J, Wang Z, Li Y - Sci Rep (2016)

Bottom Line: In recent years, many phosphorylation sites have been identified as a result of advances in mass-spectrometric techniques. We found significant differences in the distribution of evolutionary conservation, kinase association, disorder score, and secondary structure between known functional and background phosphorylation datasets. We built four different types of classifiers based on the most representative features and found that their performances were similar.

Affiliation: Key Lab of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, P. R. China.

ABSTRACT
Protein phosphorylation is an important type of post-translational modification that is involved in a variety of biological activities. Most phosphorylation events occur on serine, threonine and tyrosine residues in eukaryotes. In recent years, many phosphorylation sites have been identified as a result of advances in mass-spectrometric techniques. However, a large percentage of phosphorylation sites may be non-functional. Systematically prioritizing functional sites from a large number of phosphorylation sites will be increasingly important for the study of their biological roles. This study focused on exploring the intrinsic features of functional phosphorylation sites to predict whether a phosphosite is likely to be functional. We found significant differences in the distribution of evolutionary conservation, kinase association, disorder score, and secondary structure between known functional and background phosphorylation datasets. We built four different types of classifiers based on the most representative features and found that their performances were similar. We also prioritized 213,837 human phosphorylation sites from a variety of phosphorylation databases, which will be helpful for subsequent functional studies. All predicted results are available for query and download on our website (Predict Functional Phosphosites, PFP, http://pfp.biosino.org/).

Figure 1. The conservation of phosphosites. The distributions of sequence conservation for three different groups: (A) mammal group, (B) vertebrate group, and (C) eukaryote group. For each group, the distribution of the positive dataset is significantly different from the distribution of the negative dataset (p-value < 2.2e-16, t-test). (D) The distribution of status conservation. The positive and negative datasets exhibit significant differences in status conservation (p-value = 4.74e-4, chi-squared test). pos: positive dataset; neg: negative dataset; con: conserved phosphosites; no: non-conserved phosphosites.

Mentions: It was previously reported that phosphosites with known function are more likely to be conserved than those with no characterized function [13,26,27,28]. If some phosphosites are more conserved in evolution, we can infer that these sites have potential function [29]. In this study, we investigated both the sequence and the status conservation of phosphosites [24]. The sequence conservation of a phosphosite was evaluated by its evolutionary rate across 14 eukaryotes. In general, the faster a phosphosite evolves, the less conserved it is. Although this method has been widely used to score conservation [13,30,31], variation in the evolutionary rate was not taken into account. To avoid this problem, we divided the 14 species into three groups (mammal, vertebrate and eukaryote), which represented close, intermediate and distant time scales in the evolutionary tree, respectively. We calculated the evolutionary rate for each group separately (see Methods). Although the three evolutionary rates differed among the groups (Supplementary Figure 3), they were all significantly lower in the positive dataset than in the negative dataset (mammal: p-value < 2.2e-16; vertebrate: p-value < 2.2e-16; and eukaryote: p-value < 2.2e-16, t-test, Fig. 1A–C), suggesting they could be used for scoring functional phosphosites.
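
The group-wise comparison described above can be illustrated with a minimal sketch. It assumes the unequal-variance (Welch) form of the t-test, since the text only says "t-test", and it uses made-up toy evolutionary rates rather than the study's data. The function names `welch_t` and `approx_two_sided_p` and the `rates` dictionary are hypothetical, introduced only for this illustration.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(pos, neg):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    n1, n2 = len(pos), len(neg)
    # Squared standard errors of the two sample means.
    se1, se2 = variance(pos) / n1, variance(neg) / n2
    t = (mean(pos) - mean(neg)) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


def approx_two_sided_p(t):
    """Normal approximation to the two-sided p-value (reasonable only for large df)."""
    return 2.0 * NormalDist().cdf(-abs(t))


# Hypothetical toy evolutionary rates per group (NOT data from the study).
rates = {
    "mammal": {"pos": [0.10, 0.12, 0.08, 0.11], "neg": [0.30, 0.28, 0.35, 0.31]},
    "vertebrate": {"pos": [0.20, 0.22, 0.18, 0.25], "neg": [0.45, 0.50, 0.41, 0.48]},
    "eukaryote": {"pos": [0.40, 0.38, 0.44, 0.42], "neg": [0.70, 0.66, 0.75, 0.72]},
}

for group, data in rates.items():
    t, df = welch_t(data["pos"], data["neg"])
    # A negative t means the positive (functional) set evolves more slowly.
    print(f"{group}: t = {t:.2f}, df = {df:.1f}, approx. p = {approx_two_sided_p(t):.2e}")
```

Running the test separately for each group mirrors the idea of not mixing time scales: each group's rates are compared only against rates computed over the same set of species. With samples as small as the toy ones here, the normal approximation is crude, and an exact t-distribution p-value from a statistics library would be preferable.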

