Prediction of 492 human protein kinase substrate specificities.

Safaei J, Maňuch J, Gupta A, Stacho L, Pelech S - Proteome Sci (2011)

Bottom Line: Complex intracellular signaling networks monitor diverse environmental inputs to evoke appropriate and coordinated effector responses. This represents a marked advancement over existing methods such as those used in NetPhorest (179 kinases in 76 groups) and NetworKIN (123 kinases), which consider only positive determinants for kinase substrate prediction. Furthermore, for many of the better known kinases, the predicted optimal phosphosite sequences were more accurate than the consensus phosphosite sequences inferred by simple alignment of the phosphosites of known kinase substrates.


Affiliation: Department of Computer Science, University of British Columbia, Vancouver, Canada. jsafaei@cs.ubc.ca.

ABSTRACT

Background: Complex intracellular signaling networks monitor diverse environmental inputs to evoke appropriate and coordinated effector responses. Defective signal transduction underlies many pathologies, including cancer, diabetes, autoimmunity and about 400 other human diseases. Therefore, there is high impetus to define the composition and architecture of cellular communications networks in humans. The major components of intracellular signaling networks are protein kinases and protein phosphatases, which catalyze the reversible phosphorylation of proteins. Here, we have focused on identification of kinase-substrate interactions through prediction of the phosphorylation site specificity from knowledge of the primary amino acid sequence of the catalytic domain of each kinase.

Results: The presented method predicts 488 different kinase catalytic domain substrate specificity matrices in 478 typical and 4 atypical human kinases that rely on both positive and negative determinants for scoring individual phosphosites for their suitability as kinase substrates. This represents a marked advancement over existing methods such as those used in NetPhorest (179 kinases in 76 groups) and NetworKIN (123 kinases), which consider only positive determinants for kinase substrate prediction. Comparison of our predicted matrices with experimentally derived matrices from about 9,000 known kinase-phosphosite substrate pairs revealed a high degree of concordance with the established preferences of about 150 well-studied protein kinases. Furthermore, for many of the better known kinases, the predicted optimal phosphosite sequences were more accurate than the consensus phosphosite sequences inferred by simple alignment of the phosphosites of known kinase substrates.

Conclusions: Application of this improved kinase substrate prediction algorithm to the primary structures of over 23,000 proteins encoded by the human genome has permitted the identification of about 650,000 putative phosphosites, which are posted on the open source PhosphoNET website (http://www.phosphonet.ca).


Figure 5: Data and process flow of the experiments. The figure shows the order of creating the datasets for the computational simulations and comparison of our predictor results with current state of the art methods such as NetPhorest and NetworKIN.

Mentions: To evaluate these predicted matrices, we also computed the profile matrices of 302 kinases in the training set using the method described in Section (empirical matrices), and compared the results using the sum of squared differences. Figure 5 illustrates how we set up this comparison, and Figure 6 shows the distribution of these errors. Figure 6 also presents the results for the profile matrix based module of the predictor. It is evident that the majority of the predicted matrices are extremely similar to those generated by known substrate alignments. Interestingly, for both modules the results on the test set are more accurate (with a sum of squared error of less than 1) than the predicted results on the training set (where the error can reach 10 to 15). The reason is that each kinase in the test set has more experimental substrate peptides, so its empirically computed matrix is closer to the correct specificity of that kinase. In the training set, however, there are many kinases with fewer than 20 to 30 experimentally confirmed substrate peptides, and we may expect their empirically computed matrices not to be close to the correct profile of each kinase. The profile matrix based module used more information than the consensus module; it does not overfit the data, and it is more accurate on both the test set (a total sum of squared error (SSE) of 1.99 for all five kinases) and the training set (a total SSE of 494.22). As evident in Figure 6, the consensus based method had higher errors, with a total SSE of 584.22 for all kinases in the training set and a total SSE of 2.66 for all kinases in the test set.
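
The comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a profile matrix is simply the per-position amino acid frequency over equal-length aligned peptides, and the function names (`profile_matrix`, `sum_squared_error`) and the example peptides are hypothetical. The paper's own matrices may be normalized differently.

```python
# Illustrative sketch only; the frequency-based matrix definition, the
# function names, and the peptides are assumptions, not taken from the paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile_matrix(peptides):
    """Return per-position amino acid frequencies for aligned peptides.

    The result is a list with one dict per alignment position, mapping each
    of the 20 standard amino acids to its relative frequency. Characters
    outside AMINO_ACIDS (for example gaps) are not counted, so a column
    containing them sums to less than 1.
    """
    if not peptides:
        raise ValueError("at least one peptide is required")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    matrix = []
    for pos in range(length):
        column = [p[pos] for p in peptides]
        matrix.append(
            {aa: column.count(aa) / len(peptides) for aa in AMINO_ACIDS}
        )
    return matrix


def sum_squared_error(matrix_a, matrix_b):
    """Sum of squared differences over all positions and amino acids."""
    if len(matrix_a) != len(matrix_b):
        raise ValueError("matrices must cover the same number of positions")
    return sum(
        (row_a[aa] - row_b[aa]) ** 2
        for row_a, row_b in zip(matrix_a, matrix_b)
        for aa in AMINO_ACIDS
    )


# Hypothetical aligned peptides standing in for known substrates of one
# kinase (empirical) and for the peptides implied by a prediction.
empirical = profile_matrix(["RRASV", "RKASL", "RRGSI"])
predicted = profile_matrix(["RRASV", "RRASL", "RRGSI"])
sse = sum_squared_error(empirical, predicted)
print(f"SSE between empirical and predicted matrices: {sse:.4f}")
```

Summing this value over all kinases in a set would give a total SSE of the kind reported above. With few substrate peptides per kinase, each column frequency is coarse, which is consistent with the larger errors reported for the training set.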

