Prediction of 492 human protein kinase substrate specificities.

Safaei J, Maňuch J, Gupta A, Stacho L, Pelech S - Proteome Sci (2011)

Bottom Line: Complex intracellular signaling networks monitor diverse environmental inputs to evoke appropriate and coordinated effector responses. This represents a marked advancement over existing methods such as those used in NetPhorest (179 kinases in 76 groups) and NetworKIN (123 kinases), which consider only positive determinants for kinase substrate prediction. Furthermore, for many of the better known kinases, the predicted optimal phosphosite sequences were more accurate than the consensus phosphosite sequences inferred by simple alignment of the phosphosites of known kinase substrates.


Affiliation: Department of Computer Science, University of British Columbia, Vancouver, Canada. jsafaei@cs.ubc.ca.

ABSTRACT

Background: Complex intracellular signaling networks monitor diverse environmental inputs to evoke appropriate and coordinated effector responses. Defective signal transduction underlies many pathologies, including cancer, diabetes, autoimmunity and about 400 other human diseases. Therefore, there is high impetus to define the composition and architecture of cellular communications networks in humans. The major components of intracellular signaling networks are protein kinases and protein phosphatases, which catalyze the reversible phosphorylation of proteins. Here, we have focused on identification of kinase-substrate interactions through prediction of the phosphorylation site specificity from knowledge of the primary amino acid sequence of the catalytic domain of each kinase.

Results: The presented method predicts 488 different kinase catalytic domain substrate specificity matrices in 478 typical and 4 atypical human kinases. These matrices rely on both positive and negative determinants for scoring individual phosphosites for their suitability as kinase substrates. This represents a marked advancement over existing methods such as those used in NetPhorest (179 kinases in 76 groups) and NetworKIN (123 kinases), which consider only positive determinants for kinase substrate prediction. Comparison of our predicted matrices with experimentally derived matrices from about 9,000 known kinase-phosphosite substrate pairs revealed a high degree of concordance with the established preferences of about 150 well-studied protein kinases. Furthermore, for many of the better known kinases, the predicted optimal phosphosite sequences were more accurate than the consensus phosphosite sequences inferred by simple alignment of the phosphosites of known kinase substrates.

Conclusions: Application of this improved kinase substrate prediction algorithm to the primary structures of over 23,000 proteins encoded by the human genome has permitted the identification of about 650,000 putative phosphosites, which are posted on the open source PhosphoNET website (http://www.phosphonet.ca).



Figure 4: Computing the charge dependency correlation in the profile-matrix-based module of the predictor. The left part of the figure shows the aligned catalytic domains of the 302 training kinases, and the right-hand side shows the profile matrix of each kinase. The same column taken across all the kinase profile matrices forms a single random variable, whose correlation with the columns of the aligned catalytic domains is to be studied.

Mentions: kinase in Algorithm 1 without determination of their consensus sequence. This strategy uses more information and might allow for better prediction, but it may also lead to overfitted results. In Section , we will test both of these algorithms (1. consensus based and 2. profile based), and compare the results. In the profile-matrix-based method, the main difficulty is that for the random variable Yj (column j in the aligned consensus sequences) we do not have the correlated values of the random variable Xi (column i in the aligned catalytic domains). Instead, for each value in Xi we have 21 different amino acid probabilities of Yj. Let ak,i be the amino acid at position i in the aligned catalytic domain of kinase k, and consider the probability of the lth amino acid (1 ≤ l ≤ 21) at position j (1 ≤ j ≤ 15) of the profile matrix of kinase k. Figure 4 represents this notation visually. Before computing the charge dependency correlation of two columns (or random variables) Xi and Yj, we compute the probability of the amino acids in each random variable. P(Xi = x) is computed by maximum likelihood estimation from the amino acids ak,i, as given in Equation (6).
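For a categorical variable, the maximum likelihood estimate of P(Xi = x) is the relative frequency of x in column i of the alignment. The sketch below illustrates that general estimator only; it is not the authors' implementation. The function name `column_mle`, the toy alignment, and the choice of a gap character as the 21st symbol are all assumptions made for illustration.

```python
from collections import Counter

# Assumption: 20 standard amino acids plus a gap symbol give the
# 21 categories mentioned in the text. The source does not say
# what the 21st symbol is.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def column_mle(aligned_domains, i):
    """Estimate P(X_i = x) as the relative frequency of symbol x
    in column i of the aligned catalytic domains (one string per kinase)."""
    column = [seq[i] for seq in aligned_domains]
    counts = Counter(column)
    n = len(column)
    # Symbols that never occur in the column get probability 0.
    return {x: counts.get(x, 0) / n for x in ALPHABET}


# Hypothetical toy alignment of four "kinases", four columns each.
toy_alignment = ["AKE-", "AKD-", "GKE-", "ARE-"]
p_first_column = column_mle(toy_alignment, 0)
print(p_first_column["A"], p_first_column["G"])
```

With real data, `aligned_domains` would hold one aligned catalytic domain per training kinase, so the denominator would be the number of training kinases.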

