Prediction of 492 human protein kinase substrate specificities.

Safaei J, Maňuch J, Gupta A, Stacho L, Pelech S - Proteome Sci (2011)

Affiliation: Department of Computer Science, University of British Columbia, Vancouver, Canada. jsafaei@cs.ubc.ca.

ABSTRACT

Background: Complex intracellular signaling networks monitor diverse environmental inputs to evoke appropriate and coordinated effector responses. Defective signal transduction underlies many pathologies, including cancer, diabetes, autoimmunity and about 400 other human diseases. Therefore, there is high impetus to define the composition and architecture of cellular communications networks in humans. The major components of intracellular signaling networks are protein kinases and protein phosphatases, which catalyze the reversible phosphorylation of proteins. Here, we have focused on identification of kinase-substrate interactions through prediction of the phosphorylation site specificity from knowledge of the primary amino acid sequence of the catalytic domain of each kinase.

Results: The presented method predicts 488 different kinase catalytic domain substrate specificity matrices in 478 typical and 4 atypical human kinases; these matrices rely on both positive and negative determinants for scoring individual phosphosites for their suitability as kinase substrates. This represents a marked advancement over existing methods such as those used in NetPhorest (179 kinases in 76 groups) and NetworKIN (123 kinases), which consider only positive determinants for kinase substrate prediction. Comparison of our predicted matrices with experimentally derived matrices from about 9,000 known kinase-phosphosite substrate pairs revealed a high degree of concordance with the established preferences of about 150 well-studied protein kinases. Furthermore, for many of the better known kinases, the predicted optimal phosphosite sequences were more accurate than the consensus phosphosite sequences inferred by simple alignment of the phosphosites of known kinase substrates.

Conclusions: Application of this improved kinase substrate prediction algorithm to the primary structures of over 23,000 proteins encoded by the human genome has permitted the identification of about 650,000 putative phosphosites, which are posted on the open-source PhosphoNET website (http://www.phosphonet.ca).

Figure 7: Confusion matrices for two modules. The figure contains two tables, representing the classification performance of the consensus module (left) and the profile matrix module (right) of the predictor. Each table gives the confusion matrix as true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN), together with the accuracy (AC), sensitivity (SE), and specificity (SP) computed from these counts.
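The caption names the three metrics without giving formulas. Assuming the usual definitions for a binary classifier, they are computed from the four confusion-matrix counts as:

```latex
\mathrm{AC} = \frac{TP + TN}{TP + FP + FN + TN}, \qquad
\mathrm{SE} = \frac{TP}{TP + FN}, \qquad
\mathrm{SP} = \frac{TN}{TN + FP}
```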

Mentions: In this simulation, we measured the accuracy of the predicted substrate specificity of each kinase in the test set. To do so, we built a classifier for each kinase and prepared positive and negative instances for it. The PSSM of each kinase served as its classifier, and the confirmed substrate peptides of that kinase in the test set served as positive instances. For negative instances, unlike previous attempts [10,11], we randomly generated, from a uniform distribution, as many peptides for each kinase in the test set as it had positive instances. The reason is that peptides which lie in substrate proteins but are not experimentally confirmed, if chosen as negative instances, may later prove to be positive instances in future studies (e.g., from mass spectrometry analyses). We then computed the PSSM score of each substrate phospho-peptide as in Equation (3); if the score was less than zero, the phospho-peptide was declared negative for the kinase in question, and otherwise it was accepted as a candidate peptide phosphorylatable by that kinase. Figure 5 illustrates the flow of data in preparing the positive and negative substrate phospho-peptides for the top five test kinases. Similar results were obtained for all kinases in the test set: the classifiers correctly rejected most of the negative instances (low rate of false positives) but appeared much less effective at identifying all of the positive instances (high rate of false negatives). Across all classifiers in the test set, accuracy was approximately 77%, sensitivity 60%, and specificity 95%.
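The evaluation protocol above can be sketched as follows. This is a minimal illustration, not the authors' implementation: Equation (3) is not reproduced in this section, so the additive scoring, the toy matrix `TOY_PSSM`, the `BIAS` offset, the 7-residue window, and the example peptides are all hypothetical. Only the decision rule (a score below zero means "negative") and the set of uniformly random negatives, equal in number to the positives, come from the text.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Hypothetical toy PSSM: offset from the phosphoacceptor -> {residue: score}.
# Positive entries act as positive determinants (favoured residues), negative
# entries as negative determinants (disfavoured residues); unlisted residues score 0.
TOY_PSSM = {
    -3: {"R": 2.0, "K": 1.0, "D": -1.0, "E": -1.0},
    -2: {"R": 1.0, "P": -1.0},
    +1: {"P": -2.0, "L": 0.5},
}
BIAS = -1.5  # hypothetical offset so that a peptide with no favoured residues scores < 0


def score(peptide, pssm=TOY_PSSM, bias=BIAS):
    """Additive PSSM score of a peptide whose middle residue is the phosphoacceptor."""
    centre = len(peptide) // 2
    total = bias
    for offset, weights in pssm.items():
        total += weights.get(peptide[centre + offset], 0.0)
    return total


def random_negatives(positives, rng):
    """One uniformly random peptide of matching length per positive instance."""
    return ["".join(rng.choice(AMINO_ACIDS) for _ in p) for p in positives]


def evaluate(positives, rng):
    """Confusion matrix and metrics, calling a peptide positive when score >= 0."""
    negatives = random_negatives(positives, rng)
    tp = sum(score(p) >= 0 for p in positives)
    fn = len(positives) - tp
    tn = sum(score(n) < 0 for n in negatives)
    fp = len(negatives) - tn
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "AC": (tp + tn) / (tp + fp + fn + tn),
        "SE": tp / (tp + fn),
        "SP": tn / (tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical substrate peptides, phosphoacceptor S at the centre.
    print(evaluate(["RRASLGV", "RKLSLEE", "KRGSPAE"], random.Random(0)))
```

Here the third peptide has a proline at +1, which the toy matrix penalizes, so it is missed (a false negative), mirroring the high false-negative rate described above. Seeding `random.Random` keeps the random negatives reproducible between runs.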
Figure 7 shows the exact confusion matrix, accuracy, sensitivity and specificity for each kinase in the test set, for both the consensus-based and the profile-matrix-based sub-modules of our predictor. Sensitivity was low for all classifiers in the consensus-based method, whereas the profile-matrix-based method eliminated this weakness and achieved 10% higher accuracy than the consensus method.
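Because each kinase is tested against as many random negatives as it has confirmed positives, accuracy reduces to the mean of sensitivity and specificity. The short derivation below assumes the standard metric definitions and writes n for the number of positive (and hence negative) instances, so that TP + FN = TN + FP = n; it then checks the pooled figures of about 60% sensitivity and 95% specificity:

```latex
\mathrm{AC} = \frac{TP + TN}{2n}
            = \frac{1}{2}\left(\frac{TP}{n} + \frac{TN}{n}\right)
            = \frac{\mathrm{SE} + \mathrm{SP}}{2}
            \approx \frac{0.60 + 0.95}{2} = 0.775
```

The result agrees with the reported accuracy of about 77%. It also shows why low sensitivity weighs on the consensus module: with specificity held fixed, each percentage point of sensitivity gained adds half a percentage point of accuracy.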