A grammar inference approach for predicting kinase specific phosphorylation sites.

Datta S, Mukhopadhyay S - PLoS ONE (2015)

Bottom Line: Extensive experiments on several datasets generated by us reveal that our inferred grammar successfully predicts phosphorylation sites in a kinase-specific manner. It performs significantly better than other existing phosphorylation site prediction methods. We have also compared our inferred DSFA with two other GI algorithms.


Affiliation: Department of Biophysics, Molecular Biology and Bioinformatics and Distributed Information Centre for Bioinformatics, University of Calcutta, Kolkata, West Bengal, India.

ABSTRACT
Kinase-mediated phosphorylation is a key post-translational mechanism that plays an important role in regulating various cellular processes and phenotypes. Many diseases, such as cancer, are related to signaling defects associated with protein phosphorylation. Characterizing protein kinases and their substrates enhances our ability to understand the mechanism of protein phosphorylation and extends our knowledge of signaling networks, thereby helping us to treat such diseases. Experimental methods for identifying phosphorylation sites are labour-intensive and expensive. Moreover, the manifold increase of protein sequences in the databanks over the years necessitates faster and more accurate computational methods for predicting phosphorylation sites in protein sequences. To date, a number of computational methods have been proposed for predicting phosphorylation sites, but much scope for improvement remains. In this communication, we present a simple and novel method based on the Grammatical Inference (GI) approach to automate the prediction of kinase-specific phosphorylation sites. To this end, we have used a popular GI algorithm, Alergia, to infer Deterministic Stochastic Finite State Automata (DSFA), which equivalently represent the regular grammar corresponding to the phosphorylation sites. Extensive experiments on several datasets generated by us reveal that our inferred grammar successfully predicts phosphorylation sites in a kinase-specific manner. It performs significantly better than other existing phosphorylation site prediction methods. We have also compared our inferred DSFA with two other GI algorithms. The DSFA generated by our method performs better, which indicates that our method is robust and has potential for predicting phosphorylation sites in a kinase-specific manner.


pone.0122294.g007: Performance comparison of six methods along with our proposed method in terms of precision, recall, accuracy and F-measure for the four types of kinases: (a) PKA, (b) PKC, (c) MAPK and (d) CK2.

Mentions: To avoid biased prediction, we have considered a candidate sequence to be a true positive only when the sequence is predicted correctly. We have used the sequences in the 40% test dataset, i.e. the PHSDB dataset taken from the Phospho.ELM database version 9.0, for comparison. The advantage of the PHSDB dataset is its lack of bias and its independence, which lets us fairly compare several existing methods with our proposed method. Performance is assessed on the basis of precision, recall, accuracy and F-measure. Fig 7A–7D show the predictive performance of our method compared with the six other prediction systems for the kinases PKA, PKC, MAPK and CK2, respectively. From Fig 7 we observe that our method outperforms all the other methods in terms of precision, accuracy and F-measure for all the kinases. For kinase CK2, KinasePhos 2.0 yields the highest recall value, followed by GPS 2.1; for all the other kinases (PKA, PKC and MAPK), GPS 2.1 yields the highest recall values. NetPhosK performs worst in terms of recall for all the kinases. PPSP performs better in terms of recall but yields a lower precision value. GPS 2.1 performs worst in terms of precision for PKA, MAPK and CK2. NetPhosK obtains the lowest precision for PKC. Fig 7A–7D show that, for the kinases PKA, PKC, MAPK and CK2, the six methods achieve good recall, but the precision they sacrifice results in a low F-measure. A lower precision value also implies a higher number of false positives. GPS 2.1 and PPSP obtain a very high recall value but at the same time a very low precision value, which means these two methods have yielded a large number of false positives. For PKA and CK2, GPS 2.1 performs worst in terms of accuracy and F-measure, whereas NetPhosK obtains the lowest accuracy and F-measure for PKC, and KinasePhos 2.0 for MAPK.
Our method offers high and balanced precision and recall, which indicates that it is superior to the well-known existing phosphorylation site prediction methods and can effectively distinguish phosphorylation sites from non-phosphorylation sites in a kinase-specific manner. Moreover, the good performance of our method illustrates that it can efficiently evaluate the sequence similarity of phosphorylation substrates for different kinases.
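
The text above does not spell out how the four measures are computed. The following is a minimal sketch using the standard textbook definitions, which we assume match the ones used for the comparison. The class and function names and all counts are hypothetical and chosen only for illustration; they are not results from the paper. The first example shows how a predictor with very high recall but many false positives ends up with a low F-measure, and the second shows a balanced predictor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical prediction counts for one kinase and one predictor."""
    tp: int  # phosphorylation sites predicted as sites
    fp: int  # non-sites predicted as sites
    tn: int  # non-sites predicted as non-sites
    fn: int  # phosphorylation sites predicted as non-sites


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero.
    return numerator / denominator if denominator else 0.0


def evaluate(c: ConfusionCounts) -> dict:
    """Standard definitions of precision, recall, accuracy and F-measure."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # F-measure is the harmonic mean of precision and recall.
    f_measure = _ratio(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Illustrative counts only (assumed, not taken from the paper).
high_recall_low_precision = evaluate(ConfusionCounts(tp=90, fp=210, tn=690, fn=10))
balanced = evaluate(ConfusionCounts(tp=80, fp=20, tn=880, fn=20))

if __name__ == "__main__":
    for name, scores in [("high recall, low precision", high_recall_low_precision),
                         ("balanced", balanced)]:
        print(name, {k: round(v, 3) for k, v in scores.items()})
```

With these made-up counts, the first predictor reaches a recall of 0.9 but a precision of only 0.3, so its F-measure is 0.45, while the balanced predictor scores 0.8 on precision, recall and F-measure. This is the same pattern the comparison describes for methods that gain recall at the cost of many false positives.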

