Limits...
A grammar inference approach for predicting kinase specific phosphorylation sites.

Datta S, Mukhopadhyay S - PLoS ONE (2015)

Bottom Line: Extensive experiments on several datasets generated by us reveal that, our inferred grammar successfully predicts phosphorylation sites in a kinase specific manner.It performs significantly better when compared with the other existing phosphorylation site prediction methods.We have also compared our inferred DSFA with two other GI inference algorithms.

View Article: PubMed Central - PubMed

Affiliation: Department of Biophysics, Molecular Biology and Bioinformatics and Distributed Information Centre for Bioinformatics, University of Calcutta, Kolkata, West Bengal, India.

ABSTRACT
Kinase mediated phosphorylation site detection is the key mechanism of post translational mechanism that plays an important role in regulating various cellular processes and phenotypes. Many diseases, like cancer are related with the signaling defects which are associated with protein phosphorylation. Characterizing the protein kinases and their substrates enhances our ability to understand the mechanism of protein phosphorylation and extends our knowledge of signaling network; thereby helping us to treat such diseases. Experimental methods for predicting phosphorylation sites are labour intensive and expensive. Also, manifold increase of protein sequences in the databanks over the years necessitates the improvement of high speed and accurate computational methods for predicting phosphorylation sites in protein sequences. Till date, a number of computational methods have been proposed by various researchers in predicting phosphorylation sites, but there remains much scope of improvement. In this communication, we present a simple and novel method based on Grammatical Inference (GI) approach to automate the prediction of kinase specific phosphorylation sites. In this regard, we have used a popular GI algorithm Alergia to infer Deterministic Stochastic Finite State Automata (DSFA) which equally represents the regular grammar corresponding to the phosphorylation sites. Extensive experiments on several datasets generated by us reveal that, our inferred grammar successfully predicts phosphorylation sites in a kinase specific manner. It performs significantly better when compared with the other existing phosphorylation site prediction methods. We have also compared our inferred DSFA with two other GI inference algorithms. The DSFA generated by our method performs superior which indicates that our method is robust and has a potential for predicting the phosphorylation sites in a kinase specific manner.

Show MeSH

Related in: MedlinePlus

Block diagram of a general Grammar Inference methodology.
© Copyright Policy
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC4401752&req=5

pone.0122294.g001: Block diagram of a general Grammar Inference methodology.

Mentions: In our study, we have proposed a new method to support the de novo discovery of kinase specific phosphorylation sites based upon computational grammar. Various methods based on Computational grammars have been proposed so far for modelling and predicting various types of biological sequences such as promoter region of human [23], transcription binding site [24], associating genes with their regulatory sequences [25], predicting RNA folding [26], secondary structure of RNA molecule [27–29], genes and biological sequences [30,31], syntactic model to design genetic constructs [32] and new antimicrobial peptides [33]. Nowadays, Grammar Inference (GI) is becoming an active field of research in the area of computational grammar [34]. GI is a specific type of inductive inference, which takes into account a set of sample data to obtain a model consistent with the available data. The resulting model obtained through GI method using a set of sample strings represents a formal grammar which contains all common features of the strings. GI method is being used for predicting various biological sequences, such as larger than gene structure [35], functional motifs, such as coiled coil domains in protein sequences [36], transmembrane domains [37], various protein sequences [38], etc. The choice of GI as a method confers us with two distinct advantages over other methods having a different flavor: (a) GI doesn’t require any pre-existing biological knowledge and (b) no data encoding is required for a GI. A general framework of GI methodology is shown in Fig 1. In our study, we have used GI method to infer the regular grammar corresponding to the phosphorylation sites of protein sequences. From previous studies we came to know that substrates of a specific kinase show a particular sequences pattern around the phosphorylation sites [39, 40]. For example, PKA kinase has a preference to identify the substrate sites with basic amino acids (Arginine, Lysine or Histidine) at -2 or -3 positions relative to the phosphorylation sites considered as position 0 [40]. Many methods have been developed to infer the sequence motifs near the phosphorylation sites [7, 39, 40, 41]. Our method is based on the idea of identifying recurring patterns hidden in the phosphorylation sites from a set of training samples and representing them as grammatical models. The generated grammar is then used for predicting unknown phosphorylation sites. A grammar can equally be represented by means of an automaton. Instead of inferring grammar rules, in our present study, we have inferred the Deterministic Stochastic Finite State Automata (DSFA) using Alergia algorithm for predicting the kinase specific phosphorylation sites. The method we have presented here is an unsupervised learning method based on Prefix Tree Automaton (PTA). At the very outset, our method first constructs a PTA based on the training samples. In the next step, the similar states of the PTA are merged in an iterative process to obtain a stochastic finite state automaton. The resulting automata will allow processing of unknown protein sequences for deciding upon whether to accept or to reject the sequences. The point to be noted here is that it can accept a wider range of input sequences than what may be present in the training set.


A grammar inference approach for predicting kinase specific phosphorylation sites.

Datta S, Mukhopadhyay S - PLoS ONE (2015)

Block diagram of a general Grammar Inference methodology.
© Copyright Policy
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC4401752&req=5

pone.0122294.g001: Block diagram of a general Grammar Inference methodology.
Mentions: In our study, we have proposed a new method to support the de novo discovery of kinase specific phosphorylation sites based upon computational grammar. Various methods based on Computational grammars have been proposed so far for modelling and predicting various types of biological sequences such as promoter region of human [23], transcription binding site [24], associating genes with their regulatory sequences [25], predicting RNA folding [26], secondary structure of RNA molecule [27–29], genes and biological sequences [30,31], syntactic model to design genetic constructs [32] and new antimicrobial peptides [33]. Nowadays, Grammar Inference (GI) is becoming an active field of research in the area of computational grammar [34]. GI is a specific type of inductive inference, which takes into account a set of sample data to obtain a model consistent with the available data. The resulting model obtained through GI method using a set of sample strings represents a formal grammar which contains all common features of the strings. GI method is being used for predicting various biological sequences, such as larger than gene structure [35], functional motifs, such as coiled coil domains in protein sequences [36], transmembrane domains [37], various protein sequences [38], etc. The choice of GI as a method confers us with two distinct advantages over other methods having a different flavor: (a) GI doesn’t require any pre-existing biological knowledge and (b) no data encoding is required for a GI. A general framework of GI methodology is shown in Fig 1. In our study, we have used GI method to infer the regular grammar corresponding to the phosphorylation sites of protein sequences. From previous studies we came to know that substrates of a specific kinase show a particular sequences pattern around the phosphorylation sites [39, 40]. For example, PKA kinase has a preference to identify the substrate sites with basic amino acids (Arginine, Lysine or Histidine) at -2 or -3 positions relative to the phosphorylation sites considered as position 0 [40]. Many methods have been developed to infer the sequence motifs near the phosphorylation sites [7, 39, 40, 41]. Our method is based on the idea of identifying recurring patterns hidden in the phosphorylation sites from a set of training samples and representing them as grammatical models. The generated grammar is then used for predicting unknown phosphorylation sites. A grammar can equally be represented by means of an automaton. Instead of inferring grammar rules, in our present study, we have inferred the Deterministic Stochastic Finite State Automata (DSFA) using Alergia algorithm for predicting the kinase specific phosphorylation sites. The method we have presented here is an unsupervised learning method based on Prefix Tree Automaton (PTA). At the very outset, our method first constructs a PTA based on the training samples. In the next step, the similar states of the PTA are merged in an iterative process to obtain a stochastic finite state automaton. The resulting automata will allow processing of unknown protein sequences for deciding upon whether to accept or to reject the sequences. The point to be noted here is that it can accept a wider range of input sequences than what may be present in the training set.

Bottom Line: Extensive experiments on several datasets generated by us reveal that, our inferred grammar successfully predicts phosphorylation sites in a kinase specific manner.It performs significantly better when compared with the other existing phosphorylation site prediction methods.We have also compared our inferred DSFA with two other GI inference algorithms.

View Article: PubMed Central - PubMed

Affiliation: Department of Biophysics, Molecular Biology and Bioinformatics and Distributed Information Centre for Bioinformatics, University of Calcutta, Kolkata, West Bengal, India.

ABSTRACT
Kinase mediated phosphorylation site detection is the key mechanism of post translational mechanism that plays an important role in regulating various cellular processes and phenotypes. Many diseases, like cancer are related with the signaling defects which are associated with protein phosphorylation. Characterizing the protein kinases and their substrates enhances our ability to understand the mechanism of protein phosphorylation and extends our knowledge of signaling network; thereby helping us to treat such diseases. Experimental methods for predicting phosphorylation sites are labour intensive and expensive. Also, manifold increase of protein sequences in the databanks over the years necessitates the improvement of high speed and accurate computational methods for predicting phosphorylation sites in protein sequences. Till date, a number of computational methods have been proposed by various researchers in predicting phosphorylation sites, but there remains much scope of improvement. In this communication, we present a simple and novel method based on Grammatical Inference (GI) approach to automate the prediction of kinase specific phosphorylation sites. In this regard, we have used a popular GI algorithm Alergia to infer Deterministic Stochastic Finite State Automata (DSFA) which equally represents the regular grammar corresponding to the phosphorylation sites. Extensive experiments on several datasets generated by us reveal that, our inferred grammar successfully predicts phosphorylation sites in a kinase specific manner. It performs significantly better when compared with the other existing phosphorylation site prediction methods. We have also compared our inferred DSFA with two other GI inference algorithms. The DSFA generated by our method performs superior which indicates that our method is robust and has a potential for predicting the phosphorylation sites in a kinase specific manner.

Show MeSH
Related in: MedlinePlus