Identification of ATP binding residues of a protein from its primary sequence.

Chauhan JS, Mishra NK, Raghava GP - BMC Bioinformatics (2009)

Bottom Line: We compared the amino acid composition of ATP-interacting and non-interacting regions of proteins and observed that certain residues are preferred for interaction with ATP. The PSSM-based model performed significantly better (MCC 0.5) than the previous one, in which only the primary sequence of the proteins was used. Evolutionary information is important for identifying 'ATP interacting residues', as it provides more information than the primary sequence alone.

Affiliation: Institute of Microbial Technology, Chandigarh, India. jagat@imtech.res.in

ABSTRACT

Background: One of the major challenges in the post-genomic era is to provide functional annotations for the large number of proteins arising from genome sequencing projects. The function of many proteins depends on their interaction with small molecules or ligands. ATP is one such important ligand, playing a critical role as a coenzyme in the function of many proteins. There is a need to develop methods for identifying ATP-interacting residues in ATP-binding proteins (ABPs) in order to understand the mechanism of protein-ligand interaction.

Results: We compared the amino acid composition of ATP-interacting and non-interacting regions of proteins and observed that certain residues are preferred for interaction with ATP. This study describes a few models developed for identifying ATP-interacting residues in a protein. All models were trained and tested on 168 non-redundant ABP chains. First, we developed a Support Vector Machine (SVM) based model using the primary sequence of proteins and obtained a maximum MCC of 0.33 with an accuracy of 66.25%. Second, another SVM-based model was developed using position-specific scoring matrices (PSSMs) generated by PSI-BLAST. This model performed significantly better (MCC 0.5) than the previous one, in which only the primary sequence of the proteins was used.

Conclusion: This study demonstrates that it is possible to predict 'ATP interacting residues' in a protein with moderate accuracy using its sequence. Evolutionary information is important for identifying 'ATP interacting residues', as it provides more information than the primary sequence alone. This method will be useful for researchers studying ATP-binding proteins. Based on this study, a web server has been developed for predicting 'ATP interacting residues' in a protein: http://www.imtech.res.in/raghava/atpint/.

Figure 2: ROC plot showing the performance of SVM modules developed using amino acid sequence and PSSM profile.

Mentions: Previous studies on nucleotide-interacting proteins have shown that performance is best with a window size (pattern length) of 17 [16,9]. We therefore used a pattern length of 17 for developing our prediction model. All possible overlapping peptides of 17 amino acids were generated from the ATP-binding proteins/chains; a peptide/pattern was labeled ATP-interacting (positive) if the residue at its center was ATP-interacting, and negative otherwise. After labeling, the patterns were converted into binary patterns. A peptide of length N was represented by a vector of dimension N × 21, where each residue is represented by a vector of dimension 21 (e.g. Ala by 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0; Cys by 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0) covering the 20 amino acids and one dummy amino acid "X". Our SVM module predicts a score for each residue in a protein (in the range -1.0 to 1.0), and we define a threshold on this score to discriminate ATP-interacting from non-interacting residues. The performance of the SVM module developed using a single sequence with window size 17 is shown in Table 1. We also tried window sizes from 7 to 25 residues and observed that patterns of window size 17 gave better performance (Table 2). With window size 17 we achieved 66.25% accuracy and an MCC of 0.33, with minimum difference between sensitivity and specificity, at threshold 0.0 (Table 1). We normally select the threshold at which sensitivity and specificity are nearly equal, in order to balance the two. The performance of the SVM model for window size 17 using a single sequence is shown in Figure 2. We achieved an AUC of 0.725, which is significantly better than random (AUC 0.5).
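
The windowing and binary encoding described above can be illustrated with a minimal Python sketch. It is not the authors' code. The names `ALPHABET`, `one_hot` and `make_patterns` are hypothetical. The alphabetical ordering by one-letter code is an assumption, chosen because it matches the Ala and Cys examples in the text. Padding the chain termini with the dummy residue "X" and using +1/-1 as class labels are also assumptions; the text does not state how terminal residues or labels were handled.

```python
from typing import List, Set, Tuple

# Assumed ordering: 20 amino acids by one-letter code, then the dummy "X".
# This puts Ala at position 0 and Cys at position 1, as in the text.
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"
DUMMY = "X"


def one_hot(residue: str) -> List[int]:
    """Return the 21-dimensional binary vector for a single residue."""
    index = ALPHABET.find(residue.upper())
    if index < 0:
        # Any non-standard symbol is mapped to the dummy residue (assumption).
        index = ALPHABET.index(DUMMY)
    vector = [0] * len(ALPHABET)
    vector[index] = 1
    return vector


def make_patterns(
    sequence: str,
    binding_positions: Set[int],
    window: int = 17,
) -> List[Tuple[List[int], int]]:
    """Build one (features, label) pair per residue of the chain.

    binding_positions holds 0-based indices of ATP-interacting residues.
    Each feature vector has window * 21 entries; the label is +1 when the
    central residue is ATP-interacting and -1 otherwise.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    # Assumption: pad both termini with "X" so every residue can be a center.
    padded = DUMMY * half + sequence + DUMMY * half
    patterns = []
    for center in range(len(sequence)):
        peptide = padded[center:center + window]
        features = [bit for residue in peptide for bit in one_hot(residue)]
        label = 1 if center in binding_positions else -1
        patterns.append((features, label))
    return patterns
```

Calling `make_patterns(chain, binding_positions)` on a chain of length L yields L patterns of 17 × 21 = 357 binary features each, which could then be passed to any SVM implementation.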

