Sequence-based identification of interface residues by an integrative profile combining hydrophobic and evolutionary information.

Chen P, Li J - BMC Bioinformatics (2010)

Bottom Line: Numerous methods have been proposed to recognize protein interaction sites; however, only a small proportion of protein complexes have been successfully resolved because of the high cost. Results show that the evolutionary context of a residue with respect to hydrophobicity improves the identification of protein interface residues. In addition, the ensemble of SVM classifiers improves prediction performance.

Affiliation: Bioinformatics Research Center, School of Computer Engineering, Nanyang Technological University, 639798 Singapore.

ABSTRACT

Background: Protein-protein interactions play essential roles in protein function determination and drug design. Numerous methods have been proposed to recognize protein interaction sites; however, only a small proportion of protein complexes have been successfully resolved because of the high cost. It is therefore important to improve the prediction of protein interaction sites from primary sequence alone.

Results: We propose constructing an integrative profile for each residue in a protein by combining its hydrophobic and evolutionary information. A support vector machine (SVM) ensemble is then developed, in which each SVM is trained on a different pair of positive (interface sites) and negative (non-interface sites) subsets. The subsets, which have roughly equal sizes, are formed by ordering residues according to the change in accessible surface area before and after complexation. A self-organizing map (SOM) technique is applied to group similar input vectors so that interface residues are identified more accurately. An ensemble of ten SVMs improves MCC by around 8% and F1 by around 9% over an ensemble of three SVMs. As expected, SVM ensembles consistently perform better than individual SVMs. In addition, the model based on the integrative profiles outperforms models based on the sequence profile or the hydropathy scale alone. Because our method encodes the input vectors with a small number of features, our model is simpler, faster and more accurate than existing methods.
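The ensemble construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, subset i of the positives is paired with subset i of the negatives (the abstract does not say how pairs are formed), the members vote by simple majority, and a nearest-centroid rule stands in for the SVM so that the example needs only the standard library.

```python
from statistics import mean


def split_ordered(samples, k):
    """Sort (features, delta_asa) samples by ASA change and cut them
    into k contiguous chunks of roughly equal size."""
    ordered = sorted(samples, key=lambda s: s[1])
    base, extra = divmod(len(ordered), k)
    chunks, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        chunks.append(ordered[start:start + size])
        start += size
    return chunks


def centroid(chunk):
    """Per-dimension mean of the feature vectors in one chunk."""
    vectors = [features for features, _ in chunk]
    return [mean(column) for column in zip(*vectors)]


def train_member(pos_chunk, neg_chunk):
    """Stand-in for one SVM (assumption): label a vector by its nearer centroid."""
    c_pos, c_neg = centroid(pos_chunk), centroid(neg_chunk)

    def predict(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        return 1 if d_pos < d_neg else 0

    return predict


def train_ensemble(positives, negatives, k):
    """Train one member per (positive subset, negative subset) pair."""
    pairs = zip(split_ordered(positives, k), split_ordered(negatives, k))
    return [train_member(p, n) for p, n in pairs]


def ensemble_predict(members, x):
    """Majority vote over the members (assumed combination rule)."""
    votes = sum(member(x) for member in members)
    return 1 if 2 * votes > len(members) else 0


# Toy data: (feature vector, change in accessible surface area).
positives = [([1.0, 1.0 + 0.1 * i], 10.0 + i) for i in range(6)]
negatives = [([-1.0, -1.0 - 0.1 * i], 0.1 * i) for i in range(6)]
members = train_ensemble(positives, negatives, k=3)
print(ensemble_predict(members, [0.9, 1.2]))
```

Training each member on a different slice of the ASA-ordered data gives the ensemble members different views of the interface, which is one plausible reading of why more members help; the abstract itself reports only the measured improvement.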

Conclusions: The integrative profile, which combines hydrophobic and evolutionary information, contributes most to the prediction of protein-protein interactions. Results show that the evolutionary context of a residue with respect to hydrophobicity improves the identification of protein interface residues. In addition, the ensemble of SVM classifiers improves prediction performance.
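One way such an integrative profile could be built is sketched below. The abstract does not state how the two sources are combined, so the element-wise weighting used here is an assumption, as are the function name, the use of the Kyte-Doolittle hydropathy index, and the column order of the sequence-profile row.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumed column order of one sequence-profile (PSSM) row.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def integrative_profile(pssm_row):
    """Weight each of the 20 profile scores for one residue position
    by the hydropathy of the corresponding amino acid (assumed scheme)."""
    if len(pssm_row) != len(AMINO_ACIDS):
        raise ValueError("expected one score per standard amino acid")
    return [score * KYTE_DOOLITTLE[aa]
            for aa, score in zip(AMINO_ACIDS, pssm_row)]


# A flat profile row simply reproduces the hydropathy scale.
print(integrative_profile([1] * 20))
```

Under this scheme a position that is strongly conserved for hydrophobic residues receives large positive values, while one conserved for charged residues receives large negative values, so a single 20-element vector carries both signals.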

Availability: Datasets and software are available at http://mail.ustc.edu.cn/~bigeagle/BMCBioinfo2010/index.htm.

Figure 6: Visualization of the overall orientation and prediction results for the CCD-IBD complex PDB:2b4j. (a) The overall orientation of the CCD-IBD complex; (b) protein-protein interaction predictions for the CCD-IBD complex. The orientation of the complex is illustrated by a smooth spline between consecutive alpha-carbon positions. The left graph shows the natural orientation, while the right graph shows the protein-protein interaction predictions for the complex. In the right graph, a blue sphere marks a TP residue, a bluetint sphere an FP residue, and a gold sphere an FN residue. All other residues (not shown as colored spheres) are true negatives (TN). Note that the complex in the right graph is rotated slightly to show the predicted interface residues clearly. Each sphere represents the alpha-carbon atom of a residue. We used the RasTop software (http://www.geneinfinity.org/rastop/) to display the structure of this complex.

Mentions: The asymmetric unit of the complex PDB:2b4j contains a dimer of integrase (IN) catalytic core domains (CCD) (chains A and B in Figure 6) and a pair of human lens epithelium-derived growth factor (LEDGF) IN-binding domain (IBD) molecules (chains C and D in Figure 6), which are bound at the CCD dimer interface [51]. LEDGF binds HIV-1 IN via the small IBD within its C-terminal region. Previous results showed that the IBD is both necessary and sufficient for the interaction with HIV-1 IN [51,52]. There are several key intermolecular contacts at the CCD-IBD interface. Residues Ile365, Asp366, and Phe406, which are located at the interhelical loops of the IBD molecules (chain C or D), are hotspot residues that play critical roles in HIV-1 IN recognition. A water molecule forms hydrogen bonds linking the main-chain carbonyl group of LEDGF residue Ile365 and IN residue Thr125. We correctly predict the hotspot residues Ile365 and Asp366. Overall, at the operating point with the largest MCC (0.4468), our method achieves a sensitivity of 35.59%, precision of 80.77%, specificity of 96.93%, accuracy of 80.63%, and F1 of 49.41%. When tuned for higher precision, our model obtains a precision of 90.63% with a sensitivity of 27.88%, specificity of 98.84%, accuracy of 78.45%, F1 of 42.65%, and MCC of 0.426. In this case the hotspot residues Ile365 and Asp366 are also predicted correctly.
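For reference, the measures quoted above follow from the four confusion-matrix counts, as in the minimal sketch below. The function name is hypothetical, and the counts in the example are illustrative only: the source does not report a confusion matrix, and these values were chosen because they reproduce the percentages quoted for the largest-MCC operating point.

```python
from math import sqrt


def interface_metrics(tp, fp, fn, tn):
    """Compute the evaluation measures used in the text from
    confusion-matrix counts. Assumes at least one actual positive,
    one actual negative and one predicted positive."""
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# Illustrative counts, NOT reported in the source (see the note above).
metrics = interface_metrics(tp=21, fp=5, fn=38, tn=158)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The two operating points illustrate the usual trade-off: raising the decision threshold increases precision and specificity but lowers sensitivity, and MCC summarizes the balance in a single number that is robust to the imbalance between interface and non-interface residues.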

