Sequence-based identification of interface residues by an integrative profile combining hydrophobic and evolutionary information.

Chen P, Li J - BMC Bioinformatics (2010)

Bottom Line: Numerous methods have been proposed to recognize protein interaction sites; however, only a small proportion of protein complexes have been successfully resolved due to the high cost. Results show that the evolutionary context of a residue with respect to hydrophobicity improves the identification of protein interface residues. In addition, the ensemble of SVM classifiers improves the prediction performance.


Affiliation: Bioinformatics Research Center, School of Computer Engineering, Nanyang Technological University, 639798 Singapore.

ABSTRACT

Background: Protein-protein interactions play essential roles in protein function determination and drug design. Numerous methods have been proposed to recognize their interaction sites; however, only a small proportion of protein complexes have been successfully resolved due to the high cost. Therefore, it is important to improve the prediction of protein interaction sites from primary sequence alone.

Results: We propose constructing an integrative profile for each residue in a protein by combining its hydrophobic and evolutionary information. A support vector machine (SVM) ensemble is then developed, in which the SVMs are trained on different pairs of positive (interface sites) and negative (non-interface sites) subsets. The subsets, which have roughly the same size, are grouped in order of the change in accessible surface area before and after complexation. A self-organizing map (SOM) technique is applied to group similar input vectors so that interface residues are identified more accurately. An ensemble of ten SVMs improves MCC by around 8% and F1 by around 9% over an ensemble of three SVMs. As expected, SVM ensembles consistently perform better than individual SVMs. In addition, the model based on the integrative profiles outperforms those based on the sequence profile or the hydropathy scale alone. Because our method uses a small number of features to encode the input vectors, our model is simpler, faster and more accurate than the existing methods.

Conclusions: The integrative profile combining hydrophobic and evolutionary information contributes most to the prediction of protein-protein interaction sites. Results show that the evolutionary context of a residue with respect to hydrophobicity improves the identification of protein interface residues. In addition, the ensemble of SVM classifiers improves the prediction performance.

Availability: Datasets and software are available at http://mail.ustc.edu.cn/~bigeagle/BMCBioinfo2010/index.htm.
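
The subset construction described in the Results can be illustrated with a minimal sketch. It orders residues by their change in accessible surface area and cuts the ordered list into roughly equal-sized subsets. The function names, the `(residue_id, asa_change)` tuple layout, and the rule that every positive subset is paired with every negative subset are assumptions of this sketch, not details confirmed by the abstract. Under that assumed rule, 2 positive and 5 negative subsets would give ten training pairs.

```python
from typing import List, Sequence, Tuple

Residue = Tuple[str, float]  # (residue_id, asa_change); hypothetical layout


def split_by_asa_change(residues: Sequence[Residue], n_subsets: int) -> List[List[Residue]]:
    """Sort residues by ASA change and cut them into contiguous,
    roughly equal-sized subsets (sizes differ by at most one)."""
    ordered = sorted(residues, key=lambda r: r[1])
    base, extra = divmod(len(ordered), n_subsets)
    subsets: List[List[Residue]] = []
    start = 0
    for i in range(n_subsets):
        # The first `extra` subsets receive one additional residue.
        size = base + (1 if i < extra else 0)
        subsets.append(list(ordered[start:start + size]))
        start += size
    return subsets


def pair_subsets(pos_subsets, neg_subsets):
    """Assumed pairing rule: each positive subset with each negative subset.
    One classifier would then be trained per pair."""
    return [(p, n) for p in pos_subsets for n in neg_subsets]


if __name__ == "__main__":
    # Toy data only; real ASA changes come from the complexed structures.
    interface = [("i%d" % k, 10.0 + k) for k in range(7)]
    non_interface = [("n%d" % k, 0.1 * k) for k in range(23)]
    pairs = pair_subsets(split_by_asa_change(interface, 2),
                         split_by_asa_change(non_interface, 5))
    print(len(pairs), "training pairs")
```

Each pair would feed one SVM; the classifier training itself is omitted here because the abstract does not specify the kernel or parameters.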


Figure 2: Performance of our model when using a 5 × 5 SOM. The figure shows the sensitivity-precision and sensitivity-MCC curves obtained after combining the ten SVMs. Numbers in the legend stand for the SVM with different thresholds. Note that curves with the same color correspond to the model with the same threshold.

Mentions: In the experiments using the 5 × 5 SOM and the combined SVM classifiers, we constructed the same 2 × 5 SVM ensembles and trained and tested our model as above. We obtained 25 clusters in total, of which clusters 13 to 17 and clusters 22 to 25 were retained. The performance averaged over the retained clusters is shown in Figure 2. The model with threshold 5 outperforms the others, achieving the largest MCC of 0.4946 and F1 of 55.95%. Furthermore, the 5th combined SVM performs best when precision is larger than 50%, and the model with threshold 9 makes the best prediction when sensitivity is larger than 50%. The trends of the sensitivity-MCC curves are almost the same as those of the sensitivity-precision curves.
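
The metrics quoted above can be made concrete with a short sketch. MCC and F1 follow their standard definitions from confusion-matrix counts. The `combine_votes` function is an assumption of this sketch: it reads "threshold k" as "call a residue an interface site when at least k of the ten SVMs vote positive", which the text does not state explicitly. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def combine_votes(votes: Sequence[int], threshold: int) -> int:
    """Assumed combination rule: positive if at least `threshold`
    of the individual SVM votes (0 or 1) are positive."""
    return 1 if sum(votes) >= threshold else 0


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal count is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy example: rows are residues, columns are votes from ten SVMs.
    votes = [[1] * 6 + [0] * 4, [1] * 3 + [0] * 7, [0] * 10, [1] * 2 + [0] * 8]
    y_true = [1, 1, 0, 0]
    y_pred = [combine_votes(v, threshold=5) for v in votes]
    tp, fp, tn, fn = confusion(y_true, y_pred)
    print("MCC=%.4f F1=%.4f" % (mcc(tp, fp, tn, fn), f1_score(tp, fp, fn)))
```

Sweeping `threshold` from 1 to 10 trades sensitivity against precision, which is one way a family of curves like those in Figure 2 could arise under this reading.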

