APIS: accurate prediction of hot spots in protein interfaces by combining protrusion index with solvent accessibility.

Xia JF, Zhao XM, Song J, Huang DS - BMC Bioinformatics (2010)

Bottom Line: Then, to remove redundant and irrelevant features and improve the prediction performance, feature selection is employed using the F-score method. The results on two benchmark datasets, ASEdb and BID, show that this proposed method yields significantly better prediction accuracy than those previously published in the literature. Our results indicate that these features are more effective than the conventional evolutionary conservation, pairwise residue potentials and other traditional features considered previously, and that the combination of our and traditional features may support the creation of a discriminative feature set for efficient prediction of hot spot residues.


Affiliation: Intelligent Computing Laboratory, Hefei Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui 230031, China.

ABSTRACT

Background: It is well known that most of the binding free energy of a protein interaction is contributed by a few key hot spot residues. These residues are crucial for understanding the function of proteins and studying their interactions. Experimental hot spot detection methods such as alanine scanning mutagenesis are not applicable on a large scale because they are time-consuming and expensive. Reliable and efficient computational methods for identifying hot spots are therefore urgently needed.

Results: In this work, we introduce an efficient approach that uses a support vector machine (SVM) to predict hot spot residues in protein interfaces. We systematically investigate 62 features drawn from a combination of protein sequence and structure information. Then, to remove redundant and irrelevant features and improve the prediction performance, feature selection is employed using the F-score method. Based on the selected features, nine individual-feature-based predictors are developed to identify hot spots using SVMs. Furthermore, a new ensemble classifier, namely APIS (A combined model based on Protrusion Index and Solvent accessibility), is developed to further improve the prediction accuracy. The results on two benchmark datasets, ASEdb and BID, show that the proposed method yields significantly better prediction accuracy than methods previously published in the literature. In addition, we demonstrate the predictive power of the proposed method by modelling two protein complexes: the calmodulin/myosin light chain kinase complex and the heat shock locus gene products U and V complex. The results indicate that our method can identify more hot spots in these two complexes than other state-of-the-art methods.

Conclusion: We have developed an accurate prediction model for hot spot residues, given the structure of a protein complex. A major contribution of this study is to propose several new features based on the protrusion index of amino acid residues, which are shown to significantly improve the prediction of hot spots. Moreover, we identify a compact and useful feature subset that has important implications for identifying hot spot residues. Our results indicate that these features are more effective than the conventional evolutionary conservation, pairwise residue potentials and other traditional features considered previously, and that the combination of our features and traditional features may support the creation of a discriminative feature set for efficient prediction of hot spot residues. The data and source code are available at the web site http://home.ustc.edu.cn/~jfxia/hotspot.html.
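The F-score feature selection step named in the Results can be illustrated with a small sketch. It assumes that "F-score" refers to the commonly used two-class criterion (squared distance between class means and the overall mean, divided by the summed within-class sample variances); the abstract does not give the formula, so this is an assumption rather than the authors' implementation. The function names `f_score` and `rank_features` and the toy data are hypothetical.

```python
from statistics import mean


def f_score(pos, neg):
    """F-score of one feature from its values in the two classes.

    Assumed definition (not stated in the abstract): between-class
    separation divided by the sum of within-class sample variances.
    Each class needs at least two samples.
    """
    overall = mean(pos + neg)
    mean_pos, mean_neg = mean(pos), mean(neg)
    between = (mean_pos - overall) ** 2 + (mean_neg - overall) ** 2
    within = (
        sum((x - mean_pos) ** 2 for x in pos) / (len(pos) - 1)
        + sum((x - mean_neg) ** 2 for x in neg) / (len(neg) - 1)
    )
    if within == 0:
        # Degenerate case: no spread inside either class.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(X, y):
    """Return (feature indices by decreasing F-score, list of scores).

    X is a list of feature rows; y holds 1 for hot spot, 0 for non-hot spot.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 6.0]]
y = [1, 1, 0, 0]
order, scores = rank_features(X, y)
```

Features with the highest scores would be kept and passed to the classifier; where to cut the ranked list is a separate choice that this sketch does not address.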



Figure 2: The visualization of prediction results for chain A (white) and chain E (blue) of protein complex 1CDL using (a) APIS, (b) KFC, and (c) MINERVA. The following color scheme is used: true positives (known hot spots predicted correctly) in red, true negatives (actual non-hot spots predicted correctly) in yellow, false positives (non-hot spots predicted as hot spots) in green, false negatives (known hot spots not predicted correctly) in purple. In this case, 9 of 12 residues are correctly predicted by our method.

Mentions: The first example is the calmodulin/myosin light chain kinase complex [55]. Calmodulin (CaM, pdbID: 1cdl, chain A) is a calcium-binding protein expressed in all eukaryotic cells [56]. CaM binds to and regulates a large number of enzymes and other proteins in response to Ca2+. Among the enzymes stimulated by the calcium-calmodulin complex are a number of protein kinases such as myosin light chain kinase (MLCK, pdbID: 1cdl, chain E). The experimentally verified hot spot residues in the 1cdlAE interface are F92_A, W800_E, G804_E, I810_E, R812_E and L813_E. Moreover, F12_A, F19_A, K799_E, K802_E, R808_E and G811_E are found experimentally to be non-hot spots. Our method correctly predicts all six hot spots, whereas KFC correctly predicts only three and MINERVA four (Figure 2, Additional file 7). In addition, our method correctly predicts three of the six non-hot spots: F12_A, K799_E and G811_E. By contrast, KFC and MINERVA identify four non-hot spot residues (F12_A, K799_E, K802_E and G811_E) and five non-hot spot residues (F19_A, K799_E, K802_E, R808_E and G811_E), respectively. Although KFC and MINERVA correctly identify more non-hot spots, they identify fewer hot spots. Altogether, APIS correctly predicts 9 of the 12 residues.
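The counts in this paragraph can be checked with a short sketch that rebuilds the confusion-matrix entries for APIS on the 1cdlAE interface. The residue labels are taken from the text; the set of residues APIS predicts as hot spots is inferred (all six hot spots, plus the three non-hot spots not listed as correctly predicted), and the helper name `confusion` is hypothetical. For KFC and MINERVA the text identifies only the correctly predicted non-hot spots by name, so only their totals are computed.

```python
# Experimentally labelled residues in the 1cdlAE interface (from the text).
hot = {"F92_A", "W800_E", "G804_E", "I810_E", "R812_E", "L813_E"}
non_hot = {"F12_A", "F19_A", "K799_E", "K802_E", "R808_E", "G811_E"}

# Non-hot spots that APIS is reported to predict correctly.
apis_correct_non_hot = {"F12_A", "K799_E", "G811_E"}


def confusion(hot, non_hot, predicted_hot):
    """Return (TP, TN, FP, FN) for a set of residues predicted as hot spots."""
    tp = len(hot & predicted_hot)
    fn = len(hot - predicted_hot)
    fp = len(non_hot & predicted_hot)
    tn = len(non_hot - predicted_hot)
    return tp, tn, fp, fn


# Inferred prediction set: every hot spot, plus the non-hot spots that are
# not listed as correctly predicted (assumed to be called hot spots).
apis_predicted_hot = hot | (non_hot - apis_correct_non_hot)
tp, tn, fp, fn = confusion(hot, non_hot, apis_predicted_hot)

total = len(hot) + len(non_hot)
apis_correct = tp + tn

# Totals for the other methods, from the counts quoted in the text:
# (correct hot spots, correct non-hot spots).
reported = {"KFC": (3, 4), "MINERVA": (4, 5)}
correct_totals = {name: h + n for name, (h, n) in reported.items()}

print(f"APIS: TP={tp} TN={tn} FP={fp} FN={fn}, correct {apis_correct}/{total}")
for name, correct in correct_totals.items():
    print(f"{name}: correct {correct}/{total}")
```

Counting this way makes the trade-off explicit: APIS recovers every known hot spot at the cost of three false positives, while the other two methods miss hot spots but mislabel fewer non-hot spots.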

