Regression applied to protein binding site prediction and comparison with classification.

Giard J, Ambroise J, Gala JL, Macq B - BMC Bioinformatics (2009)

Bottom Line: We also compared the predictive performance of our patch-based method using a Multilayer Perceptron with the performance of three other methods available through web servers. Furthermore, the method presented in this work is flexible because the size of the predicted binding site is adjustable. This adaptability is useful when either the false positive rate or the false negative rate has to be limited.


Affiliation: Communications and Remote Sensing Laboratory, Université Catholique de Louvain, Place du Levant 2, 1348 Louvain-la-Neuve, Belgium. joachim.giard@uclouvain.be

ABSTRACT

Background: Structural genomics centers provide hundreds of protein structures of unknown function, so methods that determine protein function automatically are needed. A protein's function can be inferred by studying the network of its physical interactions. In this context, identifying a potential binding site between proteins is of primary interest. In the literature, methods for predicting the location of a potential binding site are generally based on classification tools. The aim of this paper is to show that regression tools are more efficient than classification tools for patch-based binding site predictors. For this purpose, we developed a patch-based binding site localization method usable with either regression or classification tools.

Results: We compared the predictive performance of regression tools with that of machine learning classifiers. Using leave-one-out cross-validation, we showed that regression tools provide better predictions than classification tools. Among the regression tools, the Multilayer Perceptron ranked highest in prediction quality. We also compared the predictive performance of our patch-based method using the Multilayer Perceptron with that of three other methods available through web servers. Our method performed similarly to the other methods.
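The leave-one-out procedure mentioned above can be illustrated with a minimal sketch. Several things here are assumptions rather than details from the paper: that one protein is held out per fold, that each patch is described by a single numeric descriptor, and that the regression target is a continuous value per patch (for example, the fraction of the patch lying in the true binding site). A one-variable least-squares line stands in for the actual regression tools; it is not the Multilayer Perceptron used by the authors.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx

def leave_one_protein_out(proteins):
    """Return the mean absolute error on each held-out protein.

    proteins: list of proteins, each a list of (descriptor, target) patches.
    Both fields are hypothetical stand-ins for the paper's patch features.
    """
    errors = []
    for held_out, test_patches in enumerate(proteins):
        # Train on the patches of every protein except the held-out one.
        train = [patch for i, protein in enumerate(proteins)
                 if i != held_out for patch in protein]
        slope, intercept = fit_line([x for x, _ in train],
                                    [y for _, y in train])
        # Score the model on the patches of the held-out protein only.
        errors.append(mean(abs(slope * x + intercept - y)
                           for x, y in test_patches))
    return errors

# Toy data (hypothetical): three proteins, two patches each.
toy_proteins = [
    [(0.0, 0.0), (1.0, 0.25)],
    [(2.0, 0.5), (4.0, 1.0)],
    [(1.0, 0.25), (3.0, 0.75)],
]
print(leave_one_protein_out(toy_proteins))
```

Holding out whole proteins, rather than individual patches, keeps patches of the test protein out of the training set; whether the paper folds by protein or by patch is not stated in this section.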

Conclusion: Regression is more efficient than classification when applied to our binding site localization method. Where possible, using regression instead of classification in other existing binding site predictors will probably improve results. Furthermore, the method presented in this work is flexible because the size of the predicted binding site is adjustable. This adaptability is useful when either the false positive rate or the false negative rate has to be limited.
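The trade-off behind the adjustable site size can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: it assumes each residue has one predicted score, and that the predicted site is simply the `size` highest-scoring residues. The scores and the true site are made-up toy values.

```python
def select_site(scores, size):
    """Return the indices of the `size` highest-scoring residues."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:size])

def precision_recall(predicted, true_site):
    """Precision (PPV) and recall (sensitivity) of a predicted residue set."""
    true_positives = len(predicted & true_site)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(true_site) if true_site else 0.0
    return precision, recall

# Toy example (hypothetical): eight residues, three of them in the true site.
scores = [0.9, 0.8, 0.1, 0.7, 0.2, 0.6, 0.3, 0.05]
true_site = {0, 1, 5}

for size in (2, 4, 6):
    precision, recall = precision_recall(select_site(scores, size), true_site)
    print(f"size={size}: precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case a small predicted site contains no false positives but misses one true residue, while larger sites recover every true residue at the cost of more false positives.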


Figure 9: Precision-recall curves for different methods, comparing the results obtained with different models. The y-axis represents the mean precision (PPV) over the 180 proteins and the x-axis represents the mean recall (sensitivity). The MLP curve (line with crosses) is obtained using our method with a Multilayer Perceptron.

Mentions: Finally, our method was compared to other methods for which applications are available on the web: Cons-PPISP [17], PINUP [20] and Sharp2 [19]. These three methods return a score for each residue. These scores were mapped onto the protein surfaces, and the binding site locations were predicted in the same way as for the scores produced by the different statistical models. The tests were performed on the second dataset (see Supplementary Materials 2). A precision-recall graph comparing the results of all these methods appears in Figure 9. The performance of our method using MLP is higher than that of Sharp2 and comparable to that of PINUP and Cons-PPISP. The percentage of proteins in the dataset with a higher index than expected (via random selection) was also calculated (Table 2). For both our method using MLP and the PINUP method, the result was better than expected in 71% of the cases. These results are slightly worse than for the other dataset, probably because the training dataset was made of proteins from bound structures.
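A curve of mean precision against mean recall, like the one described for Figure 9, could be computed along the following lines. This is a sketch under stated assumptions: per-residue scores from any of the compared methods are taken as given, the predicted site is assumed to be the top-scoring fraction of residues, and one curve point is produced per fraction by averaging over proteins. The function names and toy data are hypothetical; the paper's actual mapping of scores onto the protein surface is not reproduced here.

```python
from statistics import mean

def top_fraction(scores, fraction):
    """Indices of the highest-scoring `fraction` of residues (at least one)."""
    k = max(1, round(fraction * len(scores)))
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return set(order[:k])

def mean_pr_curve(proteins, fractions):
    """Return one (mean recall, mean precision) point per fraction.

    proteins: list of (scores, true_site) pairs, where `scores` holds one
    value per residue and `true_site` is a non-empty set of residue indices.
    """
    curve = []
    for fraction in fractions:
        precisions, recalls = [], []
        for scores, true_site in proteins:
            predicted = top_fraction(scores, fraction)
            true_positives = len(predicted & true_site)
            precisions.append(true_positives / len(predicted))
            recalls.append(true_positives / len(true_site))
        # Average over proteins, as the figure caption describes.
        curve.append((mean(recalls), mean(precisions)))
    return curve

# Toy data (hypothetical): two proteins with four residues each.
toy_proteins = [
    ([0.9, 0.1, 0.8, 0.2], {0, 2}),
    ([0.3, 0.7, 0.6, 0.1], {1, 3}),
]
for recall, precision in mean_pr_curve(toy_proteins, [0.25, 0.5, 1.0]):
    print(f"recall={recall:.2f}, precision={precision:.2f}")
```

Running the same function on the scores of each method would give one curve per method, which is the kind of comparison the figure presents.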

