A mixture of feature experts approach for protein-protein interaction prediction.

Qi Y, Klein-Seetharaman J, Bar-Joseph Z - BMC Bioinformatics (2007)

Bottom Line: High-throughput methods can directly detect the set of interacting proteins in model species, but the results are often incomplete and exhibit high false positive and false negative rates. However, due to missing data and high redundancy among the features used, different protein pairs may benefit from different features based on the set of attributes available. Our method improved upon the best previous methods for this task.


Affiliation: School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA. qyj@cs.cmu.edu

ABSTRACT

Background: High-throughput methods can directly detect the set of interacting proteins in model species, but the results are often incomplete and exhibit high false positive and false negative rates. A number of researchers have recently presented methods for integrating direct and indirect data for predicting interactions. These methods utilize a common classifier for all pairs. However, due to missing data and high redundancy among the features used, different protein pairs may benefit from different features based on the set of attributes available. In addition, in many cases it is hard to directly determine which of the data sources contributed to a prediction. This information is important for biologists using these predictions in the design of new experiments.

Results: To address these challenges, we propose a Mixture-of-Feature-Experts method for protein-protein interaction prediction. We split the features into roughly homogeneous sets of feature experts. The individual experts use logistic regression, and their scores are combined using another logistic regression. When combining the scores, the weighting of each expert depends on the set of input attributes available for that pair. Thus, different experts will have different influence on the prediction depending on the available features.
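
As a rough illustration of the idea described above, and not the authors' implementation, the sketch below scores one protein pair with a small mixture of logistic-regression experts. Everything specific in it is an assumption made for the example: the three feature groups, all weights, the softmax gate driven by the fraction of available features in each group, and the fallback prior returned when every feature is missing.

```python
import math


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def expert_score(features, weights, bias):
    """Logistic-regression score of one expert.

    Missing features are encoded as None and simply contribute nothing
    (an assumption of this sketch).
    """
    z = bias + sum(w * v for w, v in zip(weights, features) if v is not None)
    return sigmoid(z)


def gate_weights(groups, gate_params):
    """Softmax weights over experts, driven by feature availability.

    Each expert's gate logit is a linear function of the fraction of its
    features that are present. Experts with no available feature get weight 0.
    """
    logits = []
    for features, (slope, offset) in zip(groups, gate_params):
        available = sum(v is not None for v in features) / len(features)
        logits.append(slope * available + offset if available > 0 else None)
    live = [l for l in logits if l is not None]
    if not live:
        return [0.0] * len(groups)
    top = max(live)  # subtract the maximum for numerical stability
    exps = [0.0 if l is None else math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mfe_predict(groups, expert_params, gate_params, prior=0.5):
    """Combine expert scores with availability-dependent gate weights.

    Returns the combined score and the gate weights, so the contribution
    of each expert to the prediction can be inspected.
    """
    gates = gate_weights(groups, gate_params)
    if sum(gates) == 0.0:
        return prior, gates  # no evidence at all: fall back to the prior
    scores = [expert_score(f, w, b) for f, (w, b) in zip(groups, expert_params)]
    return sum(g * s for g, s in zip(gates, scores)), gates


# Hypothetical protein pair: three feature groups, the second entirely missing.
groups = [[0.8, None], [None, None], [1.0, 0.2, 0.5]]
expert_params = [([1.5, -0.4], -0.2), ([0.7, 0.7], 0.0), ([0.9, 0.3, -0.6], -0.5)]
gate_params = [(2.0, 0.0), (2.0, 0.0), (2.0, 0.0)]

probability, gates = mfe_predict(groups, expert_params, gate_params)
print("combined score:", round(probability, 3))
print("gate weights:", [round(g, 3) for g in gates])
```

Returning the gate weights alongside the score mirrors the point made above: the weighting shows which experts drove a given prediction. In a real system both the expert and gate parameters would be fitted to training data rather than set by hand.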

Conclusion: We applied our method to predict the set of interacting proteins in yeast and human cells. Our method improved upon the best previous methods for this task. In addition, the weighting of the experts provides a means to evaluate the prediction based on the high-scoring features.



Figure 4: Performance Comparison in Yeast. Average precision vs. recall curves comparing the MFE method with four other classifiers (LR/NB/RF/SVM) for PPI prediction in yeast. LR: Logistic Regression; NB: Naive Bayes; RF: Random Forest; SVM: Support Vector Machine; MFE: Mixture-of-Feature-Experts. The MFE curve dominates the curves of the other four methods over most of the recall range.
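
For readers unfamiliar with this kind of plot, the following minimal sketch shows how a single precision vs. recall curve can be computed from ranked classifier scores. It is an illustration under stated assumptions, not the evaluation code behind Figure 4: the labels and scores are made up, ties between scores are not handled specially, and averaging across runs is left out.

```python
def precision_recall_points(labels, scores):
    """Return (recall, precision) after each position in the ranked list.

    labels: 1 for an interacting pair, 0 for a non-interacting pair.
    scores: classifier scores, higher meaning more likely to interact.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive example is required")
    # Rank pairs from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_positives = 0
    points = []
    for rank, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        recall = true_positives / total_positives
        precision = true_positives / rank
        points.append((recall, precision))
    return points


# Made-up example: five candidate pairs, two of which truly interact.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
curve = precision_recall_points(labels, scores)
for recall, precision in curve:
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

One curve "dominates" another over a recall range when its precision is at least as high at every recall level in that range.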

Mentions: Based on the estimated ratio of interacting to non-interacting pairs in yeast and human, each test run contains roughly 50 to 100 positive PPIs. For the training set, we up-sampled the positive examples in a pre-processing step, which yielded roughly 800 positive examples per training run in human and roughly 300 per training run in yeast. This sampling strategy mitigates the shortage of positive examples in the training set without significantly affecting performance [33]. Figure 4 plots the average precision versus recall curves of the five methods for yeast PPI prediction, and Figure 5 does the same for human. In both figures, the MFE curve dominates those of the other four methods over most of the low-recall range.
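
The up-sampling step can be sketched as follows. The text does not say how the positive examples were up-sampled, so this example assumes plain random sampling with replacement up to a target count; the function name, the fixed seed, and the toy data are all hypothetical.

```python
import random


def upsample_positives(examples, labels, target_positives, seed=0):
    """Duplicate randomly chosen positive examples until the training set
    holds target_positives of them. Negative examples are left untouched.

    Sampling with replacement is an assumption of this sketch.
    """
    rng = random.Random(seed)
    positive_idx = [i for i, y in enumerate(labels) if y == 1]
    if not positive_idx:
        raise ValueError("no positive examples to up-sample")
    n_extra = max(0, target_positives - len(positive_idx))
    extra_idx = [rng.choice(positive_idx) for _ in range(n_extra)]
    all_idx = list(range(len(labels))) + extra_idx
    rng.shuffle(all_idx)  # avoid a block of duplicates at the end
    return [examples[i] for i in all_idx], [labels[i] for i in all_idx]


# Toy training set: 3 positive pairs among 1000 candidates.
examples = [f"pair_{i}" for i in range(1000)]
labels = [1 if i < 3 else 0 for i in range(1000)]

new_examples, new_labels = upsample_positives(examples, labels, target_positives=300)
print("positives:", sum(new_labels), "negatives:", len(new_labels) - sum(new_labels))
```

Up-sampling should be applied to the training set only, as in the text above, so that the test set keeps the estimated natural ratio of interacting to non-interacting pairs.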

