Multi-level learning: improving the prediction of protein, domain and residue interactions by allowing information flow between levels.

Yip KY, Kim PM, McDermott D, Gerstein M - BMC Bioinformatics (2009)

Bottom Line: The predictions at each level could benefit from using the features at all three levels. To facilitate application of our multi-level learning framework, we discuss three key aspects of multi-level learning and the corresponding design choices that we have made in the implementation of a concrete learning algorithm. 1) Architecture of information flow: we show the greater flexibility of bidirectional flow over independent levels and unidirectional flow; 2) Coupling mechanism of the different levels: we show how this can be accomplished by augmenting the training sets at each level, and discuss the prevention of error propagation between different levels by means of soft coupling; 3) Sparseness of data: we show that the multi-level framework compounds data sparsity issues, and discuss how this can be dealt with by building local models in information-rich parts of the data. Our proof-of-concept learning algorithm demonstrates the advantage of combining levels, and opens up opportunities for further research.


Affiliation: Department of Computer Science, Yale University, 51 Prospect Street, New Haven, CT 06511, USA. yuklap.yip@yale.edu

ABSTRACT

Background: Proteins interact through specific binding interfaces that involve many residues within domains. Protein interactions thus occur on three levels of a concept hierarchy: whole proteins, domains, and residues. Each level offers a distinct and complementary set of features for computationally predicting interactions, including functional genomic features of whole proteins, evolutionary features of domain families and physical-chemical features of individual residues. The predictions at each level could benefit from using the features at all three levels. However, combining them is not trivial, because the features are provided at different granularities.
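
To make the three-level hierarchy concrete, here is a minimal Python sketch, assuming a protein is represented as a list of domain instances and each domain instance as a list of residue positions. The classes `Protein` and `Domain`, the function `protein_score`, and the max rule used to lift domain-level scores to the protein level are hypothetical illustrations, not the authors' Java implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Domain:
    name: str
    residues: list[int] = field(default_factory=list)  # residue positions in the parent protein


@dataclass
class Protein:
    name: str
    domains: list[Domain] = field(default_factory=list)


def protein_score(p1: Protein, p2: Protein,
                  domain_score: Callable[[Domain, Domain], float]) -> float:
    """Hypothetical upward aggregation: score a protein pair by its
    best-scoring domain-instance pair (max rule); 0.0 if there are no domains."""
    best = 0.0
    for d1 in p1.domains:
        for d2 in p2.domains:
            best = max(best, domain_score(d1, d2))
    return best


# Usage with made-up domain-level scores.
a = Protein("A", [Domain("A:SH3", [10, 11, 12]), Domain("A:kinase", [200, 201])])
b = Protein("B", [Domain("B:PxxP", [5, 6])])
scores = {("A:SH3", "B:PxxP"): 0.9, ("A:kinase", "B:PxxP"): 0.2}
print(protein_score(a, b, lambda d1, d2: scores.get((d1.name, d2.name), 0.0)))  # 0.9
```

The max rule encodes the idea that a single interacting domain pair suffices for two proteins to interact; a noisy-OR or an average would be equally plausible illustrative choices.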

Results: To link up the predictions at the three levels, we propose a multi-level machine-learning framework that allows for explicit information flow between the levels. We demonstrate, using representative yeast interaction networks, that our algorithm is able to utilize complementary feature sets to make more accurate predictions at the three levels than when the three problems are approached independently. To facilitate application of our multi-level learning framework, we discuss three key aspects of multi-level learning and the corresponding design choices that we have made in the implementation of a concrete learning algorithm. 1) Architecture of information flow: we show the greater flexibility of bidirectional flow over independent levels and unidirectional flow; 2) Coupling mechanism of the different levels: we show how this can be accomplished by augmenting the training sets at each level, and discuss the prevention of error propagation between different levels by means of soft coupling; 3) Sparseness of data: we show that the multi-level framework compounds data sparsity issues, and discuss how this can be dealt with by building local models in information-rich parts of the data. Our proof-of-concept learning algorithm demonstrates the advantage of combining levels, and opens up opportunities for further research.
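
One way to picture coupling by training-set augmentation with soft coupling is sketched below in Python. It assumes each level produces a score in [0, 1] for candidate pairs. Only predictions far from the undecided value 0.5 are borrowed, pairs already in the gold-standard set are never overridden, and borrowed examples receive a reduced weight so that an error at one level is damped rather than propagated. The `Example` type, `augment_training_set` and its `threshold`, `max_new` and `weight` parameters are hypothetical; the authors' own coupling scheme may differ in its details. In a bidirectional architecture such a step would run in both directions at each round, whereas unidirectional flow would run it in one direction only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    pair: tuple[str, str]
    label: int     # 1 = interacting, 0 = non-interacting
    weight: float  # 1.0 for gold-standard examples


def augment_training_set(gold: list[Example],
                         predictions: dict[tuple[str, str], float],
                         threshold: float = 0.9,
                         max_new: int = 100,
                         weight: float = 0.5) -> list[Example]:
    """Hypothetical soft coupling: borrow only confident predictions from
    another level, never override gold labels, and down-weight borrowed examples."""
    known = {ex.pair for ex in gold}
    candidates = []
    for pair, score in predictions.items():
        if pair in known:
            continue  # gold-standard labels always take precedence
        confidence = max(score, 1.0 - score)  # distance from the undecided value 0.5
        if confidence >= threshold:
            candidates.append((confidence, pair, 1 if score >= 0.5 else 0))
    candidates.sort(reverse=True)  # most confident first
    extra = [Example(pair, label, weight) for _, pair, label in candidates[:max_new]]
    return list(gold) + extra


# Usage with made-up domain-level scores feeding the protein level.
gold = [Example(("A", "B"), 1, 1.0)]
domain_level = {("A", "B"): 0.95, ("C", "D"): 0.97, ("E", "F"): 0.60, ("G", "H"): 0.02}
for ex in augment_training_set(gold, domain_level):
    print(ex)
```

Here (C, D) is added as a positive and (G, H) as a negative, both with weight 0.5; (E, F) is skipped as too uncertain, and (A, B) keeps its gold label. Lowering `weight` or raising `threshold` makes the coupling softer.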

Availability: The software and a readme file can be downloaded at http://networks.gersteinlab.org/mll. The programs are written in Java, and can be run on any platform with Java 1.4 or higher and Apache Ant 1.7.0 or higher installed. The software can be used without a license.


Figure 2: Receiver operating characteristic (ROC) curves of protein interaction predictions with different frameworks and training levels.

Mentions: Downward flow of training information did help the prediction of domain instance interactions. However, the results at the residue level were unsatisfactory: accuracies were even lower than with independent levels, whether the residue level was assisted by training examples from the protein level or from the domain level. In contrast, the results for bidirectional flow are encouraging: in all cases, the accuracies are higher than those of the other two architectures. For example, using the domain level to help the residue level decreased the residue-level accuracy from 0.5675 to 0.5128 with unidirectional flow, but increased it to 0.6182 with bidirectional flow. To illustrate the difference in performance among the three architectures, the ROC curves of protein, domain and residue interaction predictions are shown in Figures 2, 3 and 4, respectively.
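
The comparisons above rest on ROC analysis. As a reference point, the sketch below computes the area under an ROC curve (AUC) in Python from prediction scores and 0/1 labels, using the equivalent pairwise (Mann-Whitney) formulation. Whether the accuracies quoted in the text are exactly such AUC values is an assumption here, and `roc_auc` is an illustrative helper rather than part of the authors' software.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example scores higher than a randomly chosen negative one
    (ties count as half a win). Quadratic time; fine for small illustrations."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

Under that reading, the residue-level numbers above amount to a change of 0.5128 − 0.5675 = −0.0547 with unidirectional flow and 0.6182 − 0.5675 = +0.0507 with bidirectional flow, relative to independent levels; an AUC of 0.5 is what random scoring would give.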
