Prediction of heterotrimeric protein complexes by two-phase learning using neighboring kernels.

Ruan P, Hayashida M, Maruyama O, Akutsu T - BMC Bioinformatics (2014)

Bottom Line: We make use of the discriminant function in support vector machines (SVMs), and design novel feature space mappings for the second phase. As the second classifier, we examine SVMs and relevance vector machines (RVMs). We perform 10-fold cross-validation computational experiments.


ABSTRACT

Background: Protein complexes play important roles in biological systems such as gene regulatory networks and metabolic pathways. Most methods for predicting protein complexes try to find complexes of size greater than three. It is known, however, that smaller complexes account for a large part of all complexes in several species. In our previous work, we developed a method for predicting heterodimeric protein complexes that uses several feature space mappings and the domain composition kernel, and that outperforms existing methods.

Results: We propose methods for predicting heterotrimeric protein complexes that extend the techniques of our previous work, based on the idea that most heterotrimeric protein complexes are unlikely to share a protein with one another. We make use of the discriminant function in support vector machines (SVMs), and design novel feature space mappings for the second phase. As the second classifier, we examine SVMs and relevance vector machines (RVMs). We perform computational experiments with 10-fold cross-validation. The results suggest that our proposed two-phase methods and SVM with the extended features outperform the existing method NWE, which was reported to outperform other existing methods such as MCL, MCODE, DPClus, CMC, COACH, RRW, and PPSampler for prediction of heterotrimeric protein complexes.

Conclusions: We propose two-phase prediction methods with the extended features, the domain composition kernel, SVMs and RVMs. The two-phase method with the extended features and the domain composition kernel using SVM as the second classifier is particularly useful for prediction of heterotrimeric protein complexes.


Figure 1: Example of a subgraph including three focused proteins Pi, Pj, Pk and their neighboring proteins. In this example, protein Pr is neighboring to both of Pi and Pk.

Mentions: In our previous study, we proposed seven feature space mappings for prediction of heterodimeric protein complexes [14]. These are based on the idea that the reliability of the interaction within a heterodimer should be high and, conversely, the reliability of an interaction between a protein in the heterodimer and a protein outside it should be low. We extend these feature space mappings from two interacting proteins to three proteins. Table 1 shows the extended mappings in detail for three distinct proteins Pi, Pj, and Pk that are connected in the protein-protein interaction network. The fifth mapping of the previous study is eliminated here because, with more neighboring proteins, the maximum of the differences becomes close to the maximum of the neighboring weights, denoted by (F3). (F1) and (F2) denote the maximum and minimum, respectively, of the weights of the interactions among Pi, Pj, and Pk. The first feature in the previous study is the weight of the interaction between the two proteins. Because three focused proteins have at least two interactions among them, the individual weights cannot all be used unchanged as elements of the feature vector, so we take their maximum and minimum (see Figure 1). In addition, the proteins in a heterotrimer should interact with each other, so (F2), the minimum of the weights, is expected to be high. (F3) and (F4) denote the maximum and minimum, respectively, of the weights of the interactions between any of Pi, Pj, Pk and a neighboring protein Pr, where r ≠ i, j, k and (i, r) ∈ E, (j, r) ∈ E, or (k, r) ∈ E. (F3), the maximum of the neighboring weights of a heterotrimer, is expected to be lower than the weights of the interactions within the heterotrimer.
Consider a protein Pr, distinct from Pi, Pj, and Pk, that interacts with two of them (see Figure 1). If the weights of both interactions are large, these proteins together with Pr may form a complex. We therefore introduce (F5), the maximum of the smaller weights of the interactions with such neighboring proteins Pr. (F6) and (F7) denote the maximum and minimum, respectively, of the numbers of domains contained in Pi, Pj, and Pk. The number of domains in a protein complex is expected to be large because domains are considered to be mediators of protein-protein interactions.
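
The seven mappings described above can be illustrated with a short Python sketch. It is not the authors' implementation and Table 1 is not reproduced here, so several details are assumptions: the function name trimer_features, the dictionary-based graph representation, the value 0.0 used when a feature has no applicable interaction, and the reading of (F5) as the second-largest weight of a neighbor Pr that touches at least two of the three proteins (for a neighbor with exactly two such interactions this is simply the smaller weight). All weights and domain counts in the example are made up.

```python
from itertools import combinations


def trimer_features(weights, domain_counts, trio):
    """Return (F1, ..., F7) for three connected proteins (illustrative sketch).

    weights: dict mapping frozenset({a, b}) -> interaction weight (assumed layout)
    domain_counts: dict mapping protein -> number of domains
    trio: the three focused proteins (Pi, Pj, Pk)
    """
    trio = tuple(trio)
    members = set(trio)

    # Weights of the interactions among Pi, Pj, Pk (two or three edges).
    inner = [weights[frozenset(pair)] for pair in combinations(trio, 2)
             if frozenset(pair) in weights]
    if len(inner) < 2:
        raise ValueError("the three proteins must be connected")

    # Collect, per outside neighbor Pr, the weights of its interactions
    # with the focused proteins.
    by_neighbor = {}
    for edge, w in weights.items():
        if len(edge) == 2 and len(edge & members) == 1:
            (r,) = edge - members
            by_neighbor.setdefault(r, []).append(w)

    outer = [w for ws in by_neighbor.values() for w in ws]
    f3 = max(outer, default=0.0)  # assumption: 0.0 when there is no neighbor
    f4 = min(outer, default=0.0)

    # Assumed reading of (F5): for each Pr touching two or more focused
    # proteins take its second-largest weight, then maximize over Pr.
    shared = [sorted(ws)[-2] for ws in by_neighbor.values() if len(ws) >= 2]
    f5 = max(shared, default=0.0)

    counts = [domain_counts.get(p, 0) for p in trio]
    return (max(inner), min(inner), f3, f4, f5, max(counts), min(counts))


# Hypothetical data in the spirit of Figure 1: Pr neighbors both Pi and Pk.
weights = {
    frozenset({"Pi", "Pj"}): 0.9,
    frozenset({"Pj", "Pk"}): 0.8,
    frozenset({"Pi", "Pr"}): 0.4,
    frozenset({"Pk", "Pr"}): 0.3,
    frozenset({"Pj", "Ps"}): 0.2,
}
domain_counts = {"Pi": 2, "Pj": 1, "Pk": 3}

features = trimer_features(weights, domain_counts, ("Pi", "Pj", "Pk"))
print(features)  # (0.9, 0.8, 0.4, 0.2, 0.3, 3, 1)
```

In this toy example (F2) = 0.8 is high and (F3) = 0.4 is lower than both inner weights, which is the pattern the text expects for a true heterotrimer; (F5) = 0.3 comes from Pr, the only neighbor that touches two of the focused proteins.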

