Prediction of protein-protein interaction types using association rule based classification.

Park SH, Reyes JA, Gilbert DR, Kim JW, Kim S - BMC Bioinformatics (2009)

Bottom Line: Our results on classifying PPI types with a set of discovered association rules show that the discriminative ability of the association rules can significantly affect the predictive power of classification models. We also showed that classification accuracy can be improved by using structural domain information and secondary structure content. The advantage of our approach is that the discovered association rules are understandable and interpretable, so biologically significant information can be extracted from them.


Affiliation: Department of Bioinformatics & Life Science, Soongsil University, Seoul, Korea. shpark@ssu.ac.kr

ABSTRACT

Background: Protein-protein interactions (PPI) can be classified according to their characteristics, for example into obligate or transient interactions. Identifying and characterizing these PPI types may help in the functional annotation of new protein complexes and in the prediction of protein interaction partners by knowledge-driven approaches.

Results: This work discovers patterns in the interaction sites of four different interaction types, uses these patterns to characterize the types, and applies them to predict PPI types with Association Rule Based Classification (ARBC), which comprises association rule generation followed by classification. We incorporated domain information from protein complexes in SCOP and identified 354 domain-interaction sites. Fourteen interface properties were calculated from amino acid and secondary structure composition and then used with the APRIORI algorithm to generate a set of association rules characterizing these domain-interaction sites. Our results on classifying PPI types with the discovered association rules show that the discriminative ability of the rules can significantly affect the predictive power of classification models. We also showed that classification accuracy can be improved by using structural domain information and secondary structure content.
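The abstract names APRIORI as the rule generator but gives no parameters. As a minimal sketch, assuming each domain-interaction site has already been discretized into a set of items (property predicates plus exactly one class label), the Python code below mines frequent itemsets level by level and keeps only rules whose consequent is a single class label. The item names, the toy `sites` data, and the `min_support`/`min_confidence` thresholds are hypothetical; this is a textbook Apriori, not the authors' implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        # Fraction of transactions containing every item in `itemset`.
        return sum(itemset <= t for t in sets) / n

    frequent = {}
    level = {frozenset([item]) for t in sets for item in t}  # candidate 1-itemsets
    k = 1
    while level:
        scored = {c: support(c) for c in level}
        kept = {c: s for c, s in scored.items() if s >= min_support}
        frequent.update(kept)
        k += 1
        # Join: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune: by the Apriori property, every (k-1)-subset must be frequent.
        level = {c for c in joined
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))}
    return frequent


def class_rules(frequent, class_labels, min_confidence):
    """Keep rules `antecedent -> label` whose consequent is exactly one class label."""
    rules = []
    for itemset, sup in frequent.items():
        labels = itemset & class_labels
        if len(labels) != 1 or len(itemset) < 2:
            continue
        antecedent = itemset - labels
        # The antecedent is frequent too (downward closure), so its support is known.
        confidence = sup / frequent[antecedent]
        if confidence >= min_confidence:
            (label,) = labels
            rules.append((antecedent, label, sup, confidence))
    # Rank by confidence, then support.
    return sorted(rules, key=lambda r: (-r[3], -r[2]))


# Hypothetical toy data: each site is a set of discretized properties plus a class label.
sites = [
    {"Nohelix", "loop_high", "ENZ"},
    {"Nohelix", "loop_high", "ENZ"},
    {"Nohelix", "SCOP_b", "ENZ"},
    {"Nostrand", "SCOP_a", "nonENZ"},
    {"Nostrand", "loop_high", "nonENZ"},
]

rules = class_rules(apriori(sites, min_support=0.4), {"ENZ", "nonENZ"}, min_confidence=0.8)
for antecedent, label, sup, conf in rules:
    print(sorted(antecedent), "->", label, f"support={sup:.2f} confidence={conf:.2f}")
```

On this toy data the top-ranked rule is `Nohelix -> ENZ` (support 0.60, confidence 1.00), while `loop_high -> ENZ` reaches only about 0.67 confidence and is filtered out. Restricting consequents to class labels is what makes the mined rules directly usable for classification.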

Conclusion: The advantage of our approach is that the discovered association rules are understandable and interpretable, so biologically significant information can be extracted from them. A web application based on our method can be found at http://bioinfo.ssu.ac.kr/~shpark/picasso/

Figure 3: 2D plots for pairs of association rules. Each plot places data points according to a pair of association rules, one on the X axis and one on the Y axis, and each axis takes a Boolean value: 0 marks negative data points that do not satisfy the rule on that axis, and 1 marks positive data points that satisfy it. Points in the upper-left corner satisfy only the Y-axis rule, points in the lower-right corner satisfy only the X-axis rule, and points in the upper-right corner satisfy both rules. Panels (a), (b), and (c) characterize the distribution of inhibitors in enzyme-inhibitor interactions. Rule 28 is on the X axis in all three plots, and Rules 1, 3, and 38 are on their Y axes. Panel (a) shows a pair of rules that both include SSE information (e.g., helix and loop content); panels (b) and (c) combine SSE content information (Rule 28: "Nohelix") with other properties (e.g., SCOPClass and number of atoms). Plot (b) (Rule 3 versus Rule 28) is identical to the plot generated by Rule 29. Panels (e) and (f) characterize the enzymes that interact with the group of inhibitors described in (a), (b), and (c). Panel (d) plots enzymes described by Rule 40 against inhibitors described by Rule 29; no point satisfies both rules. Plot (d) thus reflects the proper interpretation of the association rules for interactions between enzymes and inhibitors.
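The layout of these plots can be reproduced by evaluating each rule as a Boolean test on every data point. A minimal sketch, assuming each data point is a set of discretized items and each rule is represented by its antecedent (a set of items that must all be present); the item names and toy data are hypothetical:

```python
from collections import Counter


def rule_pair_cells(sites, rule_x, rule_y):
    """Count data points in each cell of a rule-pair plot.

    A rule matches a site when all items of its antecedent are present.
    The X coordinate is 1 if rule_x matches, else 0; Y is defined the same way for rule_y.
    """
    cells = Counter((int(rule_x <= items), int(rule_y <= items)) for items in sites)
    # (0, 1): upper left, Y-axis rule only; (1, 0): lower right, X-axis rule only;
    # (1, 1): upper right, both rules; (0, 0): neither rule.
    return {cell: cells[cell] for cell in [(0, 0), (0, 1), (1, 0), (1, 1)]}


rule_x = {"Nohelix"}                # hypothetical antecedent for the X-axis rule
rule_y = {"SCOP_b", "atoms_high"}   # hypothetical antecedent for the Y-axis rule
sites = [
    {"Nohelix", "SCOP_b", "atoms_high"},
    {"Nohelix", "loop_high"},
    {"SCOP_b", "atoms_high"},
    {"Nostrand"},
]
print(rule_pair_cells(sites, rule_x, rule_y))
# {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 1}
```

A pair of rules that never co-occur, as in panel (d), would give a count of 0 in the `(1, 1)` cell. When actually drawing such a plot, a small random jitter is usually added so that points sharing a corner remain visible.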

Mentions: We have shown that the interaction sites were dominated by non-regular regions: for ENZ interactions in particular, the sites were on average composed mostly of non-helix, non-β-strand regions (Figure 1). This is reflected in Rules 29 (Table 7), 1, 4, and 6, all of which require 50–80% non-regular-region content for a site to be classified as ENZ. Some rules containing negation predicates are strong indicators of certain interaction types. For example, "Nohelix" and "Nostrand" in the interaction sites imply ENZ (Rule 29) and nonENZ (Rules 7, 12, and 15), respectively. HET is characterized by relatively small proportions of strands (Rules 18 and 19) and by "Nostrand" (Rule 24). We also observe that rules combining such SSE content information with other properties (Rules 29, 7, 12, 15, and 24 in Figure 2), or combined with other rules (Figure 3(a), (b), and (c)), are stronger discriminators for classifying PPI types than rules containing only SSE content information (Rules 1, 2, 4, 6, 14, 18, 19, and 21 in Figure 2). Some rules that combine SSE information with SCOP classes (Rules 29 and 7 in Figure 2) are the most discriminative and informative for characterizing ENZ and nonENZ.
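Rules like these are conjunctions of predicates, some negated (e.g., "Nohelix", "Nostrand"), each implying an interaction type. As a minimal sketch of how such rules could be applied, the code below discretizes SSE fractions into items and classifies a site with the first matching rule in confidence order, a first-match scheme in the spirit of CBA. The item names, thresholds, rule list, default class, and ranking criteria are all hypothetical; they are not the paper's Table 7 rules or necessarily its exact classification procedure.

```python
def sse_items(helix_frac, strand_frac):
    """Discretize an interface's SSE content into items (hypothetical thresholds)."""
    items = set()
    coil_frac = 1.0 - helix_frac - strand_frac  # non-regular (non-helix, non-strand) share
    if helix_frac == 0:
        items.add("Nohelix")      # negation predicate: no helix at the site
    if strand_frac == 0:
        items.add("Nostrand")     # negation predicate: no strand at the site
    if 0.5 <= coil_frac <= 0.8:
        items.add("coil_50_80")   # 50-80% non-regular content
    return items


def classify(items, rules, default=None):
    """First-match classification: try rules from strongest to weakest."""
    ranked = sorted(rules, key=lambda r: (-r[3], -r[2]))  # confidence, then support
    for antecedent, label, _support, _confidence in ranked:
        if antecedent <= items:
            return label
    return default


# Hypothetical rules: (antecedent, class label, support, confidence).
rules = [
    (frozenset({"Nohelix", "coil_50_80"}), "ENZ", 0.10, 0.95),
    (frozenset({"Nostrand"}), "nonENZ", 0.20, 0.85),
    (frozenset({"Nohelix"}), "ENZ", 0.25, 0.80),
]

print(classify(sse_items(0.0, 0.3), rules))  # → ENZ (first rule matches)
print(classify(sse_items(0.0, 0.0), rules))  # → nonENZ (higher-ranked matching rule wins)
```

The second call shows why ranking matters: a site with neither helix nor strand matches both a nonENZ rule and an ENZ rule, and the higher-confidence rule decides.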