Discovering amino acid patterns on binding sites in protein complexes.

Kuo HC, Ong PL, Lin JC, Huang JP - Bioinformation (2011)


ABSTRACT
Discovering amino acid (AA) patterns on protein binding sites has recently become popular. We propose a method to discover association relationships among AAs on binding sites; such knowledge of binding sites is very helpful for predicting protein-protein interactions. In this paper, we focus on protein complexes that involve protein-protein recognition. Association rule mining is used to discover spatially adjacent amino acids on the binding sites of a protein complex. Instead of treating all AAs of the binding sites as a single transaction, we spatially partition the AAs of the binding sites in a protein complex and treat the AAs in each partition as one transaction. To partition them, the AAs on a binding site are first projected from three dimensions onto two and then placed into the cells of a circular grid. The circular grid has ten rings: a central ring, a second ring with 6 sectors, a third ring with 12 sectors, and subsequent rings that each have four more sectors than the ring before. As for the radius of each ring, we examined the complexes and found 10 Å to be a suitable value; it can be set by the user. After the recognition complexes are placed on the circular grid, one mining record (i.e. transaction) is obtained from each sector. Finally, we apply association rule mining to these records to find frequent AA patterns: an AA pattern whose support is larger than the predetermined minimum support (i.e. threshold) is called a frequent pattern. These discovered patterns offer biologists a novel point of view that can improve the prediction accuracy of protein-protein recognition. In our experiments, mining the AA patterns showed that arginine (arg) appears most frequently on the binding sites of the two proteins in recognition protein complexes, while cysteine (cys) appears least frequently.
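The partitioning step can be sketched in Python as follows. This is a minimal sketch under several assumptions the abstract does not settle: each residue is represented by one 3D point (for example its Cα atom); the 3D-to-2D projection is onto the plane perpendicular to the axis joining the two binding-site centroids, which is only one possible choice of rotation and not necessarily the authors'; ring *i* spans radii from 10·*i* to 10·(*i*+1) Å; and the sector counts follow our reading of the abstract (1, 6, 12, then four more per ring). The helper names (`project_interface`, `grid_cell`, `transactions`) and the `_A`/`_B` side suffixes are hypothetical.

```python
import math
from collections import defaultdict

RING_WIDTH = 10.0  # Å per ring; the abstract reports 10 Å as suitable and user-settable
N_RINGS = 10


def sectors_per_ring(n_rings=N_RINGS):
    """Assumed sector counts: 1 (centre), 6, 12, then +4 for each later ring."""
    counts = [1, 6, 12]
    while len(counts) < n_rings:
        counts.append(counts[-1] + 4)
    return counts[:n_rings]


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    n = math.sqrt(_dot(a, a))
    return tuple(x / n for x in a)


def _centroid(points):
    return tuple(sum(p[i] for p in points) / len(points) for i in range(3))


def project_interface(site_a, site_b):
    """Project residues of two binding sites onto the plane perpendicular to the
    axis joining their centroids (an assumed choice of 3D-to-2D rotation).

    site_a, site_b: lists of (residue_name, (x, y, z)).
    Returns a list of (label, (u, v)) with labels suffixed _A or _B.
    """
    ca = _centroid([xyz for _, xyz in site_a])
    cb = _centroid([xyz for _, xyz in site_b])
    axis = _unit(_sub(cb, ca))
    # Any helper vector not parallel to the axis gives an in-plane basis.
    helper = (1.0, 0.0, 0.0) if abs(axis[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = _unit(_cross(axis, helper))
    e2 = _cross(axis, e1)  # unit length, since axis and e1 are orthonormal
    origin = tuple((p + q) / 2 for p, q in zip(ca, cb))
    projected = []
    for side, site in (("A", site_a), ("B", site_b)):
        for name, xyz in site:
            d = _sub(xyz, origin)
            projected.append((f"{name}_{side}", (_dot(d, e1), _dot(d, e2))))
    return projected


def grid_cell(u, v, ring_width=RING_WIDTH, counts=None):
    """Map a 2D point to (ring, sector), or None if it lies outside the grid."""
    counts = counts or sectors_per_ring()
    ring = int(math.hypot(u, v) // ring_width)
    if ring >= len(counts):
        return None
    theta = math.atan2(v, u) % (2 * math.pi)
    # The final modulo guards against theta rounding up to exactly 2*pi.
    sector = int(theta / (2 * math.pi / counts[ring])) % counts[ring]
    return ring, sector


def transactions(projected, **grid_kwargs):
    """One transaction (set of residue labels) per occupied sector."""
    cells = defaultdict(set)
    for label, (u, v) in projected:
        cell = grid_cell(u, v, **grid_kwargs)
        if cell is not None:
            cells[cell].add(label)
    return [frozenset(items) for items in cells.values()]
```

Because each transaction is stored as a set, several residues of the same type falling into one sector are counted once, which matches the usual market-basket view of a transaction.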
In addition, when binding sites are further distinguished by shape (concave versus convex), we find that the patterns {arg, glu, asp} and {arg, ser, asp} on the concave binding site of one protein more frequently (i.e. with higher probability) make contact with {lys} or {arg} on the convex binding site of another protein, with a confidence of at least 78%. Conversely, {val, gly, lys} on the convex binding site of one protein is more frequently in contact with {asp} on the concave binding site of another protein, with a confidence above 81%. Applying data mining in biology can reveal facts that might otherwise be overlooked or not easily discovered by the naked eye. Furthermore, more relationships among AAs on binding sites can be discovered by appropriately rotating the binding-site residues from a three-dimensional to a two-dimensional perspective. We designed a circular grid to hold the data, which yielded a total of 463 records of AAs, and then applied association rule mining to these records to discover relationships. The proposed method provides insight into the characteristics of binding sites in recognition complexes.
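The confidence figures quoted above are the standard association-rule measure, confidence(X → Y) = support(X ∪ Y) / support(X), where support(X) is the fraction of transactions containing every item of X. The sketch below computes both on a small invented set of transactions; the data are illustrative only and are not from the paper, and tagging each residue with a hypothetical `_cc` (concave) or `_cv` (convex) suffix is our assumption about how the two sides might be kept distinct within one transaction.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """confidence(X -> Y) = support(X | Y) / support(X); 0.0 if X never occurs."""
    x = frozenset(antecedent)
    s_x = support(x, transactions)
    if s_x == 0:
        return 0.0
    return support(x | frozenset(consequent), transactions) / s_x


# Invented toy transactions. The _cc / _cv suffixes (concave / convex side)
# are a hypothetical encoding, not taken from the paper.
toy = [
    frozenset({"arg_cc", "glu_cc", "asp_cc", "lys_cv"}),
    frozenset({"arg_cc", "glu_cc", "asp_cc", "lys_cv", "ser_cv"}),
    frozenset({"arg_cc", "glu_cc", "asp_cc", "arg_cv"}),
    frozenset({"arg_cc", "glu_cc", "asp_cc", "gly_cv"}),
    frozenset({"val_cv", "gly_cv", "lys_cv", "asp_cc"}),
]

pattern = {"arg_cc", "glu_cc", "asp_cc"}
print(f"{support(pattern, toy):.2f}")                                        # 0.80
print(f"{confidence(pattern, {'lys_cv'}, toy):.2f}")                         # 0.50
print(f"{confidence(pattern, {'arg_cv'}, toy):.2f}")                         # 0.25
print(f"{confidence({'val_cv', 'gly_cv', 'lys_cv'}, {'asp_cc'}, toy):.2f}")  # 1.00
```

A statement such as "{lys} or {arg}" corresponds to two separate rules here, one per consequent, since a standard itemset rule has a single consequent set.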


Figure 8: Frequent patterns consisting of AAs on the concave binding sites of one protein and AA patterns on the convex binding sites of another protein.

Mentions: In the first experiment, we looked for the residues that appear most frequently on the binding sites of all recognition complexes. Table 2 shows the result of applying association rule mining to the 463 AA transactions. As Table 2 shows, no matter which side of the complex a residue lies on, {arg} binds at the highest frequency; in other words, {arg} appears most often on the binding sites of the recognition complexes.

In the second experiment, we took the shape of the binding sites into account. In data mining terms, we placed {arg} in the consequent and examined the antecedents of rules of the form {antecedent} → {consequent}, as illustrated in Figure 4. We set the minimum support at 1.5% and the minimum confidence at 80%. The mined rules, such as {phe, ser} → {arg}, are shown in Figure 5. Figure 6 shows {arg} on the concave binding sites of one protein together with the mined AA patterns on the convex binding sites of another protein, using the same minimum support and minimum confidence as above.

We were also interested in AA patterns that occur with higher frequency on the binding sites of recognition complexes. Figure 7 shows AAs on the convex binding sites of one protein that more frequently contact AA patterns on the concave binding sites of another protein, with a minimum support of 2% and a minimum confidence of 75%. For the same experiment, we also mined the opposite side to examine the reverse situation (Figure 8), again with a minimum support of 2% and a minimum confidence of 75%. All of the above experiments show that, if the support and confidence thresholds are set appropriately, more surprising facts can be discovered in the dataset of recognition protein complexes.
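The two kinds of experiment described above, ranking single residues by frequency and mining rules with a fixed consequent such as {arg} under minimum support and confidence thresholds, can be sketched with a plain level-wise (Apriori-style) search. The paper does not say which mining algorithm or software it used, so Apriori here is a swapped-in, standard choice; the function names are hypothetical, an itemset counts as frequent when its support reaches `min_support` (the usual convention), and the toy transactions are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def item_frequencies(transactions):
    """Rank single residues by the number of transactions containing them."""
    return Counter(item for t in transactions for item in t).most_common()


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {frozenset(itemset): support}."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): c / n for i, c in counts.items() if c / n >= min_support}
    result = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result


def rules_with_consequent(freq, consequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) for rules X -> consequent."""
    y = frozenset(consequent)
    rules = []
    for itemset, s_xy in freq.items():
        if y < itemset:  # proper superset, so the antecedent is non-empty
            x = itemset - y
            conf = s_xy / freq[x]  # x is frequent by downward closure
            if conf >= min_confidence:
                rules.append((x, y, s_xy, conf))
    return sorted(rules, key=lambda r: (-r[3], -r[2]))


# Invented toy transactions (one per grid sector), not the paper's data.
toy = [frozenset(s) for s in (
    {"arg", "glu", "asp"},
    {"arg", "ser", "asp"},
    {"arg", "phe", "ser"},
    {"arg", "lys"},
    {"cys", "gly"},
)]

print(item_frequencies(toy)[0])  # ('arg', 4)
freq = frequent_itemsets(toy, min_support=0.4)
for x, y, s, c in rules_with_consequent(freq, {"arg"}, min_confidence=0.8):
    print(sorted(x), "->", sorted(y), f"support={s:.2f} confidence={c:.2f}")
```

On this toy data the sketch finds {asp} → {arg} and {ser} → {arg}, each with confidence 1.0. Filtering on the consequent after mining, rather than during candidate generation, keeps the search simple at the cost of counting some itemsets that never contribute a rule.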

