Discovering amino acid patterns on binding sites in protein complexes.

Kuo HC, Ong PL, Lin JC, Huang JP - Bioinformation (2011)

Bottom Line: As for the radius of each ring, we examined the complexes and found that 10 Å is a suitable value, which can be set by the user. As a result, we found that arginine (arg) appears most frequently on the binding sites of both proteins in the recognition complexes, while cysteine (cys) appears least frequently. We then used association rule mining on these records to discover relationships.


ABSTRACT
Discovering amino acid (AA) patterns on protein binding sites has recently attracted considerable attention. We propose a method to discover association relationships among AAs on binding sites; such knowledge is very helpful in predicting protein-protein interactions. In this paper, we focus on protein complexes formed through protein-protein recognition. We use association rule mining to discover spatially adjacent amino acids on the binding sites of a protein complex. Rather than treating all AAs on the binding sites of a complex as a single transaction, we partition them spatially and treat the AAs in each partition as one transaction. To partition them, the AAs on a binding site are first projected from three dimensions onto a two-dimensional plane and then placed into the cells of a circular grid. The circular grid has ten rings: a central ring, a second ring with 6 sectors, a third ring with 12 sectors, and subsequent rings that each have four more sectors than the ring before. The radius of each ring is a user-settable parameter; after examining the complexes, we found 10 Å to be a suitable value. After the recognition complexes are placed on the circular grid, each sector yields one mining record (i.e., a transaction). Finally, we apply association rule mining to these records to find frequent AA patterns: an AA pattern whose support is larger than the predetermined minimum support (i.e., threshold) is called a frequent pattern. These discovered patterns offer biologists a novel point of view that will improve the prediction accuracy of protein-protein recognition. In our experiments, we mined AA patterns from the data and found that arginine (arg) appears most frequently on the binding sites of both proteins in recognition complexes, while cysteine (cys) appears least frequently.
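The minimum-support step described above can be sketched as a level-wise (Apriori-style) frequent-itemset search. This is a minimal illustration, not the authors' program: the abstract does not name the mining algorithm, so the use of Apriori, the function names `support` and `frequent_itemsets`, and the toy sector data are all assumptions. Support is taken as the fraction of transactions (sectors) that contain every item of a pattern, and a pattern is kept only when its support is larger than `min_support`, as the text states.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions (sectors) that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Apriori-style search (assumed algorithm) for itemsets whose support
    is strictly larger than min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    # Level 1: single amino acids that pass the threshold.
    level = [frozenset([i]) for i in items]
    level = [s for s in level if support(s, transactions) > min_support]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        k += 1
        prev = set(level)
        # Join: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(sub) in prev for sub in combinations(c, k - 1))]
        level = [c for c in candidates if support(c, transactions) > min_support]
    return frequent


# Hypothetical sectors (not the paper's data); L_ = convex side, R_ = concave side.
sectors = [
    {"R_arg", "R_glu", "L_lys"},
    {"R_arg", "R_asp", "L_lys"},
    {"R_arg", "L_lys"},
    {"L_val", "R_asp"},
]
freq = frequent_itemsets(sectors, min_support=0.4)
```

With this toy input, {R_arg, L_lys} occurs in three of the four sectors (support 0.75) and is kept, while {R_glu} (support 0.25) is discarded.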
In addition, when we further distinguish concave from convex binding sites, we find that the patterns {arg, glu, asp} and {arg, ser, asp} on the concave binding site of one protein are frequently (i.e., with high probability) in contact with {lys} or {arg} on the convex binding site of the other protein, with a confidence of at least 78%. Conversely, {val, gly, lys} on the convex binding site of a protein is frequently in contact with {asp} on the concave binding site of the other protein, with a confidence of over 81%. Applying data mining in biology can reveal facts that might otherwise be overlooked or be hard to discover by eye. Furthermore, by appropriately rotating binding-site residues and projecting them from a three-dimensional to a two-dimensional perspective, we can discover more relationships among AAs on binding sites. We designed a circular grid to hold the data, which yielded a total of 463 records of AAs, and then used association rule mining on these records to discover relationships. The proposed method provides insight into the characteristics of binding sites in recognition complexes.
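The percentages quoted above (at least 78% and over 81%) are rule confidences. The sketch below shows how such a value is computed, assuming the standard definition conf(X → Y) = support(X ∪ Y) / support(X) and the L_ (convex) and R_ (concave) item prefixes that Step 5 of the method attaches to each AA. The `confidence` helper and the toy transactions are hypothetical and do not reproduce the paper's data or results.

```python
def support_count(itemset, transactions):
    """Number of transactions (sectors) containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """conf(X -> Y) = support(X | Y) / support(X); 0.0 if X never occurs."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    n_x = support_count(x, transactions)
    if n_x == 0:
        return 0.0
    return support_count(xy, transactions) / n_x


# Hypothetical sectors; a concave (R_) antecedent predicts a convex (L_) item.
sectors = [frozenset(s) for s in (
    {"R_arg", "R_glu", "R_asp", "L_lys"},
    {"R_arg", "R_glu", "R_asp", "L_lys"},
    {"R_arg", "R_glu", "R_asp", "L_arg"},
    {"R_arg", "R_glu", "R_asp"},
    {"L_val", "L_gly", "L_lys", "R_asp"},
)]
c_concave = confidence({"R_arg", "R_glu", "R_asp"}, {"L_lys"}, sectors)
c_convex = confidence({"L_val", "L_gly", "L_lys"}, {"R_asp"}, sectors)
```

Here the concave pattern appears in four sectors and co-occurs with L_lys in two, giving a confidence of 0.5; the convex pattern appears once, always with R_asp, giving 1.0.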


Figure 2: Illustration of protein complex 1BKD. The left picture shows the result of the above steps.

Step 4: All residues on the plane are then placed into a circular grid consisting of ten rings: a central ring, a second ring with 6 sectors, a third ring with 12 sectors, and subsequent rings that each have four more sectors than the ring before. The radius of each ring is an adjustable parameter in our program, but we perform a small calculation to obtain a suitable value. For each recognition complex, we calculate the center of all residues on its binding sites and find the longest distance from that center to a residue. We then average these longest distances over all complexes, divide the result by 10, and double it to obtain the ring radius, which comes to 10 Å. The central ring is drawn with this radius around the center, the second ring with twice this radius, and so on. The angle (in radians) spanned by one sector of ring i is 2π / r_i, where r_i is the number of sectors in ring i, r_i ∈ {1, 6, 12, 16, 20, 24, 28, 32, 36, 40}, and π is taken as 3.1415926535. Figure 2 illustrates the partitioning of protein complex 1BKD into circular sectors.

Step 5: Each sector is then treated as a transaction record. A transaction record is a data mining term; the set of items it contains is also called an itemset. In this study, a transaction is the set of AAs in one sector of the binding sites, for example X = {R_leu, L_asp, …}. Each item (i.e., an AA) in a transaction carries a prefix: L marks AAs on the convex side of the protein complex, and R marks AAs on the concave side. Retrieving the transactions from every sector yields a total of 463 transactions from 78 recognition complexes. An example of an itemset generated from a protein complex is shown in Figure 3.
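Steps 4 and 5 can be sketched in Python as follows. This is a hypothetical illustration rather than the authors' program, and it rests on assumptions the text does not fix: residues are already projected to 2D (x, y) coordinates in ångströms, the "center" is the 2D centroid of the binding-site residues, sector angles are measured counterclockwise from the +x axis, empty sectors produce no transaction, and residues beyond the tenth ring are clamped into it. The names `ring_radius`, `grid_cell`, and `transactions_for_complex` are illustrative.

```python
import math
from collections import defaultdict

# Number of sectors in each of the ten rings, as listed in Step 4.
SECTORS_PER_RING = [1, 6, 12, 16, 20, 24, 28, 32, 36, 40]


def centroid(points):
    """Mean x and y of projected residues (assumed meaning of 'center')."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def ring_radius(complexes):
    """Step 4 rule: average each complex's longest center-to-residue
    distance, divide by 10, then double the result."""
    longest = []
    for points in complexes:
        cx, cy = centroid(points)
        longest.append(max(math.hypot(x - cx, y - cy) for x, y in points))
    return 2 * (sum(longest) / len(longest)) / 10


def grid_cell(x, y, cx, cy, radius):
    """Return (ring, sector); ring i covers distances [i*radius, (i+1)*radius)."""
    dist = math.hypot(x - cx, y - cy)
    # Assumption: residues beyond the outermost ring are clamped into ring 9.
    ring = min(int(dist // radius), len(SECTORS_PER_RING) - 1)
    n_sectors = SECTORS_PER_RING[ring]
    sector_angle = 2 * math.pi / n_sectors  # the 2*PI / r_i rule
    # Assumption: angles run counterclockwise from the +x axis, in [0, 2*pi).
    angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
    # Clamp guards against floating-point rounding at angle == 2*pi.
    sector = min(int(angle // sector_angle), n_sectors - 1)
    return ring, sector


def transactions_for_complex(residues, radius):
    """residues: (side, aa, x, y) tuples, side 'L' (convex) or 'R' (concave).
    Each non-empty sector becomes one transaction of prefixed items."""
    cx, cy = centroid([(x, y) for _, _, x, y in residues])
    cells = defaultdict(set)
    for side, aa, x, y in residues:
        cells[grid_cell(x, y, cx, cy, radius)].add(f"{side}_{aa}")
    return list(cells.values())


radius = ring_radius([[(-50.0, 0.0), (50.0, 0.0)], [(0.0, -50.0), (0.0, 50.0)]])
print(radius)  # 10.0
```

Because the text reports a 10 Å radius, the averaged longest distance must be about 50 Å (2 × 50 / 10 = 10), which is why the toy complexes above use residues 50 Å from their centers.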

