Similarity search for local protein structures at atomic resolution by exploiting a database management system

View Article: PubMed Central - PubMed

ABSTRACT

A method to search for local structural similarities in proteins at atomic resolution is presented. It is demonstrated that a huge amount of structural data can be handled within a reasonable CPU time by using a conventional relational database management system with appropriate indexing of geometric data. This method, which we call geometric indexing, can enumerate ligand-binding sites that are structurally similar to sub-structures of a query protein among more than 160,000 possible candidates within a few hours of CPU time on an ordinary desktop computer. After a set of high-scoring ligand-binding sites has been detected by the geometric indexing search, structural alignments at atomic resolution are constructed by iteratively applying the Hungarian algorithm, and the statistical significance of the final score is estimated from an empirical model based on a gamma distribution. Applications of this method to several protein structures clearly show that significant similarities can be detected between local structures of non-homologous as well as homologous proteins.
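The idea of letting a relational database index geometric data can be illustrated with a minimal sketch. It is not the authors' schema: the table name `triplet`, its columns, the use of atom triplets keyed by sorted side lengths, and the tolerance `tol` are all assumptions made for illustration, and SQLite (from the Python standard library) stands in for whatever database system the paper used.

```python
import itertools
import math
import sqlite3

def triplet_keys(atoms):
    """Yield sorted side lengths (d1 <= d2 <= d3) for every atom triplet."""
    for a, b, c in itertools.combinations(atoms, 3):
        yield tuple(sorted((math.dist(a, b), math.dist(b, c), math.dist(a, c))))

def build_index(sites):
    """Store triplet keys for each site; `sites` maps a label to (x, y, z) tuples."""
    db = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per atom triplet of a binding site.
    db.execute("CREATE TABLE triplet (site TEXT, d1 REAL, d2 REAL, d3 REAL)")
    for site, atoms in sites.items():
        db.executemany("INSERT INTO triplet VALUES (?, ?, ?, ?)",
                       ((site, *key) for key in triplet_keys(atoms)))
    # The index lets range queries avoid scanning every stored triplet.
    db.execute("CREATE INDEX triplet_geom ON triplet (d1, d2, d3)")
    return db

def count_matches(db, query_atoms, tol):
    """Count, per site, stored triplets within `tol` of any query triplet."""
    sql = ("SELECT site, COUNT(*) FROM triplet "
           "WHERE d1 BETWEEN ? AND ? AND d2 BETWEEN ? AND ? "
           "AND d3 BETWEEN ? AND ? GROUP BY site")
    counts = {}
    for d1, d2, d3 in triplet_keys(query_atoms):
        args = (d1 - tol, d1 + tol, d2 - tol, d2 + tol, d3 - tol, d3 + tol)
        for site, n in db.execute(sql, args):
            counts[site] = counts.get(site, 0) + n
    return counts

# Toy data (made up): two four-atom "sites" with different geometry.
sites = {
    "siteA": [(0, 0, 0), (1.5, 0, 0), (0, 2.0, 0), (0, 0, 2.5)],
    "siteB": [(0, 0, 0), (4, 0, 0), (0, 5, 0), (0, 0, 6)],
}
db = build_index(sites)
# The query is siteA translated in space; distances are unchanged.
query = [(x + 10, y - 3, z + 7) for x, y, z in sites["siteA"]]
strict = count_matches(db, query, tol=0.1)
loose = count_matches(db, query, tol=0.6)
print(strict, loose)
```

With the strict tolerance only `siteA` is returned, with one match per query triplet. Widening the tolerance returns more candidate matches, which a later alignment step would have to sort out.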



Figure 7. Optimal superpositions of the ATP-binding sites of the query cAMP-dependent protein kinase (cAPK; PDB ID: 1atp [26]) on templates. A: The template is the ATP-binding site of casein kinase-1 (PDB ID: 1csn [29]) from Schizosaccharomyces pombe. B: The template is the ATP-binding site of glutathione synthetase (PDB ID: 1m0w [30]) from Saccharomyces cerevisiae. The color scheme is the same as in Fig. 6. The ligand of 1atp is also shown as a stick model in CPK colors.




Our third example is the cAMP-dependent protein kinase, cAPK (PDB ID: 1atp [26]), from Mus musculus. This example is motivated by the work of Kobayashi and Go [27], who found that the local structure of the nucleotide-binding site of cAPK is similar to those of other nucleotide-binding proteins with different folds. They listed five ATP-binding proteins that share similar local structures: glutaminyl-tRNA synthetase, D-Ala:D-Ala ligase (DD-ligase), casein kinase-1 (CK-1), seryl-tRNA synthetase, and glutamine synthetase [27]. According to the SCOP database [28], CK-1 and cAPK belong to the same family, the protein kinase catalytic subunit family, although the sequence identity between them is only 19%. Among the five proteins listed by Kobayashi and Go, CK-1 exhibited a highly significant similarity, with an IR score of 42.8 and P = 8.9×10⁻¹¹ (Fig. 7A). In contrast, we found only a weak similarity with glutathione synthetase, which belongs to the same superfamily as DD-ligase, with a relatively low IR score of 12.5 (P = 2.1×10⁻³; Fig. 7B). Most high-scoring templates were kinases of the same fold. The other similarities listed by Kobayashi and Go were either not detected or detected with wrong alignments.

There are at least two possible explanations for this failure to detect similar local structures. First, our criteria for selecting similar refsets may be too stringent, so that possible hits are discarded during the geometric indexing (GI) search. Second, the number of aligned atoms obtained by Kobayashi and Go is very small, ranging from 14 to 16, whereas some obvious false hits contained more than 20 aligned atoms. The first point may be corrected by loosening the criteria at the cost of increased CPU time. The second point is more problematic, however. Kobayashi and Go used only ATP-binding proteins in their study, whereas we used all the ligand-binding sites present in the current PDB. Accordingly, the signal-to-noise ratio is substantially lower in the present case. To overcome this problem, a more elaborate statistical method may be necessary.
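The signal-to-noise argument can be made concrete with a rough calculation. The sketch below assumes that each reported P is a per-comparison probability and that comparisons are independent; this section states neither, so the numbers are illustrative only. The function names are hypothetical, and the database size is taken from the abstract's "more than 160,000 possible candidates".

```python
import math

def expected_chance_hits(p_value, n_candidates):
    """Expected number of candidates scoring at least this well by chance."""
    return p_value * n_candidates

def prob_at_least_one(p_value, n_candidates):
    """Probability of one or more chance hits, assuming independent comparisons."""
    # Computes 1 - (1 - p)^n in a way that stays accurate for tiny p.
    return -math.expm1(n_candidates * math.log1p(-p_value))

N = 160_000  # candidate count quoted in the abstract (assumed to apply here)
for label, p in (("CK-1", 8.9e-11), ("glutathione synthetase", 2.1e-3)):
    print(label, expected_chance_hits(p, N), prob_at_least_one(p, N))
```

Under these assumptions, a hit at the glutathione synthetase level would be expected by chance a few hundred times in a database of that size, while one at the CK-1 level would remain very unlikely. Restricting the search to ATP-binding proteins, as Kobayashi and Go did, shrinks the candidate count and with it the expected number of chance hits.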

