Similarity search for local protein structures at atomic resolution by exploiting a database management system

ABSTRACT

A method to search for local structural similarities in proteins at atomic resolution is presented. It is demonstrated that a large amount of structural data can be handled within a reasonable CPU time by using a conventional relational database management system with appropriate indexing of geometric data. This method, which we call geometric indexing, can enumerate ligand binding sites that are structurally similar to sub-structures of a query protein among more than 160,000 possible candidates within a few hours of CPU time on an ordinary desktop computer. After the geometric indexing search detects a set of high-scoring ligand binding sites, structural alignments at atomic resolution are constructed by iteratively applying the Hungarian algorithm, and the statistical significance of the final score is estimated from an empirical model based on a gamma distribution. Applications of this method to several protein structures clearly show that significant similarities can be detected between local structures of non-homologous as well as homologous proteins.
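The alignment and significance steps summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: atoms may only be paired with atoms carrying the same (hypothetical) type label, the template is assumed to be roughly pre-superposed on the query (for example by the geometric indexing search), pairs farther apart than a hypothetical `cutoff` are discarded, and the matched pairs are re-superposed with the Kabsch method between assignment rounds. SciPy's `linear_sum_assignment` solves the same linear assignment problem as the Hungarian algorithm, although recent SciPy versions use a different algorithm internally. For significance, a gamma distribution is fitted (location fixed at 0) to scores from unrelated pairs and the p-value is its upper tail; how those background scores are obtained is also an assumption. The function names `kabsch`, `iterative_hungarian` and `gamma_pvalue` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.stats import gamma


def kabsch(p, q):
    """Rotation r and translation t minimizing the RMSD between p @ r.T + t and q."""
    pc, qc = p.mean(axis=0), q.mean(axis=0)
    h = (p - pc).T @ (q - qc)
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, qc - pc @ r.T


def iterative_hungarian(query, qtypes, template, ttypes, cutoff=2.0, max_iter=20):
    """Alternate optimal atom assignment and superposition until pairs stop changing.

    query, template: (n, 3) and (m, 3) coordinate arrays; qtypes, ttypes: atom labels
    (hypothetical typing scheme). Returns ((template_idx, query_idx) pairs, RMSD).
    """
    qtypes, ttypes = np.asarray(qtypes), np.asarray(ttypes)
    same_type = ttypes[:, None] == qtypes[None, :]
    moved, pairs = template.copy(), None
    for _ in range(max_iter):
        dist = np.linalg.norm(moved[:, None, :] - query[None, :, :], axis=2)
        cost = np.where(same_type, dist, 1e6)  # large cost forbids mismatched types
        rows, cols = linear_sum_assignment(cost)
        keep = cost[rows, cols] < cutoff  # drop pairs farther apart than the cutoff
        new_pairs = list(zip(rows[keep].tolist(), cols[keep].tolist()))
        if len(new_pairs) < 3 or new_pairs == pairs:
            break  # too few pairs to superpose, or the assignment has converged
        pairs = new_pairs
        r, t = kabsch(template[rows[keep]], query[cols[keep]])
        moved = template @ r.T + t  # superpose the whole template on the query
    if not pairs:
        return [], float("inf")
    ti, qi = zip(*pairs)
    diff = moved[list(ti)] - query[list(qi)]
    return pairs, float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def gamma_pvalue(score, background_scores):
    """Upper-tail probability of score under a gamma fit to background scores."""
    a, loc, scale = gamma.fit(background_scores, floc=0.0)
    return float(gamma.sf(score, a, loc=loc, scale=scale))
```

The loop stops as soon as the assignment no longer changes, which is one natural convergence criterion when alternating between assignment and superposition; each superposition can only move matched atoms closer, so the pairing tends to stabilize within a few rounds.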

Fig. 1. Overview of the method. The left part (“Compiling database”) illustrates the pre-processing step; the right part (“Searching”) shows the search step for a given query protein structure.

We first extract ligand binding sites (templates) from PDBML files [12] and save them in XML files called LBSML (Ligand Binding Site Markup Language) files. An LBSML file contains information about the atoms that are in contact with a ligand, along with reference sets (refsets) for local coordinate systems (see below). We then compile the refsets and the atomic coordinates expressed in the local coordinate systems into a set of relational database (RDB) tables. This pre-processing stage is carried out only once, unless the database needs to be updated (Fig. 1, left part).
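The idea of storing atoms in refset-defined local coordinates inside an indexed relational table can be sketched with Python's built-in `sqlite3` module. The details are assumptions for illustration, not the schema used by the authors: a refset is taken to be three atom positions that define a right-handed orthonormal frame (origin at the first atom, x axis toward the second, xy plane containing the third), local coordinates are quantized on a hypothetical 1 Å grid, and each template site has a single refset. Because coordinates are expressed in a local frame, a query substructure can match a template regardless of its global position and orientation, and an index on `(type, ix, iy, iz)` can let the database answer the lookup as an indexed join instead of a scan over every site. The table, column and function names are hypothetical.

```python
import math
import sqlite3

CELL = 1.0  # hypothetical grid spacing (Å) for quantizing local coordinates


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    n = math.sqrt(_dot(a, a))
    return tuple(x / n for x in a)


def local_frame(refset):
    """Right-handed orthonormal frame from three reference atoms (assumed refset)."""
    o, p1, p2 = refset
    ex = _unit(_sub(p1, o))               # x axis toward the second atom
    ez = _unit(_cross(ex, _sub(p2, o)))   # z axis normal to the plane of the three atoms
    ey = _cross(ez, ex)                   # y axis completes the frame
    return o, (ex, ey, ez)


def to_cell(xyz, frame):
    """Local coordinates of xyz, quantized to integer grid cells."""
    o, axes = frame
    v = _sub(xyz, o)
    return tuple(math.floor(_dot(v, e) / CELL) for e in axes)


def build_db(templates):
    """templates: {site_id: (refset, [(atom_type, xyz), ...])} -> in-memory DB."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE atom (site TEXT, type TEXT, ix INT, iy INT, iz INT)")
    db.execute("CREATE INDEX atom_cell ON atom (type, ix, iy, iz)")
    for site, (refset, atoms) in templates.items():
        frame = local_frame(refset)
        db.executemany("INSERT INTO atom VALUES (?, ?, ?, ?, ?)",
                       [(site, t, *to_cell(xyz, frame)) for t, xyz in atoms])
    db.commit()
    return db


def search(db, refset, atoms, min_hits=3):
    """Rank template sites by how many query atoms land in a same-type cell."""
    frame = local_frame(refset)
    db.execute("CREATE TEMP TABLE q (type TEXT, ix INT, iy INT, iz INT)")
    db.executemany("INSERT INTO q VALUES (?, ?, ?, ?)",
                   [(t, *to_cell(xyz, frame)) for t, xyz in atoms])
    rows = db.execute(
        """SELECT a.site, COUNT(*) AS hits
           FROM q JOIN atom AS a
             ON a.type = q.type AND a.ix = q.ix AND a.iy = q.iy AND a.iz = q.iz
           GROUP BY a.site HAVING COUNT(*) >= ?
           ORDER BY hits DESC, a.site""", (min_hits,)).fetchall()
    db.execute("DROP TABLE q")
    return rows
```

A practical system would also need to tolerate atoms that fall near cell boundaries (for example by probing neighboring cells) and to try many refsets of the query; both are omitted here for brevity.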
