Similarity search for local protein structures at atomic resolution by exploiting a database management system
View Article:
PubMed Central - PubMed
ABSTRACT
A method to search for local structural similarities in proteins at atomic resolution is presented. It is demonstrated that a huge amount of structural data can be handled within a reasonable CPU time by using a conventional relational database management system with appropriate indexing of geometric data. This method, which we call geometric indexing, can enumerate ligand binding sites that are structurally similar to sub-structures of a query protein among more than 160,000 possible candidates within a few hours of CPU time on an ordinary desktop computer. After detecting a set of high scoring ligand binding sites by the geometric indexing search, structural alignments at atomic resolution are constructed by iteratively applying the Hungarian algorithm, and the statistical significance of the final score is estimated from an empirical model based on a gamma distribution. Applications of this method to several protein structures clearly show that significant similarities can be detected between local structures of non-homologous as well as homologous proteins.
No MeSH data available.
Let $\mathbf{r}_i$ ($i = 0, \ldots, 3$) be the coordinates of the four atoms of a refset (tetrahedron) in the original coordinate system (i.e., as in the PDB file). The indices 0 to 3 are assigned in lexicographical order of the atom types. When calculating the local coordinates of an atom in the refset, the origin is set to $\mathbf{r}_0$. The x-axis is defined by the unit vector parallel to $\mathbf{r}_{01} \equiv \mathbf{r}_1 - \mathbf{r}_0$, that is, $\hat{\mathbf{x}} \equiv (1/\|\mathbf{r}_{01}\|)\,\mathbf{r}_{01}$. With $\mathbf{r}_{02} \equiv \mathbf{r}_2 - \mathbf{r}_0$, the y-axis is defined by $\hat{\mathbf{y}} \equiv (1/\|\mathbf{r}_{02}\|)\,\hat{\mathbf{x}} \times \mathbf{r}_{02}$. The z-axis is defined by $\hat{\mathbf{z}} \equiv \hat{\mathbf{x}} \times \hat{\mathbf{y}}$. Thus, for a given set of coordinates $\mathbf{s}$ in the original system, the local coordinates in the system spanned by the refset $\{\mathbf{r}_i\}$ are given as $\mathbf{s}' = [(\mathbf{s} - \mathbf{r}_0) \cdot \hat{\mathbf{x}},\ (\mathbf{s} - \mathbf{r}_0) \cdot \hat{\mathbf{y}},\ (\mathbf{s} - \mathbf{r}_0) \cdot \hat{\mathbf{z}}]$. This coordinate system spanned by a refset is illustrated in Figure 2. Using this notation, and with $\mathbf{r}_{03} \equiv \mathbf{r}_3 - \mathbf{r}_0$, the chirality of a tetrahedron mentioned above is defined as the sign of the dot product $\mathbf{r}_{03} \cdot \hat{\mathbf{y}}$. For example, the chirality of the tetrahedron in Figure 2 is positive. Because the chirality is included explicitly, enantiomers of query and template structures can be distinguished. Therefore, for a given query structure, we can always find the templates of the correct chirality in the search stage described below.
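The construction of the refset coordinate system and the chirality test can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`refset_frame`, `local_coords`, `chirality`) are hypothetical, coordinates are plain 3-tuples, and the y-axis is normalized here by the length of the cross product so that it is guaranteed to be a unit vector. That normalization is an assumption of this sketch; it does not affect the sign used for the chirality.

```python
import math

def sub(a, b):
    """Component-wise difference a - b of two 3-vectors."""
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(v):
    """Return v scaled to unit length; degenerate (zero) vectors are rejected."""
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("degenerate refset: zero-length vector")
    return tuple(x / n for x in v)

def refset_frame(r0, r1, r2):
    """Axes (x_hat, y_hat, z_hat) of the local system spanned by a refset."""
    x_hat = unit(sub(r1, r0))                 # parallel to r01
    # Assumption: normalize by |x_hat x r02| so that y_hat has unit length.
    y_hat = unit(cross(x_hat, sub(r2, r0)))   # normal to the r0-r1-r2 plane
    z_hat = cross(x_hat, y_hat)               # completes the right-handed frame
    return x_hat, y_hat, z_hat

def local_coords(s, r0, frame):
    """Coordinates of point s in the local frame with origin r0."""
    d = sub(s, r0)
    return tuple(dot(d, axis) for axis in frame)

def chirality(r0, r1, r2, r3):
    """Sign of r03 . y_hat: +1, -1, or 0 if the four atoms are coplanar."""
    _, y_hat, _ = refset_frame(r0, r1, r2)
    d = dot(sub(r3, r0), y_hat)
    return (d > 0) - (d < 0)
```

For instance, with `r0 = (0, 0, 0)`, `r1 = (1, 0, 0)`, `r2 = (0, 1, 0)`, the y-axis points along the original z direction, so a fourth atom at `(0, 0, 1)` gives positive chirality and its mirror image at `(0, 0, -1)` gives negative chirality.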