Parallel implementation of 3D protein structure similarity searches using a GPU and the CUDA.

Mrozek D, Brożek M, Małysiak-Mrozek B - J Mol Model (2014)

Bottom Line: Graphics processing units (GPUs) and general purpose graphics processing units (GPGPUs) can perform many time-consuming and computationally demanding processes much more quickly than a classical CPU can. The GPU (GeForce GTX 560Ti: 384 cores, 2GB RAM) implementation of CASSERT ("GPU-CASSERT") parallelizes both alignment phases and yields an average 180-fold increase in speed over its CPU-based, single-core implementation on an Intel Xeon E5620 (2.40GHz, 4 cores). In this paper, we show that massive parallelization of the 3D structure similarity search process on many-core GPU devices can reduce the execution time of the process, allowing it to be performed in real time.


Affiliation: Institute of Informatics, Silesian University of Technology, Gliwice, Poland, dariusz.mrozek@polsl.pl.

ABSTRACT
Searching for similar 3D protein structures is one of the primary processes employed in the field of structural bioinformatics. However, the computational complexity of this process means that it is constantly necessary to search for new methods that can perform such a process faster and more efficiently. Finding molecular substructures that complex protein structures have in common is still a challenging task, especially when entire databases containing tens or even hundreds of thousands of protein structures must be scanned. Graphics processing units (GPUs) and general purpose graphics processing units (GPGPUs) can perform many time-consuming and computationally demanding processes much more quickly than a classical CPU can. In this paper, we describe the GPU-based implementation of the CASSERT algorithm for 3D protein structure similarity searching. This algorithm is based on the two-phase alignment of protein structures when matching fragments of the compared proteins. The GPU (GeForce GTX 560Ti: 384 cores, 2GB RAM) implementation of CASSERT ("GPU-CASSERT") parallelizes both alignment phases and yields an average 180-fold increase in speed over its CPU-based, single-core implementation on an Intel Xeon E5620 (2.40GHz, 4 cores). In this paper, we show that massive parallelization of the 3D structure similarity search process on many-core GPU devices can reduce the execution time of the process, allowing it to be performed in real time. GPU-CASSERT is available at: http://zti.polsl.pl/dmrozek/science/gpucassert/cassert.htm.


Fig. 8: Encoding the reduced chain of secondary structure for query protein Q (left) and constructing the query profile (right). The query profile shows all possible (encoded) scores when comparing the reduced query chain of secondary structure to SE regions from candidate protein structures from the database.

Mentions: The data package for the query chain of secondary structures is built on a slightly different principle. If it were created in the same way as the data packages for database structures, then in order to extract the similarity coefficient of secondary structures σ_{i,j} we would have to read the cell (SSE_i^A, SSE_j^B) from a predefined matrix of coefficients (a kind of substitution matrix constructed based on rules (i)–(iii) in the “Introduction” section), which would affect performance negatively. We can avoid this by pre-computing all possible similarity coefficients and writing them directly into the data package of the query protein, creating something like the query-specific substitution matrix proposed in [40], which is called a query profile in the GPU-based alignment algorithm for sequence similarity presented in [28]. Therefore, the data package for the query protein passes through an additional preparation step. For each SE region, four versions of the similarity coefficient are created, one for each of the secondary structure types and one for the neutral element 0 (as shown in Fig. 8). In the resulting query profile, the row index is defined by the index of the structural region SE divided by 2, and the column index is defined by the type of secondary structure present (with the additional neutral element 0). The coefficients are converted to integers so that each fits into 1 byte, according to a set of conversion rules.
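The idea of trading a substitution-matrix lookup for a pre-computed query profile can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GPU-CASSERT code: the type codes, `toy_similarity`, and the scaling used in `encode_byte` are hypothetical stand-ins, because rules (i)–(iii) and the actual integer conversion rules are not reproduced in this excerpt. The sketch also indexes rows directly by SE region number and leaves out the divide-by-2 mapping from data package positions.

```python
from typing import Callable, List, Sequence

# Hypothetical type codes: 0 is the neutral element, 1..3 are secondary structure types.
NEUTRAL, HELIX, STRAND, LOOP = 0, 1, 2, 3
TYPE_CODES = (NEUTRAL, HELIX, STRAND, LOOP)


def toy_similarity(query_type: int, db_type: int) -> float:
    """Placeholder coefficient; an assumption, NOT the paper's rules (i)-(iii)."""
    if NEUTRAL in (query_type, db_type):
        return 0.0
    if query_type == db_type:
        return 1.0
    if LOOP in (query_type, db_type):
        return 0.5
    return 0.0


def encode_byte(coefficient: float, scale: int = 2) -> int:
    """Hypothetical integer encoding that keeps each score within one byte."""
    value = int(round(coefficient * scale))
    if not 0 <= value <= 255:
        raise ValueError("encoded coefficient does not fit into 1 byte")
    return value


def build_query_profile(
    query_types: Sequence[int],
    similarity: Callable[[int, int], float] = toy_similarity,
) -> List[bytes]:
    """Pre-compute one 4-byte row per query SE region (one column per type code)."""
    return [
        bytes(encode_byte(similarity(q, t)) for t in TYPE_CODES)
        for q in query_types
    ]


def lookup(profile: List[bytes], region_index: int, db_type: int) -> int:
    """Score a query region against a database SE type with one indexed read."""
    return profile[region_index][db_type]


if __name__ == "__main__":
    profile = build_query_profile([HELIX, LOOP, STRAND])
    for row_index, row in enumerate(profile):
        print(row_index, list(row))
```

Once the profile is built, scoring a query region against a database region needs only the region index and the database region's type code, so no pairwise matrix has to be consulted during alignment. Packing each row into 4 bytes is a choice made for this sketch; it simply mirrors the four versions of the coefficient described above.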

