TS-AMIR: a topology string alignment method for intensive rapid protein structure comparison.

Razmara J, Deris S, Parvizpour S - Algorithms Mol Biol (2012)

Bottom Line: The performance of the method was assessed in different information retrieval tests and the results were compared with those of CE and TM-align, representing two geometrical tools, and YAKUSA, 3D-BLAST and SARST as three representatives of linear encoding schemes. In addition, the method runs about 800 and 7200 times faster than TM-align and CE respectively, while maintaining a competitive accuracy with TM-align and CE. The experimental results demonstrate that linear encoding techniques are capable of reaching the same high degree of accuracy as that achieved by geometrical methods, while generally running hundreds of times faster than conventional programs.


Affiliation: Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia 81310, Johor Bahru, Malaysia. razmaraj@gmail.com.

ABSTRACT

Background: In structural biology, similarity analysis of protein structures is a crucial step in studying the relationships between proteins. Despite the considerable number of techniques explored over the past two decades, the development of alternative methods remains an active research area because of the need for high-performance tools.

Results: In this paper, we present TS-AMIR, a Topology String Alignment Method for Intensive Rapid comparison of protein structures. The proposed method works in two stages. In the first stage, the method generates a topology string based on the geometric details of secondary structure elements and then uses an entropy-based n-gram modelling technique to capture similarities between these strings. The resulting initial correspondence map between secondary structure elements is passed to the second stage to obtain the alignment at the residue level. In the second stage, a heuristic step-by-step algorithm that applies the Kabsch method aligns the residues, yielding an optimal rotation matrix and a minimized RMSD. The performance of the method was assessed in different information retrieval tests and the results were compared with those of CE and TM-align, representing two geometrical tools, and YAKUSA, 3D-BLAST and SARST as three representatives of linear encoding schemes. The method is shown to reach a running speed similar to that of the linear encoding schemes. In addition, the method runs about 800 and 7200 times faster than TM-align and CE respectively, while maintaining a competitive accuracy with TM-align and CE.

Conclusions: The experimental results demonstrate that linear encoding techniques are capable of reaching the same high degree of accuracy as that achieved by geometrical methods, while generally running hundreds of times faster than conventional programs.




Figure 4: An example of matching the topology strings of two reference proteins against the 24 permuted topology strings of a query protein.

Mentions: The 3D coordinates of any pair of protein structures are given in an arbitrary relative orientation, so corresponding parts of the two structures do not necessarily coincide. Structure comparison methods therefore need an orientation-independent representation that makes the structures comparable. To obtain an initial match between two structures, our method applies an algorithm based on the scheme introduced by Martin [18]. Figure 3 shows the algorithm developed for matching SSEs using topology strings. The topology string of the query protein is permuted by rotating the structure in 90-degree steps around the x, y and z axes (line 2 in Figure 3). For each rotation around an axis, the letters of the topology string are replaced according to Table 2, which yields 24 different secondary topology strings. The cross-entropy measure introduced above is then used to compare these 24 permuted topology strings of the query protein with the topology string of the reference protein, and the most similar strings are chosen (lines 4-8 in Figure 3). Subsequently, identical n-gram words of the two topology strings are marked as matched in an iterative procedure in which the n-gram size decreases from m (chosen empirically as 6) down to the basic n-gram size (chosen as 3) (lines 9-17 in Figure 3). Figure 4 illustrates the SSE matching procedure for two sample secondary topology strings.
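The permutation and n-gram matching steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: Table 2 is not reproduced on this page, so the six-letter direction alphabet (R, L, U, D, F, B for +x, -x, +y, -y, +z, -z) and the three letter-replacement maps are hypothetical stand-ins, and the greedy left-to-right tie-breaking in `match_ngrams` is likewise an assumption. The cross-entropy scoring that selects the best of the 24 strings is omitted because its formula is not given here.

```python
# Hypothetical direction alphabet (NOT the paper's Table 2):
# R/L = +x/-x, U/D = +y/-y, F/B = +z/-z.
LETTERS = "RLUDFB"

# Letter replacement for one 90-degree rotation about each axis (assumed maps).
ROT_X = dict(zip(LETTERS, "RLFBDU"))  # U->F, F->D, D->B, B->U
ROT_Y = dict(zip(LETTERS, "BFUDRL"))  # R->B, B->L, L->F, F->R
ROT_Z = dict(zip(LETTERS, "UDLRFB"))  # R->U, U->L, L->D, D->R


def all_orientations():
    """Close the three 90-degree rotations into the full set of letter maps."""
    identity = tuple(LETTERS)
    seen = {identity}
    frontier = [identity]
    while frontier:
        current = frontier.pop()
        for gen in (ROT_X, ROT_Y, ROT_Z):
            # Apply one more 90-degree rotation on top of the current map.
            nxt = tuple(gen[c] for c in current)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return [dict(zip(LETTERS, image)) for image in sorted(seen)]


def permuted_strings(topology):
    """Return the topology string rewritten under every orientation."""
    return ["".join(m[c] for c in topology) for m in all_orientations()]


def match_ngrams(query, ref, max_n=6, min_n=3):
    """Mark identical n-gram words as matched, from max_n down to min_n.

    Returns (query_start, ref_start, n) triples. Positions already covered
    by a longer match are skipped (greedy, left to right: an assumption).
    """
    q_used = [False] * len(query)
    r_used = [False] * len(ref)
    matches = []
    for n in range(max_n, min_n - 1, -1):
        for i in range(len(query) - n + 1):
            if any(q_used[i:i + n]):
                continue
            word = query[i:i + n]
            for j in range(len(ref) - n + 1):
                if ref[j:j + n] == word and not any(r_used[j:j + n]):
                    q_used[i:i + n] = [True] * n
                    r_used[j:j + n] = [True] * n
                    matches.append((i, j, n))
                    break
    return matches


if __name__ == "__main__":
    print(len(permuted_strings("RUF")))
    print(match_ngrams("RUFLDB", "XXRUFYYLDB"))
```

Closing the three quarter-turn maps under composition gives exactly the 24 proper rotations of a cube, which is consistent with the 24 permuted strings mentioned in the text. Starting the n-gram search at the largest size means long shared words are fixed first and shorter words only fill the positions that remain unmatched.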

