TS-AMIR: a topology string alignment method for intensive rapid protein structure comparison.

Razmara J, Deris S, Parvizpour S - Algorithms Mol Biol (2012)



Affiliation: Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia 81310, Johor Bahru, Malaysia. razmaraj@gmail.com.

ABSTRACT

Background: In structural biology, similarity analysis of protein structures is a crucial step in studying the relationships between proteins. Despite the considerable number of techniques explored over the past two decades, developing new alternative methods remains an active research area because of the need for high-performance tools.

Results: In this paper, we present TS-AMIR, a Topology String Alignment Method for Intensive Rapid comparison of protein structures. The proposed method works in two stages. In the first stage, the method generates a topology string based on the geometric details of secondary structure elements, and then uses an n-gram modelling technique based on the concept of entropy to capture similarities between these strings. The resulting initial correspondence map between secondary structure elements is passed to the second stage to obtain the alignment at the residue level. In the second stage, a heuristic step-by-step algorithm applies the Kabsch method to align the residues, yielding an optimal rotation matrix and a minimized RMSD. The performance of the method was assessed in several information retrieval tests, and the results were compared with those of CE and TM-align, representing two geometrical tools, and YAKUSA, 3D-BLAST and SARST, representing three linear encoding schemes. The method is shown to run at high speed, comparable to that of the linear encoding schemes. In addition, it runs about 800 and 7200 times faster than TM-align and CE, respectively, while maintaining accuracy competitive with both.

Conclusions: The experimental results demonstrate that linear encoding techniques are capable of reaching the same high degree of accuracy as that achieved by geometrical methods, while generally running hundreds of times faster than conventional programs.
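The Results above state that the second stage applies the Kabsch method to superpose matched residues and obtain an optimal rotation matrix and minimized RMSD. The abstract does not give TS-AMIR's implementation, so the following is only a minimal sketch of the textbook Kabsch algorithm in Python with NumPy. It assumes the residue correspondence (paired coordinates, for example C-alpha atoms) is already known; the function name `kabsch` and the choice of C-alpha atoms are illustrative assumptions, and the heuristic step-by-step selection of which residues to pair is not shown.

```python
import numpy as np


def kabsch(P: np.ndarray, Q: np.ndarray) -> tuple[np.ndarray, float]:
    """Superpose P onto Q with the Kabsch algorithm (illustrative sketch).

    P, Q: (n, 3) arrays of paired coordinates, e.g. C-alpha atoms of residues
    already matched by an earlier alignment step (an assumption here).
    Returns the proper rotation R (det R = +1) that minimises the RMSD between
    the centred sets P_c @ R.T and Q_c, together with that RMSD.
    """
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (n, 3)")
    # Remove translation by centring each set on its centroid.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # 3x3 cross-covariance matrix and its SVD: H = U @ diag(S) @ Vt.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # RMSD of the superposed, centred coordinates.
    diff = P_c @ R.T - Q_c
    rmsd = float(np.sqrt((diff ** 2).sum() / len(P)))
    return R, rmsd


if __name__ == "__main__":
    # Demo: rotate and translate a random point cloud, then recover the rotation.
    rng = np.random.default_rng(0)
    P = rng.normal(size=(10, 3))
    t = 0.7
    R_true = np.array([[np.cos(t), -np.sin(t), 0.0],
                       [np.sin(t), np.cos(t), 0.0],
                       [0.0, 0.0, 1.0]])
    Q = P @ R_true.T + np.array([1.0, -2.0, 3.0])
    R, rmsd = kabsch(P, Q)
    print(np.allclose(R, R_true), rmsd)  # expect True and an RMSD near 0
```

The SVD-based form with the determinant correction is the standard way to guarantee a proper rotation; without it, mirror-image structures could be "superposed" by a reflection, which is not a physically meaningful alignment.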


Figure 6: Average precision-recall for searching 108 query proteins.

Precision and recall are defined as precision = m/n and recall = m/N, where m is the number of correct retrievals from the same SCOP family as the query, n is the total number of retrieved proteins, and N is the total number of relevant proteins in that SCOP family. Using the above dataset, 108 query proteins were searched, and the precision and recall values were evaluated for each method. Figure 6 shows the average precision-recall values for the six schemes. Results for all methods except TM-align and TS-AMIR were taken from the literature [22]. According to the figure, TM-align ranks first in accuracy. TS-AMIR and CE rank second with competitive accuracy, although TS-AMIR obtains slightly better accuracy at the higher percentages. Moreover, the three linear encoding schemes (YAKUSA, 3D-BLAST and SARST) generally obtain lower accuracy than TS-AMIR.
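Using the quantities m, n and N defined above, precision after n retrievals is m/n and recall is m/N. A minimal sketch of computing these along one query's ranked hit list follows. It is not the evaluation code used in the paper: the function names, the representation of hits as plain SCOP family label strings, and the averaging of precision at a fixed cutoff across queries are assumptions, since the text does not say how the 108 per-query curves were averaged.

```python
def precision_recall_curve(ranked_hits: list[str], query_family: str,
                           family_size: int) -> list[tuple[float, float]]:
    """Return (precision m/n, recall m/N) after each of the first n hits.

    ranked_hits: SCOP family label of each retrieved protein, best hit first.
    query_family: SCOP family label of the query protein.
    family_size: N, the number of relevant proteins in the query's family.
    """
    if family_size <= 0:
        raise ValueError("family_size (N) must be positive")
    points = []
    m = 0  # correct retrievals so far (same SCOP family as the query)
    for n, family in enumerate(ranked_hits, start=1):
        if family == query_family:
            m += 1
        points.append((m / n, m / family_size))
    return points


def mean_precision_at(curves: list[list[tuple[float, float]]], n: int) -> float:
    """Average precision at cutoff n over queries (hypothetical averaging choice)."""
    values = [curve[n - 1][0] for curve in curves if len(curve) >= n]
    if not values:
        raise ValueError("no query has at least n retrieved proteins")
    return sum(values) / len(values)


# Usage with made-up labels: 4 hits for a query whose family has N = 3 members.
curve = precision_recall_curve(["a.1.1.2", "a.1.1.2", "b.1.1.1", "a.1.1.2"],
                               "a.1.1.2", 3)
print(curve)
```

Plotting the averaged precision against recall for each method would give a curve of the kind summarized in Figure 6, although the exact interpolation scheme used there is not stated.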
