TOPS++FATCAT: fast flexible structural alignment using constraints derived from TOPS+ Strings Model.

Veeramalai M, Ye Y, Godzik A - BMC Bioinformatics (2008)

Bottom Line: Such intuitions could help speed up similarity searches and make it easier to understand the results of such analyses. For beta-rich proteins its accuracy is better than FATCAT's, because the TOPS+ strings models contain important information about the parallel and anti-parallel hydrogen-bond patterns between the beta-strand SSEs (Secondary Structural Elements). We show that the TOPS++FATCAT errors, rare as they are, can be clearly linked to oversimplifications of the TOPS diagrams and can be corrected by the development of more precise secondary structure element definitions.


Affiliation: Joint Center for Molecular Modeling, Burnham Institute for Medical Research, La Jolla, CA 92037, USA. mallikav@burnham.org

ABSTRACT

Background: Protein structure analysis and comparison are major challenges in structural bioinformatics. Despite the existence of many tools and algorithms, very few of them have managed to capture the intuitive understanding of protein structures developed in structural biology, especially in the context of rapid database searches. Such intuitions could help speed up similarity searches and make it easier to understand the results of such analyses.

Results: We developed the TOPS++FATCAT algorithm. It uses an intuitive description of protein structure, as captured in the popular TOPS diagrams, to limit the search space of aligned fragment pairs (AFPs) in the flexible structural alignment performed by the FATCAT algorithm. TOPS++FATCAT is more than an order of magnitude faster than FATCAT, at minimal cost in classification and alignment accuracy. For beta-rich proteins its accuracy is better than FATCAT's, because the TOPS+ strings models contain important information about the parallel and anti-parallel hydrogen-bond patterns between the beta-strand SSEs (Secondary Structural Elements). We show that the TOPS++FATCAT errors, rare as they are, can be clearly linked to oversimplifications of the TOPS diagrams. These errors can be corrected by developing more precise secondary structure element definitions.
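The sketch below shows one way TOPS-derived constraints could limit the AFP search space. It is written in Python and rests on several assumptions:

- A TOPS+ strings alignment has already produced a list of matched SSE pairs, each given as half-open residue ranges [start, end) in the two proteins.
- An AFP is identified by the start positions (i, j) of two equal-length fragments.
- The filtering rule is an assumed simplification, not the published TOPS++FATCAT procedure: keep an AFP only if both fragments start inside a matched SSE pair, optionally widened by a margin.
- The names `all_afps`, `candidate_afps` and `slack`, and the toy SSE ranges, are hypothetical. They are not part of the TOPS++FATCAT software.

```python
from typing import List, Tuple

# Hypothetical representation: a residue range [start, end) on one chain.
Range = Tuple[int, int]


def all_afps(len_a: int, len_b: int, frag_len: int) -> List[Tuple[int, int]]:
    """Every AFP start pair (i, j) that an unconstrained search would consider."""
    return [(i, j)
            for i in range(len_a - frag_len + 1)
            for j in range(len_b - frag_len + 1)]


def candidate_afps(len_a: int, len_b: int, frag_len: int,
                   sse_pairs: List[Tuple[Range, Range]],
                   slack: int = 0) -> List[Tuple[int, int]]:
    """Keep only AFP start pairs whose fragments start inside a matched SSE pair.

    `sse_pairs` is assumed to come from a TOPS+ strings alignment; `slack`
    (hypothetical) widens each SSE by that many residues on both sides.
    """
    keep = set()
    for (a0, a1), (b0, b1) in sse_pairs:
        # Allowed start positions: inside the (widened) SSE, clipped so the
        # whole fragment still fits within the chain.
        i_lo, i_hi = max(0, a0 - slack), min(len_a - frag_len, a1 - 1 + slack)
        j_lo, j_hi = max(0, b0 - slack), min(len_b - frag_len, b1 - 1 + slack)
        for i in range(i_lo, i_hi + 1):
            for j in range(j_lo, j_hi + 1):
                keep.add((i, j))
    return sorted(keep)


# Toy chain lengths and SSE ranges, invented purely for illustration.
full = all_afps(60, 70, frag_len=8)
pairs = [((5, 15), (8, 18)), ((30, 40), (35, 47))]
restricted = candidate_afps(60, 70, 8, pairs)
print(len(full), len(restricted))  # prints "3339 220"
```

Even in this toy case, the number of AFPs passed on to chaining drops by more than an order of magnitude. This illustrates, under the stated assumptions, why constraining AFPs to SSE correspondences can speed up the search.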

Software availability: The benchmark analysis results and a compressed archive of the TOPS++FATCAT program for the Linux platform can be downloaded from http://fatcat.burnham.org/TOPS/

Conclusion: TOPS++FATCAT provides FATCAT accuracy and insights into protein structural changes at a speed comparable to sequence alignment, opening up the possibility of interactive protein structure similarity searches.


Figure 8: (a) Superposition of d1eca__ (gray) and d1cpca_ (orange) from flexible_FATCAT and d1cpca_ (blue) from flexible_TOPS++FATCAT; (b) AFP chaining alignment from flexible_FATCAT; (c) AFP chaining alignment from flexible_TOPS++FATCAT; (d) structural alignment from flexible_TOPS++FATCAT; (e) structural alignment from flexible_FATCAT.

Mentions: The erythrocruorin domain d1eca__ (136 aa) from Chironomus thummi and the phycocyanin alpha subunit domain d1cpca_ (162 aa) from the cyanobacterium Fremyella diplosiphon both belong to the Globin-like superfamily. For this domain pair, FATCAT gives the better alignment. It aligns 120 and 118 positions with its flexible and rigid options, respectively, with a chain RMSD of 4.02 Å. The flexible_TOPS++FATCAT method aligns only 63 positions, with an optimal RMSD of 3.23 Å and a chain RMSD of 6.28 Å. In this case it misses the N-terminal helix and misaligns some helices. Figure 8(a) shows the superposition of d1eca__ (gray) and d1cpca_ (orange) from flexible_FATCAT, together with d1cpca_ (blue) from flexible_TOPS++FATCAT. The AFP chaining alignment and the resulting structural alignment from FATCAT are shown in Figures 8(b) and 8(e), respectively. Figure 8(c) shows the AFP chaining alignment from flexible_TOPS++FATCAT, which misses the N-terminal region and misaligns parts of the C-terminal region (see Figure 8(d)). By contrast, the rigid_TOPS++FATCAT method produces an alignment of 108 positions, with optimal and chain RMSDs of 3.22 Å and 6.28 Å, respectively.

In general, TOPS comparison does not work well for alpha-rich proteins. Their SSEs lack the inter-SSE hydrogen bonds that TOPS diagrams record between beta-strands [26]. The same is true, to some extent, of TOPS+ strings comparison. However, this method can also use ligand-interaction information to compare protein domains more efficiently; for example, DNA-binding motifs such as helix-turn-helix and helix-loop-helix can be easily recognized [28]. We have not explored this ligand-pattern discovery option of TOPS+ strings comparison in this paper. In addition, the TOPS+ strings comparison produces only a basic alignment, and its scoring function for finding the best alignment has not been optimized.

Future development could address these problems with more advanced TOPS+ and TOPS+ strings models. Such models would encode helix-helix packing relationships and SSE-ligand interaction properties, together with right- and left-handed chirality. Furthermore, both the comparison and the alignment stages of TOPS+ strings comparison could be optimized to account for insertions and deletions (indels) of SSEs. Such indels occur naturally among different members of a protein superfamily [31].
