TOPS++FATCAT: fast flexible structural alignment using constraints derived from TOPS+ Strings Model.

Veeramalai M, Ye Y, Godzik A - BMC Bioinformatics (2008)

Bottom Line: Such intuitions could help speed up similarity searches and make it easier to understand the results of such analyses. For beta-rich proteins its accuracy is better than FATCAT, because the TOPS+ strings model contains important information about the parallel and anti-parallel hydrogen-bond patterns between the beta-strand SSEs (Secondary Structural Elements). We show that the TOPS++FATCAT errors, rare as they are, can be clearly linked to oversimplifications of the TOPS diagrams and can be corrected by the development of more precise secondary structure element definitions.


Affiliation: Joint Center for Molecular Modeling, Burnham Institute for Medical Research, La Jolla, CA 92037, USA. mallikav@burnham.org

ABSTRACT

Background: Protein structure analysis and comparison are major challenges in structural bioinformatics. Despite the existence of many tools and algorithms, very few of them have managed to capture the intuitive understanding of protein structures developed in structural biology, especially in the context of rapid database searches. Such intuitions could help speed up similarity searches and make it easier to understand the results of such analyses.

Results: We developed a TOPS++FATCAT algorithm that uses an intuitive description of the proteins' structures, as captured in the popular TOPS diagrams, to limit the search space of the aligned fragment pairs (AFPs) in the flexible alignment of protein structures performed by the FATCAT algorithm. The TOPS++FATCAT algorithm is faster than FATCAT by more than an order of magnitude with a minimal cost in classification and alignment accuracy. For beta-rich proteins its accuracy is better than FATCAT, because the TOPS+ strings model contains important information about the parallel and anti-parallel hydrogen-bond patterns between the beta-strand SSEs (Secondary Structural Elements). We show that the TOPS++FATCAT errors, rare as they are, can be clearly linked to oversimplifications of the TOPS diagrams and can be corrected by the development of more precise secondary structure element definitions.

Software availability: The benchmark analysis results and the compressed archive of the TOPS++FATCAT program for the Linux platform can be downloaded from the following web site: http://fatcat.burnham.org/TOPS/

Conclusion: TOPS++FATCAT provides FATCAT accuracy and insights into protein structural changes at a speed comparable to sequence alignments, opening up a possibility of interactive protein structure similarity searches.


Figure 7: (a) Superposition of d2trxa_ (gray) and d1kte__ (orange) from flexible_FATCAT and d1kte__ (blue) from flexible_TOPS++FATCAT; (b) AFP chaining alignment from flexible_FATCAT; (c) AFP chaining alignment from flexible_TOPS++FATCAT.

Mentions: While the overall accuracy of both the rigid and flexible FATCAT methods is better than that of their TOPS++FATCAT equivalents, an interesting example where the opposite is true is the comparison of two proteins from the thioredoxin-like superfamily: d2trxa_ (108 aa) from Escherichia coli and d1kte__ (105 aa) from Sus scrofa (pig). For this pair, the flexible_TOPS++FATCAT method provides an alignment of 88 equivalent positions with a chain RMSD of 1.67 Å and an optimal RMSD of 3.06 Å without any twist; the alignment has 10% sequence identity (see Table 4). The flexible_FATCAT method, on the other hand, provides an alignment of 86 positions using a twist in the C-terminal region; it has a higher chain RMSD of 5.14 Å and an optimal RMSD of 3.48 Å. For more information on the chain and optimal RMSDs, refer to [5]. The flexible_FATCAT method uses the twist to align a helix in the C-terminal region, which is incorrectly positioned in the beta-sheet core (see Table 4). Figure 7(a) shows the superposition of the d2trxa_ (gray) and d1kte__ (orange) domains from the flexible_FATCAT method; blue indicates the d1kte__ domain from the flexible_TOPS++FATCAT method. The incorrect alignment of the C-terminal alpha helix of the d1kte__ domain (orange) is visible in the core of the beta-sheet region. Figures 7(b) and 7(c) show the AFPs from the flexible_FATCAT and flexible_TOPS++FATCAT methods, respectively. In the flexible_FATCAT alignment, the hinge region that introduces the twist is indicated by an arrow and by AFPs drawn in a different color (see Figure 7(b)). In this case, the alignment constraints from the TOPS+ strings alignment allow the TOPS++FATCAT method to avoid a spurious alignment.
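
The idea of using SSE-level constraints to limit the AFP search space can be illustrated with a small sketch. This is not the TOPS++FATCAT implementation: the function names, the `margin` parameter, the fragment length of 8, and the matched SSE ranges in the demo are all assumptions made for illustration. Only the chain lengths (108 and 105 residues) come from the text. The sketch assumes that an AFP is kept only when both of its fragments lie inside a pair of SSEs that a topology-level alignment has already matched.

```python
from typing import List, Set, Tuple

Segment = Tuple[int, int]          # inclusive residue range (start, end), 0-based
SsePair = Tuple[Segment, Segment]  # a matched SSE in protein A and in protein B


def clamp(seg: Segment, margin: int, length: int) -> Segment:
    """Widen a segment by `margin` residues and keep it inside the chain."""
    start, end = seg
    return max(0, start - margin), min(length - 1, end + margin)


def all_afp_starts(len_a: int, len_b: int, frag_len: int) -> List[Tuple[int, int]]:
    """Every (i, j) start pair for fragments of length frag_len (no constraints)."""
    return [(i, j)
            for i in range(len_a - frag_len + 1)
            for j in range(len_b - frag_len + 1)]


def constrained_afp_starts(len_a: int, len_b: int, frag_len: int,
                           sse_pairs: List[SsePair],
                           margin: int = 0) -> List[Tuple[int, int]]:
    """Keep only start pairs whose fragments fit inside a matched SSE pair.

    Hypothetical rule: fragment i..i+frag_len-1 must lie within the (widened)
    SSE of protein A, and fragment j..j+frag_len-1 within its partner in B.
    """
    allowed: Set[Tuple[int, int]] = set()
    for seg_a, seg_b in sse_pairs:
        a_lo, a_hi = clamp(seg_a, margin, len_a)
        b_lo, b_hi = clamp(seg_b, margin, len_b)
        # Ranges are empty when a window is shorter than the fragment.
        for i in range(a_lo, a_hi - frag_len + 2):
            for j in range(b_lo, b_hi - frag_len + 2):
                allowed.add((i, j))
    return sorted(allowed)


if __name__ == "__main__":
    # Chain lengths from the text; the SSE ranges below are made up.
    len_a, len_b, frag_len = 108, 105, 8
    hypothetical_pairs = [((2, 9), (3, 10)), ((22, 29), (20, 28)), ((52, 58), (50, 57))]
    full = all_afp_starts(len_a, len_b, frag_len)
    kept = constrained_afp_starts(len_a, len_b, frag_len, hypothetical_pairs, margin=3)
    print(f"unconstrained candidates: {len(full)}")
    print(f"constrained candidates:   {len(kept)}")
```

Under this assumed rule, the number of candidate fragment pairs grows with the sizes of the matched SSEs rather than with the product of the chain lengths, which is one plausible way a topology-level pre-alignment could both speed up the search and exclude pairings such as a helix against a beta-sheet core.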

