TOPS++FATCAT: fast flexible structural alignment using constraints derived from TOPS+ Strings Model.

Veeramalai M, Ye Y, Godzik A - BMC Bioinformatics (2008)

Bottom Line: Such intuitions could help speed up similarity searches and make it easier to understand the results of such analyses. For beta-rich proteins its accuracy is better than that of FATCAT, because the TOPS+ strings model contains important information about the parallel and anti-parallel hydrogen-bond patterns between the beta-strand SSEs (Secondary Structural Elements). We show that the TOPS++FATCAT errors, rare as they are, can be clearly linked to oversimplifications of the TOPS diagrams and can be corrected by the development of more precise secondary structure element definitions.


Affiliation: Joint Center for Molecular Modeling, Burnham Institute for Medical Research, La Jolla, CA 92037, USA. mallikav@burnham.org

ABSTRACT

Background: Protein structure analysis and comparison are major challenges in structural bioinformatics. Despite the existence of many tools and algorithms, very few of them have managed to capture the intuitive understanding of protein structures developed in structural biology, especially in the context of rapid database searches. Such intuitions could help speed up similarity searches and make it easier to understand the results of such analyses.

Results: We developed the TOPS++FATCAT algorithm, which uses an intuitive description of protein structures, as captured in the popular TOPS diagrams, to limit the search space of aligned fragment pairs (AFPs) in the flexible alignment of protein structures performed by the FATCAT algorithm. The TOPS++FATCAT algorithm is faster than FATCAT by more than an order of magnitude, with a minimal cost in classification and alignment accuracy. For beta-rich proteins its accuracy is better than that of FATCAT, because the TOPS+ strings model contains important information about the parallel and anti-parallel hydrogen-bond patterns between the beta-strand SSEs (Secondary Structural Elements). We show that the TOPS++FATCAT errors, rare as they are, can be clearly linked to oversimplifications of the TOPS diagrams and can be corrected by the development of more precise secondary structure element definitions.

Software availability: The benchmark analysis results and the compressed archive of the TOPS++FATCAT program for the Linux platform can be downloaded from the following web site: http://fatcat.burnham.org/TOPS/

Conclusion: TOPS++FATCAT provides FATCAT accuracy and insights into protein structural changes at a speed comparable to sequence alignments, opening up the possibility of interactive protein structure similarity searches.

Figure 4: Schematic illustration of FATCAT structural alignment by chaining AFPs in a constrained alignment region defined by the TOPS alignment output. (a) In FATCAT, two fragments form an AFP (shown as a line in the graph) according to the criteria (see text). (b) The alignment of secondary structure elements from the TOPS+ comparison is used to define the constrained area for AFP detection, in which each pair of aligned secondary structure elements defines an "eligible" block (shown as filled squares). These blocks may be disconnected, so they are joined by connecting blocks (shown as open squares). (c) A buffer area is added around the constrained area defined in (b) (shown as the area enclosed by dashed lines) to obtain the constrained alignment region for FATCAT alignment (shown as the area enclosed by dark lines). (d) Only those AFPs within the constrained alignment region are used in the dynamic programming algorithm for chaining.

Mentions: In this work, we test the general idea of pruning the search space of the FATCAT comparison process using topological constraints derived from the TOPS+ strings alignment. Many of the AFPs considered in the FATCAT alignment can be eliminated from the comparison simply by constraining the alignment region. Here we explore constraints obtained from the TOPS+ strings alignment, which identifies topologically equivalent secondary structure elements (alpha helices, beta strands, and loops). Such equivalences define blocks that restrict the alignment region; AFPs that fall outside these regions are not considered (see Figure 4(b)). We introduce a parameter r to control how strictly the TOPS+ strings alignment constrains the region: r equals 0 if the alignment region is strictly restrained by the TOPS+ strings alignment, and r is set to 1 by default in our program to allow some flexibility in the constrained alignment region (Figure 4(c)). We can then speed up the FATCAT alignment by considering only the AFPs within the constrained alignment area (Figure 4(d)). Rigid structural alignment can be treated as a special case of TOPS++FATCAT in which no twist is allowed in chaining AFPs; the TOPS++FATCAT program provides alignment in both "rigid" mode and "flexible" mode (default).
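
The pruning step described above can be illustrated with a minimal Python sketch. It is not the TOPS++FATCAT implementation. The names Block, AFP, connecting_blocks, constrained_region and filter_afps are hypothetical, as is the rule that an AFP is kept when both of its end points fall inside the region. The text does not say how r translates into a buffer width, so the sketch assumes a padding of r * unit residues, with unit an invented illustrative constant.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Block:
    """Rectangle of residue indices (inclusive) in protein A and protein B."""
    a_start: int
    a_end: int
    b_start: int
    b_end: int

    def contains(self, i: int, j: int, pad: int = 0) -> bool:
        # pad widens the rectangle on every side (the buffer of Figure 4(c)).
        return (self.a_start - pad <= i <= self.a_end + pad
                and self.b_start - pad <= j <= self.b_end + pad)


@dataclass(frozen=True)
class AFP:
    """Aligned fragment pair: equal-length fragments starting at a_start / b_start."""
    a_start: int
    b_start: int
    length: int


def connecting_blocks(eligible: List[Block]) -> List[Block]:
    """Bridge consecutive eligible blocks so the region is connected."""
    ordered = sorted(eligible, key=lambda b: (b.a_start, b.b_start))
    bridges = []
    for prev, nxt in zip(ordered, ordered[1:]):
        bridges.append(Block(min(prev.a_end, nxt.a_start), max(prev.a_end, nxt.a_start),
                             min(prev.b_end, nxt.b_start), max(prev.b_end, nxt.b_start)))
    return bridges


def constrained_region(sse_pairs: List[Tuple[Tuple[int, int], Tuple[int, int]]],
                       r: int = 1, unit: int = 4) -> Tuple[List[Block], int]:
    """Build eligible + connecting blocks and the buffer width.

    ASSUMPTION: buffer width = r * unit residues; the source only states that
    r = 0 is strict and r = 1 is the default.
    """
    eligible = [Block(a0, a1, b0, b1) for (a0, a1), (b0, b1) in sse_pairs]
    return eligible + connecting_blocks(eligible), r * unit


def filter_afps(afps: List[AFP], blocks: List[Block], pad: int) -> List[AFP]:
    """Keep AFPs whose first and last residue pairs lie in the padded region."""
    kept = []
    for afp in afps:
        i_end = afp.a_start + afp.length - 1
        j_end = afp.b_start + afp.length - 1
        start_ok = any(b.contains(afp.a_start, afp.b_start, pad) for b in blocks)
        end_ok = any(b.contains(i_end, j_end, pad) for b in blocks)
        if start_ok and end_ok:
            kept.append(afp)
    return kept


# Two aligned SSE pairs: (range in A, range in B). Values are made up.
sse_pairs = [((0, 9), (0, 9)), ((20, 29), (25, 34))]
afps = [AFP(2, 2, 8), AFP(10, 6, 8), AFP(21, 26, 8), AFP(0, 30, 8)]

strict_blocks, strict_pad = constrained_region(sse_pairs, r=0)
relaxed_blocks, relaxed_pad = constrained_region(sse_pairs, r=1)
strict_kept = filter_afps(afps, strict_blocks, strict_pad)
relaxed_kept = filter_afps(afps, relaxed_blocks, relaxed_pad)
```

In this toy input, the far off-diagonal AFP is discarded under both settings, while the AFP that straddles the edge of the first block survives only once the buffer is added. Only the surviving AFPs would be handed to the dynamic programming step that chains them.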

