TOPS++FATCAT: fast flexible structural alignment using constraints derived from TOPS+ Strings Model.

Veeramalai M, Ye Y, Godzik A - BMC Bioinformatics (2008)

Bottom Line: Such intuitions could help speed up similarity searches and make it easier to understand the results of such analyses. For beta-rich proteins its accuracy is better than that of FATCAT, because the TOPS+ strings model contains important information about the parallel and anti-parallel hydrogen-bond patterns between the beta-strand SSEs (Secondary Structural Elements). We show that the TOPS++FATCAT errors, rare as they are, can be clearly linked to oversimplifications of the TOPS diagrams and can be corrected by the development of more precise secondary structure element definitions.


Affiliation: Joint Center for Molecular Modeling, Burnham Institute for Medical Research, La Jolla, CA 92037, USA. mallikav@burnham.org

ABSTRACT

Background: Protein structure analysis and comparison are major challenges in structural bioinformatics. Despite the existence of many tools and algorithms, very few of them have managed to capture the intuitive understanding of protein structures developed in structural biology, especially in the context of rapid database searches. Such intuitions could help speed up similarity searches and make it easier to understand the results of such analyses.

Results: We developed the TOPS++FATCAT algorithm, which uses an intuitive description of protein structures, as captured in the popular TOPS diagrams, to limit the search space of the aligned fragment pairs (AFPs) in the flexible alignment of protein structures performed by the FATCAT algorithm. The TOPS++FATCAT algorithm is more than an order of magnitude faster than FATCAT, at a minimal cost in classification and alignment accuracy. For beta-rich proteins its accuracy is better than that of FATCAT, because the TOPS+ strings model contains important information about the parallel and anti-parallel hydrogen-bond patterns between the beta-strand SSEs (Secondary Structural Elements). We show that the TOPS++FATCAT errors, rare as they are, can be clearly linked to oversimplifications of the TOPS diagrams and can be corrected by the development of more precise secondary structure element definitions.

Software availability: The benchmark analysis results and the compressed archive of the TOPS++FATCAT program for the Linux platform can be downloaded from the following web site: http://fatcat.burnham.org/TOPS/

Conclusion: TOPS++FATCAT provides FATCAT accuracy and insights into protein structural changes at a speed comparable to sequence alignments, opening up the possibility of interactive protein structure similarity searches.
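The idea stated in the Results, using SSE-level information to limit the AFP search space, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `SSE` class, the function names, the fragment length of 8 residues, and the list of matched SSE pairs (standing in for the outcome of a TOPS+ string comparison, which is not shown) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SSE:
    """Hypothetical secondary structure element: type plus residue range."""
    kind: str   # 'H' for helix, 'E' for strand
    start: int  # first residue index (inclusive)
    end: int    # last residue index (inclusive)


def candidate_afps(len_a: int, len_b: int, frag_len: int = 8) -> List[Tuple[int, int]]:
    """Enumerate every pair of fragment start positions (unconstrained search)."""
    return [(i, j)
            for i in range(len_a - frag_len + 1)
            for j in range(len_b - frag_len + 1)]


def overlaps(start: int, frag_len: int, sse: SSE) -> bool:
    """True if the fragment [start, start + frag_len - 1] shares a residue with the SSE."""
    return start <= sse.end and start + frag_len - 1 >= sse.start


def filter_afps(afps: List[Tuple[int, int]],
                sses_a: List[SSE],
                sses_b: List[SSE],
                matched_pairs: List[Tuple[int, int]],
                frag_len: int = 8) -> List[Tuple[int, int]]:
    """Keep only AFPs whose two fragments overlap an SSE pair matched at the topology level."""
    kept = []
    for i, j in afps:
        for a_idx, b_idx in matched_pairs:
            if overlaps(i, frag_len, sses_a[a_idx]) and overlaps(j, frag_len, sses_b[b_idx]):
                kept.append((i, j))
                break  # one supporting SSE pair is enough
    return kept


# Toy example: two 30-residue chains, each with one strand and one helix.
sses_a = [SSE('E', 2, 7), SSE('H', 15, 24)]
sses_b = [SSE('E', 4, 9), SSE('H', 18, 27)]
matched = [(0, 0), (1, 1)]  # assumed result of the topology-level comparison

all_afps = candidate_afps(30, 30)
kept_afps = filter_afps(all_afps, sses_a, sses_b, matched)
print(len(all_afps), len(kept_afps))
```

Only the surviving fragment pairs would then be passed on to the more expensive chaining step, which is where a constraint of this kind would save time.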


Figure 1: Different representations of the protein structure flavodoxin-fold CheY: (a) ribbon diagram; (b) TOPS style topology diagram; (c) distance; (d) contact map.

Mentions: Comparison and classification of protein structures are significantly simplified by the fact that proteins have naturally modular structures, being mostly composed of locally regular structures: alpha helices and beta strands. These two types of secondary structures constitute a little over 50% of an average protein's length. With the average length of a secondary structure being around 10 amino acids, this makes it possible to describe a protein structure as an arrangement of a much smaller number of elements. Protein structures are often visualized in a simplified form, the most popular being the so-called ribbon diagram, in which secondary structures are shown as helices and arrows (see Figure 1). This picture can be simplified further by showing individual secondary structure elements as simple symbols (circles or boxes/triangles). These depictions, called fold diagrams and originally proposed in the 1970s [10-12], are best captured by the TOPS (Topology of Protein Structures) algorithm, which attempts to automate the creation of the topology cartoon [13]. While useful in protein classification, such simplified descriptions are not used in the most popular automated protein structure comparison algorithms, such as DALI [3] or CE [4]. Kleywegt and Jones developed a method for finding similar motifs by comparing distance matrices that are constructed by representing a protein as a set of SSEs with their directional vectors and the angles between those vectors [14]. Programs that use SSEs, either for structure comparison based on hierarchical superposition of both SSEs and atomic representations [15] or for finding common substructures based on subgraph isomorphism, such as [16,17] and recent applications of the TOPS diagram [18,19], usually struggle with translating the comparison results from the secondary structure level to the individual residue level.
Although the SSM method uses graph-matching procedures at the SSE level followed by an iterative 3D alignment of the protein C-alpha atoms [20], it lacks the topological relationships between the SSEs, which are essential features in identifying common scaffolds in distantly related proteins. A TOPS pattern has been used to guide sequence alignment, for instance, to build multiple structural alignments of a distantly related family of beta-rich protein domains [21]. The Multiple Sequence Alignment Tool (MSAT) automates this approach, merging it with the popular ClustalW program [22]. DALI [3], CE [4] and FATCAT [5] introduce their own methods of decomposing the protein structure into smaller units, such as 7 × 7 dense distance map fragments (DALI) or aligned fragment pairs (AFPs) (CE and FATCAT). The large number of such fragments, and of the combinations of fragments that need to be evaluated by structure comparison programs, is the main reason for the significant computational requirements of such algorithms. More importantly, the TOPS+ method is used here to enable a structural comparison that takes into account flexibility in protein structures and that not only classifies the differences but can also recognize such rearrangements, which is the first such application using the language of SSEs. In this contribution, we explore whether insights provided by topology diagrams can be incorporated into automated protein structure alignment algorithms, focusing on the FATCAT program developed previously in our group.
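To make the scale argument above concrete, the following short Python sketch compares a rough count of SSE pairs with a rough count of residue-level fragment pairs for two chains. It is a back-of-the-envelope illustration only. The 50% coverage and the mean SSE length of 10 residues come from the text; the fragment length of 8 and the function names are assumptions.

```python
def estimate_sse_count(n_residues: int,
                       sse_fraction: float = 0.5,
                       mean_sse_len: int = 10) -> int:
    """Rough number of SSEs in a chain: residues inside SSEs divided by mean SSE length."""
    return round(n_residues * sse_fraction / mean_sse_len)


def fragment_pair_count(len_a: int, len_b: int, frag_len: int = 8) -> int:
    """Number of start-position pairs for ungapped fragments of length frag_len."""
    starts_a = max(len_a - frag_len + 1, 0)
    starts_b = max(len_b - frag_len + 1, 0)
    return starts_a * starts_b


# Two hypothetical chains of 200 residues each.
n = 200
sse_pairs = estimate_sse_count(n) ** 2
fragment_pairs = fragment_pair_count(n, n)
print(sse_pairs, fragment_pairs)
```

Under these assumptions the SSE-level description involves a few hundred times fewer pairs than the fragment-level one, which is the gap that a topology-guided filter would aim to exploit.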

