TOPS++FATCAT: fast flexible structural alignment using constraints derived from TOPS+ Strings Model.

Veeramalai M, Ye Y, Godzik A - BMC Bioinformatics (2008)

Bottom Line: Such intuitions could help speed up similarity searches and make it easier to understand the results of such analyses. For beta-rich proteins its accuracy is better than FATCAT, because the TOPS+ strings model contains important information on the parallel and anti-parallel hydrogen-bond patterns between the beta-strand SSEs (Secondary Structural Elements). We show that the TOPS++FATCAT errors, rare as they are, can be clearly linked to oversimplifications of the TOPS diagrams and can be corrected by the development of more precise secondary structure element definitions.

Affiliation: Joint Center for Molecular Modeling, Burnham Institute for Medical Research, La Jolla, CA 92037, USA. mallikav@burnham.org

ABSTRACT

Background: Protein structure analysis and comparison are major challenges in structural bioinformatics. Despite the existence of many tools and algorithms, very few of them have managed to capture the intuitive understanding of protein structures developed in structural biology, especially in the context of rapid database searches. Such intuitions could help speed up similarity searches and make it easier to understand the results of such analyses.

Results: We developed a TOPS++FATCAT algorithm that uses an intuitive description of the proteins' structures, as captured in the popular TOPS diagrams, to limit the search space of the aligned fragment pairs (AFPs) in the flexible alignment of protein structures performed by the FATCAT algorithm. The TOPS++FATCAT algorithm is faster than FATCAT by more than an order of magnitude, with a minimal cost in classification and alignment accuracy. For beta-rich proteins its accuracy is better than that of FATCAT, because the TOPS+ strings model contains important information on the parallel and anti-parallel hydrogen-bond patterns between the beta-strand SSEs (Secondary Structural Elements). We show that the TOPS++FATCAT errors, rare as they are, can be clearly linked to oversimplifications of the TOPS diagrams and can be corrected by the development of more precise secondary structure element definitions.

Software availability: The benchmark analysis results and the compressed archive of the TOPS++FATCAT program for the Linux platform can be downloaded from the following web site: http://fatcat.burnham.org/TOPS/

Conclusion: TOPS++FATCAT provides FATCAT accuracy and insights into protein structural changes at a speed comparable to sequence alignments, opening up the possibility of interactive protein structure similarity searches.

Figure 6: Graph showing the runtime and AFP analysis of the FATCAT (in green) and TOPS++FATCAT (in red) methods based on the flexible option, (a) runtime statistics, where the x-axis indicates the 1,901 SCOP domain pairs ordered by flexible_FATCAT runtime; (b) total number of AFP statistics, where the x-axis represents the 1,901 SCOP domain pairs ordered based on AFPs from the flexible_FATCAT method.

We tested both the FATCAT and TOPS++FATCAT methods on a computer running Mac OS X version 10.4.10 with two 2.66-GHz Dual-Core Intel Xeon processors and 1 GB of 667-MHz memory. We performed a runtime analysis on 1,901 protein domain pairs and recorded the total number of AFPs and the corresponding runtime for both methods. The results show an exponential increase in the number of AFPs (Figure 6(b)) and in the corresponding runtime (Figure 6(a)) for the FATCAT method as compared to the TOPS++FATCAT method (see Table 3). For example, the average number of AFPs for the TOPS++FATCAT method is 530, whereas the average for the FATCAT method is 15,019. In other words, FATCAT uses on average about 28 times as many AFPs as TOPS++FATCAT (see Table 3). As a result, TOPS++FATCAT is 22 times faster than FATCAT, because FATCAT must take many more AFPs into account in the comparison process (see Table 3).
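
The factor of 28 quoted above follows directly from the two average AFP counts. The short Python sketch below simply redoes that arithmetic; the variable names are illustrative and not taken from the TOPS++FATCAT software, and the only inputs are the averages and the speed-up stated in the text.

```python
# Average AFP counts and speed-up as quoted in the text (see Table 3).
# Variable names are illustrative, not part of the TOPS++FATCAT program.
fatcat_avg_afps = 15_019
tops_fatcat_avg_afps = 530
reported_speedup = 22

# How many times more AFPs FATCAT considers on average.
afp_ratio = fatcat_avg_afps / tops_fatcat_avg_afps

# Fraction of FATCAT's AFPs that remain after the TOPS+ constraints.
retained_fraction = tops_fatcat_avg_afps / fatcat_avg_afps

print(f"AFP ratio (FATCAT / TOPS++FATCAT): {afp_ratio:.1f}")
print(f"AFPs retained by TOPS++FATCAT: {retained_fraction:.1%}")
print(f"Reported runtime speed-up: {reported_speedup}x")
```

On these figures, TOPS++FATCAT retains only a few percent of the AFPs that FATCAT evaluates. The reported runtime gain of 22 is somewhat smaller than the AFP ratio.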

