Multiple structure alignment with msTALI.

Shealy P, Valafar H - BMC Bioinformatics (2012)

Bottom Line: Although multiple structure alignment algorithms can potentially be applied to a number of problems, they have primarily been used for protein core identification. We also demonstrate success at building a database of protein cores using 341 randomly selected CATH domains and highlight the contribution of msTALI compared to the CATH classifications. In addition to its performance on standard comparison databases, it utilizes clear, informative features, allowing further customization for domain-specific applications.


Affiliation: Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA.

ABSTRACT

Background: Multiple structure alignments have received increasing attention in recent years as an alternative to multiple sequence alignments. Although multiple structure alignment algorithms can potentially be applied to a number of problems, they have primarily been used for protein core identification. A method capable of solving a variety of problems using structure comparison is still absent. Here we introduce msTALI, a program for aligning multiple protein structures. Our algorithm uses several informative features to guide its alignments: torsion angles, backbone Cα atom positions, secondary structure, residue type, surface accessibility, and properties of nearby atoms. The algorithm allows the user to weight the types of information used to generate the alignment, which expands its utility to a wide variety of problems.

Results: msTALI exhibits competitive results on 824 families from the Homstrad and SABmark databases when compared to Matt and Mustang. We also demonstrate success at building a database of protein cores using 341 randomly selected CATH domains and highlight the contribution of msTALI compared to the CATH classifications. Finally, we present an example applying msTALI to the problem of detecting hinges in a protein undergoing rigid-body motion.

Conclusions: msTALI is an effective algorithm for multiple structure alignment. In addition to its performance on standard comparison databases, it utilizes clear, informative features, allowing further customization for domain-specific applications. The C++ source code for msTALI is available for Linux on the web at http://ifestos.cse.sc.edu/mstali.
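
The user-weighted combination of features described in the abstract can be illustrated with a minimal Python sketch. This is not msTALI's actual scoring function, which the abstract does not specify: the feature names, the per-feature similarity measures, the equal default weights, and the dictionary layout of a residue are all assumptions made for illustration.

```python
# Hypothetical weights: the abstract lists the feature types but does not
# say how msTALI scores or combines them.
DEFAULT_WEIGHTS = {
    "torsion": 1.0,
    "secondary_structure": 1.0,
    "residue_type": 1.0,
    "accessibility": 1.0,
}


def angle_similarity(a_deg, b_deg):
    """Map the circular difference between two angles (degrees) to [0, 1]."""
    diff = abs(a_deg - b_deg) % 360.0
    diff = min(diff, 360.0 - diff)  # handle wrap-around, e.g. 170 vs -170
    return 1.0 - diff / 180.0


def residue_similarity(r1, r2, weights=DEFAULT_WEIGHTS):
    """Weighted average of per-feature similarities between two residues.

    Each residue is a dict with assumed keys: phi, psi (degrees),
    ss (secondary structure code), aa (residue type), acc (relative
    surface accessibility in [0, 1]).
    """
    per_feature = {
        "torsion": 0.5 * (angle_similarity(r1["phi"], r2["phi"])
                          + angle_similarity(r1["psi"], r2["psi"])),
        "secondary_structure": 1.0 if r1["ss"] == r2["ss"] else 0.0,
        "residue_type": 1.0 if r1["aa"] == r2["aa"] else 0.0,
        "accessibility": 1.0 - abs(r1["acc"] - r2["acc"]),
    }
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Only the features named in `weights` contribute to the score.
    return sum(w * per_feature[name] for name, w in weights.items()) / total
```

Passing, for example, `weights={"torsion": 1.0}` would score residues on backbone geometry alone, which is the kind of per-problem customization the abstract refers to.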


Figure 8: An illustration of the protein cores derived from the CATH phylogenetic tree. Protein cores extracted from the tree in Figure 6 by cutting the tree at various levels. The domains shown are: (a) 1w4e and 1w4i; (b) the domains shown in (a) plus 2eq9 and 1w85; (c) the domains shown in (b) plus 1w4h, 1zwv, and 1zy8; (d) the domains shown in (c) plus 1nvm and 2fna. Groups (a) through (c) are from superfamily 4.10.320.10, while the domains added in (d) are from superfamily 1.10.8.60.

Mentions: The splitting of superfamily 1.10.8.60 occurs because the 4.10.320.10 domains have a strong core in common with some of those from 1.10.8.60, dividing 1.10.8.60 into two clusters. This splitting is illustrated in Figure 5. The larger 1.10.8.60 cluster contains 94% of that superfamily’s domains, and so deviations from CATH relate to the smaller 1.10.8.60 cluster. We examined the branches containing the 4.10.320.10 cluster and the smaller 1.10.8.60 cluster in more detail, as shown in Figure 7. The protein cores created by cutting this portion of the phylogenetic tree at varying levels of similarity are shown in Figure 8. As more domains are incorporated into the core, some regions exhibit structural diversity, while others are nearly identical between domains. The regions of diversity are almost exclusively located in turns. It is remarkable that domains from the two classes contain substantial overlap between their cores. The core sizes are 42, 40, 39, and 40 residues for the cores labelled (a), (b), (c), and (d), respectively. The nine structures used to generate these cores range in size from 40 to 76 residues, with an average size of 50 residues. The common core size of 40 residues is 80% of the average domain size, lending substantial support to the conclusion that these structures from differing superfamilies are indeed built upon a single common core. Furthermore, while seven of the nine domains are from 4.10.320.10, the two domains from 1.10.8.60 do not contain a core that is substantially larger than the common core displayed in Figure 8(d). The alignment of these two domains separately is shown in Figure 9.
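
The core-size comparison above is simple arithmetic and can be checked with a short Python sketch. The numbers are those quoted in the paragraph; the variable and function names are illustrative only.

```python
# Core sizes (residues) for panels (a)-(d) and the average domain size,
# as quoted in the text.
core_sizes = {"a": 42, "b": 40, "c": 39, "d": 40}
average_domain_size = 50


def core_fraction(core_size, average_size):
    """Fraction of the average domain covered by a core."""
    return core_size / average_size


fractions = {
    label: core_fraction(size, average_domain_size)
    for label, size in core_sizes.items()
}

for label, frac in fractions.items():
    print(f"core ({label}): {frac:.0%} of the average domain size")
```

For the common core in panel (d) this gives 40 / 50 = 80%, the figure cited in the text.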

