Multiple structure alignment with msTALI.

Shealy P, Valafar H - BMC Bioinformatics (2012)

Bottom Line: Although multiple structure alignment algorithms can potentially be applied to a number of problems, they have primarily been used for protein core identification. We also demonstrate success at building a database of protein cores using 341 randomly selected CATH domains and highlight the contribution of msTALI compared to the CATH classifications. In addition to its performance on standard comparison databases, it utilizes clear, informative features, allowing further customization for domain-specific applications.


Affiliation: Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA.

ABSTRACT

Background: Multiple structure alignments have received increasing attention in recent years as an alternative to multiple sequence alignments. Although multiple structure alignment algorithms can potentially be applied to a number of problems, they have primarily been used for protein core identification. A method that is capable of solving a variety of problems using structure comparison is still absent. Here we introduce msTALI, a program for aligning multiple protein structures. Our algorithm uses several informative features to guide its alignments: torsion angles, backbone Cα atom positions, secondary structure, residue type, surface accessibility, and properties of nearby atoms. The algorithm allows the user to weight the types of information used to generate the alignment, which expands its utility to a wide variety of problems.

Results: msTALI exhibits competitive results on 824 families from the Homstrad and SABmark databases when compared to Matt and Mustang. We also demonstrate success at building a database of protein cores using 341 randomly selected CATH domains and highlight the contribution of msTALI compared to the CATH classifications. Finally, we present an example applying msTALI to the problem of detecting hinges in a protein undergoing rigid-body motion.

Conclusions: msTALI is an effective algorithm for multiple structure alignment. In addition to its performance on standard comparison databases, it utilizes clear, informative features, allowing further customization for domain-specific applications. The C++ source code for msTALI is available for Linux on the web at http://ifestos.cse.sc.edu/mstali.


Figure 5: A phylogenetic tree illustrating the division of the 1.10.8.60 CATH domains. A phylogenetic tree for a portion of the CATH domains clustered with msTALI. This tree illustrates placement of the 4.10.320.10 cluster among the 1.10.8.60 clusters. The tree was rendered with TreeGraph 2 [35] and T-Rex [36].

Mentions: Our approach successfully placed each domain in the tree next to other domains from the same homologous superfamily. For the purposes of analysis, we define a cluster to be a subtree of our phylogenetic tree that contains only domains from a single superfamily. We divided the tree into its maximal clusters. We expect that for most superfamilies, all of the superfamily’s domains will be contained in a single cluster. This is indeed the case: of the 16 superfamilies selected from CATH, 13 had all domains placed into a single cluster. Two superfamilies (1.10.8.60 and 1.10.150.20) were each placed into two clusters; these divisions are illustrated in Figures 5 and 6. One superfamily (3.10.20.90) was placed into three clusters. The average cluster size was 17 domains, compared to the average CATH superfamily size of 21 domains. Domains in the three divided superfamilies were not evenly distributed among the multiple clusters. The largest cluster for 1.10.8.60 contained 94% of all domains from that superfamily, the largest cluster for 1.10.150.20 contained 84%, and the largest cluster for 3.10.20.90 (not shown) contained 90%. Although there are few differences from CATH at the homologous superfamily level, these differences warranted further investigation. There are two situations in which a superfamily might be divided into multiple clusters. The first is when not all domains in a superfamily share a common core. The second is when domains from one superfamily have a core in common with another superfamily, and domains from the second superfamily divide the first superfamily into multiple clusters. We present an example of each situation from our results.
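
The cluster definition above (a maximal subtree whose leaves all come from one superfamily) can be illustrated with a short sketch. This is not the msTALI code. It assumes a tree stored as nested Python tuples with domain IDs as leaves, and the names maximal_clusters, superfamily_of, SF_A, SF_B, and the toy tree itself are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

# Assumed representation: a leaf is a domain ID (str); an internal node is a tuple of subtrees.
Tree = Union[str, tuple]


def maximal_clusters(tree: Tree, superfamily_of: Dict[str, str]) -> List[Tuple[str, List[str]]]:
    """Return (superfamily, leaves) for each maximal single-superfamily subtree."""
    clusters: List[Tuple[str, List[str]]] = []

    def visit(node: Tree):
        # Returns (leaves, superfamily) for a pure subtree, or (leaves, None) for a mixed one.
        if isinstance(node, str):
            return [node], superfamily_of[node]
        results = [visit(child) for child in node]
        leaves = [leaf for child_leaves, _ in results for leaf in child_leaves]
        families = {family for _, family in results}
        if len(families) == 1 and None not in families:
            # All children are pure and agree, so this larger subtree is pure too.
            return leaves, families.pop()
        # Mixed subtree: each pure child cannot be extended further, so it is maximal.
        for child_leaves, family in results:
            if family is not None:
                clusters.append((family, child_leaves))
        return leaves, None

    leaves, family = visit(tree)
    if family is not None:
        # The whole tree is a single cluster.
        clusters.append((family, leaves))
    return clusters


# Toy data (hypothetical): superfamily SF_B sits inside SF_A and splits it in two.
superfamily_of = {"a1": "SF_A", "a2": "SF_A", "a3": "SF_A", "b1": "SF_B", "b2": "SF_B"}
toy_tree = ((("a1", "a2"), ("b1", "b2")), "a3")

clusters = maximal_clusters(toy_tree, superfamily_of)
counts = Counter(family for family, _ in clusters)
print(clusters)
print(counts)
```

In the toy tree, SF_A ends up in two clusters because the SF_B subtree is placed among its domains, which mirrors the second situation described above. The reported averages are also consistent with this definition, assuming the 341 domains mentioned in the abstract are the ones clustered here: 13 + 2 × 2 + 3 = 20 clusters, and 341 / 20 ≈ 17 domains per cluster versus 341 / 16 ≈ 21 domains per superfamily.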

