Reconstruction of protein backbones from the BriX collection of canonical protein fragments.

Baeten L, Reumers J, Tur V, Stricher F, Lenaerts T, Serrano L, Rousseau F, Schymkowitz J - PLoS Comput. Biol. (2008)

Bottom Line: As expected, regular structures are well covered, but, interestingly, many loop regions that appear irregular at first glance are also found to form a recurrent structural motif, albeit with lower frequency of occurrence than regular secondary structures. Finally, we observed that a significant number of short sequences tend to display strong structural ambiguity between alpha helix and extended conformations. When the sequence length increases, this so-called sequence plasticity is no longer observed, illustrating the context dependency of polypeptide structures.


Affiliation: SWITCH Laboratory, Vrije Universiteit Brussels, Brussels, Belgium.

ABSTRACT
As modeling of changes in backbone conformation still lacks a computationally efficient solution, we developed a discretisation of the conformational states accessible to the protein backbone, similar to the successful rotamer approach for side chains. The BriX fragment database, consisting of fragments from 4 to 14 residues long, was built by identifying recurrent backbone fragments in a non-redundant set of high-resolution protein structures. BriX contains an alphabet of more than 1,000 frequently observed conformations per peptide length for 6 different variation levels. Analysis of the performance of BriX revealed an average structural coverage of protein structures of more than 99% within a root mean square distance (RMSD) of 1 Angstrom. Globally, we are able to reconstruct protein structures with an average accuracy of 0.48 Angstrom RMSD. As expected, regular structures are well covered, but, interestingly, many loop regions that appear irregular at first glance are also found to form a recurrent structural motif, albeit with lower frequency of occurrence than regular secondary structures. Larger loop regions could be completely reconstructed from smaller recurrent elements, between 4 and 8 residues long. Finally, we observed that a significant number of short sequences tend to display strong structural ambiguity between alpha helix and extended conformations. When the sequence length increases, this so-called sequence plasticity is no longer observed, illustrating the context dependency of polypeptide structures.
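
The coverage figure quoted in the abstract can be read as the fraction of fragments that have at least one library conformation within a given RMSD. The following is a minimal sketch of that calculation, not the BriX implementation: the function names `rmsd` and `coverage` are hypothetical, and the sketch assumes the coordinates have already been optimally superposed upstream, so no fitting step is performed here.

```python
import math

def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumption: the two point sets are already superposed; no
    rotation or translation is fitted in this sketch.
    """
    if len(a) != len(b) or not a:
        raise ValueError("point sets must be non-empty and of equal length")
    squared = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(squared / len(a))

def coverage(fragments, library, threshold=1.0):
    """Fraction of fragments with at least one library entry within threshold."""
    if not fragments:
        return 0.0
    covered = sum(
        1 for frag in fragments
        if any(rmsd(frag, entry) <= threshold for entry in library)
    )
    return covered / len(fragments)

# Toy data (invented for illustration): one library conformation, three fragments.
library = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
fragments = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],   # identical to the library entry
    [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)],   # shifted by 0.5 along z
    [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)],   # shifted by 3.0 along z
]
fraction_covered = coverage(fragments, library, threshold=1.0)
```

With a threshold of 1.0, the first two toy fragments count as covered and the third does not.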

Structural hierarchy of classes based on RMSD distance. The nodes are represented by means of a DSSP logo (generated using WebLogo [47]) and a denotation of the percentage of BriX classes (in black) and fragments (in red) it contains. At the second level, the hierarchical clustering is able to distinguish the two major secondary structure elements: strands and helices. These branches are further partitioned into loops and small turns. Notable is the content difference between the pure secondary structure nodes (k and p) at the bottom level of the tree. Although node k consists of 12.2% of all BriX classes, it only represents 19.8% of the fragments of the WHAT IF set. Node p, on the contrary, embodies 27.8% of the fragment space, while holding only 3.4% of the BriX classes. This discrepancy shows that the stronger structural constraints imposed on helices result in fewer and larger helical classes than the strand classes created with the same threshold.


Mentions: The dendrogram in Figure 3 is the result of applying a clustering approach similar to the one used to construct the BriX database to the class centroids comprising 7 residues. In this way, the fragment space is rebuilt by grouping the BriX classes into superclasses based on root mean square distances between the class centroids. For simplicity, only the largest classes (i.e. superclasses that contain more than 1% of BriX classes) are shown at each level. At the top of the classification, one branch is shown that comprises 98.6% of all classes and 87.9% of all fragments, allowing a maximal distance of 1.8 Angstrom RMSD between the class centroids. At the second level the clustering method is capable of separating the two principal secondary structure elements: strands and helices. These segregate further into smaller, more specific conformations. A counterintuitive result is that the clustering method does not differentiate between turn and helix secondary structure elements at the top level. Instead we find them at different levels in the tree (see superclasses i, m, n and q). Figure 4A shows the percentage of fragments that were found to be recurrent as a function of an increasing distance threshold. It clearly shows how difficult sheets and loops are to classify, even as the distance threshold increases. This is because they do not exist in well-defined conformations, but instead occupy a wide range of geometries. As a consequence, only a few recurring structures could be identified, resulting in a low number of classes and a large number of unclassified sheet and loop fragments. This was also observed by Du et al. [40] when determining the probability of finding a short protein fragment again among non-homologous structures in the Protein Data Bank. In line with Figure 4A, these authors found a nearest-neighbor RMSD distribution for β fragments close to that for loop fragments.
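
The grouping of class centroids into superclasses under a maximal RMSD can be illustrated with a small sketch. The text does not specify the linkage rule, so the code below assumes complete-linkage agglomerative clustering, which enforces a maximal pairwise distance within each group; this is an assumption, not necessarily the method used to build Figure 3. The function name `complete_linkage` and the toy distance matrix are hypothetical, and the matrix is assumed to hold precomputed centroid-to-centroid RMSD values.

```python
def complete_linkage(dist, cutoff):
    """Group items so that no two members of a group are farther apart than cutoff.

    dist   : symmetric matrix (list of lists) of pairwise distances,
             assumed here to be precomputed RMSDs between class centroids.
    cutoff : maximal allowed distance within a group (e.g. 1.8).
    Returns a list of groups, each a list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]
    while True:
        best = None  # (distance, index_i, index_j) of the closest mergeable pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest member distance.
                d = max(dist[a][b] for a in clusters[i] for b in clusters[j])
                if d <= cutoff and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:
            return clusters  # no remaining pair can merge within the cutoff
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

# Toy distance matrix (invented values): items 0-1 and 2-3 are close pairs.
toy_dist = [
    [0.0, 0.5, 2.5, 2.5],
    [0.5, 0.0, 2.5, 2.5],
    [2.5, 2.5, 0.0, 0.6],
    [2.5, 2.5, 0.6, 0.0],
]
superclasses = complete_linkage(toy_dist, cutoff=1.8)
# Share of all classes held by each superclass, as reported per node in the figure.
class_share = [len(group) / len(toy_dist) for group in superclasses]
```

Raising the cutoff merges more centroids into fewer, broader superclasses, which mirrors how a single top-level branch can hold most classes at a permissive threshold.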

