Reconstruction of protein backbones from the BriX collection of canonical protein fragments.

Baeten L, Reumers J, Tur V, Stricher F, Lenaerts T, Serrano L, Rousseau F, Schymkowitz J - PLoS Comput. Biol. (2008)

Bottom Line: As expected, regular structures are well covered, but, interestingly, many loop regions that appear irregular at first glance are also found to form a recurrent structural motif, albeit with lower frequency of occurrence than regular secondary structures. Finally, we observed that a significant number of short sequences tend to display strong structural ambiguity between alpha helix and extended conformations. When the sequence length increases, this so-called sequence plasticity is no longer observed, illustrating the context dependency of polypeptide structures.


Affiliation: SWITCH Laboratory, Vrije Universiteit Brussels, Brussels, Belgium.

ABSTRACT
As modeling of changes in backbone conformation still lacks a computationally efficient solution, we developed a discretisation of the conformational states accessible to the protein backbone, similar to the successful rotamer approach for side chains. The BriX fragment database, consisting of fragments from 4 to 14 residues long, was built by identifying recurrent backbone fragments in a non-redundant set of high-resolution protein structures. BriX contains an alphabet of more than 1,000 frequently observed conformations per peptide length for 6 different variation levels. Performance analysis showed that BriX achieves an average structural coverage of protein structures of more than 99% within a root mean square distance (RMSD) of 1 Angstrom. Overall, we are able to reconstruct protein structures with an average accuracy of 0.48 Angstrom RMSD. As expected, regular structures are well covered, but, interestingly, many loop regions that appear irregular at first glance are also found to form a recurrent structural motif, albeit with lower frequency of occurrence than regular secondary structures. Larger loop regions could be completely reconstructed from smaller recurrent elements between 4 and 8 residues long. Finally, we observed that a significant number of short sequences tend to display strong structural ambiguity between alpha helix and extended conformations. When the sequence length increases, this so-called sequence plasticity is no longer observed, illustrating the context dependency of polypeptide structures.
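The coverage and reconstruction figures in the abstract rest on an RMSD between two fragments after optimal superposition. The sketch below shows one common way to compute it, the Kabsch algorithm, assuming each fragment is given as an (N, 3) NumPy array of matching backbone atom coordinates. The names `superposed_rmsd` and `is_covered` are hypothetical, and whether BriX uses exactly this atom selection and superposition procedure is an assumption rather than something stated here.

```python
import numpy as np


def superposed_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays after optimal rigid superposition.

    Uses the Kabsch algorithm. Which atoms are included (e.g. backbone N, CA, C, O)
    is an assumption left to the caller.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    # Remove translation by centering both fragments on their centroids.
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)
    # SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(a_c.T @ b_c)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = a_c @ rot.T - b_c
    return float(np.sqrt((diff ** 2).sum() / len(a)))


def is_covered(fragment, centroids, threshold=1.0) -> bool:
    """True if any class centroid lies within `threshold` Angstrom RMSD of the fragment.

    Hypothetical helper; the 1.0 default mirrors the 1 Angstrom cutoff in the abstract.
    """
    return any(superposed_rmsd(fragment, c) <= threshold for c in centroids)
```

Averaging `is_covered` over every fragment of a test set would give a coverage percentage of the kind quoted in the abstract, under the assumptions above.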

Figure 4. BriX statistics with regard to secondary structure content. (A, B) Effect of secondary structure on the classification. The plots show data for classes consisting of a single secondary structure element, i.e., pure helical (red), strand (blue), turn (green), and loop (orange) classes. Fragments and fragment classes were selected if their overall DSSP content exceeded 80% for one of these 4 structural elements. Shown is the percentage of classified fragments as a function of increasing distance threshold. Although the vast majority of helical fragments were found to be recurrent (A), the number of helical structural classes is low compared to the number of strand classes (B). Because of their stabilizing hydrogen bonds, helices allow little variation, resulting in a few large BriX classes. The variable character and infrequent occurrences of loops and turns are the main reason for the small number of recurrent structures and the poor classification results. (C) Classification results for the Astral40 validation test. The BriX fragment classification obtained from the WHAT IF globular structure set was used to classify fragments generated from the Astral40 structures. Experiments evaluating the effect of an increasing threshold on the percentage of classified fragments were repeated for the full Astral40 set (open circles) and for the Astral40 structures of the major SCOP classes (all α [diamonds], all β [triangles], α/β [closed circles], and α+β [squares]). The initial classification results for the WHAT IF generated fragments (open squares) are shown for reference. The full Astral40 set follows a classification pattern similar to that of the WHAT IF set, showing that the latter is a good representation of protein structures in general. The higher classification rate of helical proteins points to lower structural variation within these structures.
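Panels A and C plot the percentage of classified fragments against an increasing distance threshold. A minimal sketch of how such a curve could be computed, assuming the RMSD from each fragment to its nearest class centroid has already been calculated and treating "classified" simply as "within the threshold" (a simplification of the BriX classification procedure). `classified_percentage` and `curves_by_label` are hypothetical helpers, and the secondary structure labels are assumed to come from a DSSP-based assignment done elsewhere.

```python
from bisect import bisect_right


def classified_percentage(nearest_rmsd, thresholds):
    """Percentage of fragments whose nearest-centroid RMSD is <= each threshold.

    Simplification: a fragment counts as 'classified' if it lies within the threshold.
    """
    ordered = sorted(nearest_rmsd)
    if not ordered:
        raise ValueError("no fragments given")
    n = len(ordered)
    # bisect_right counts how many sorted distances are <= t.
    return [100.0 * bisect_right(ordered, t) / n for t in thresholds]


def curves_by_label(records, thresholds):
    """Group (label, nearest_rmsd) pairs by a hypothetical secondary structure label
    and compute one curve per label, as in the per-element plots of panel A."""
    groups = {}
    for label, dist in records:
        groups.setdefault(label, []).append(dist)
    return {label: classified_percentage(d, thresholds) for label, d in groups.items()}
```

For example, `classified_percentage([0.2, 0.4, 0.9, 1.5, 2.5], [0.5, 1.0, 2.0])` returns `[40.0, 60.0, 80.0]`; sorting once and bisecting keeps each threshold lookup logarithmic.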

Mentions: The dendrogram in Figure 3 results from applying a clustering approach, similar to the one used to construct the BriX database, to the 7-residue class centroids. In this way, the fragment space is rebuilt by grouping the BriX classes into superclasses based on root mean square distances between the class centroids. For simplicity, only the largest classes (i.e., superclasses that contain more than 1% of BriX classes) are shown at each level. At the top of the classification, a single branch comprises 98.6% of all classes and 87.9% of all fragments, allowing a maximal distance of 1.8 Angstrom RMSD between the class centroids. At the second level, the clustering method separates the two principal secondary structure elements, strands and helices, which segregate further into smaller, more specific conformations. A counterintuitive result is that the clustering method does not differentiate between turn and helix secondary structure elements at the top level; instead, we find them at different levels in the tree (see Superclasses i, m, n and q). Figure 4A shows the percentage of fragments found to be recurrent as a function of increasing distance threshold. It clearly shows how difficult sheets and loops are to classify, even as the distance threshold increases. This is because they do not adopt well-defined conformations but instead occupy a wide range of geometries. As a consequence, only a few recurring structures could be identified, resulting in a low number of classes and a large number of unclassified sheet and loop fragments. Du et al. [40] made the same observation when determining the probability of finding a short protein fragment again among non-homologous structures in the Protein Data Bank. According to Figure 4A, the authors observed a nearest-neighbor RMSD distribution for β fragments close to that for loop fragments.
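The grouping of class centroids into superclasses can be illustrated with complete-linkage agglomerative clustering on a precomputed centroid-to-centroid RMSD matrix. Complete linkage only merges two groups if every pairwise distance stays within the cutoff, which fits the "maximal distance of 1.8 Angstrom RMSD" wording. This is a swapped-in technique for illustration; the text only says the approach is similar to the one used to build BriX. The helpers `complete_linkage` and `large_superclasses` are hypothetical, and the 1% filter mirrors the display rule for Figure 3.

```python
def complete_linkage(dist, cutoff):
    """Agglomerative clustering on a symmetric distance matrix (list of lists).

    Illustrative choice: complete linkage, so every pair inside a cluster
    stays within `cutoff` (e.g. 1.8 Angstrom RMSD between centroids).
    """
    clusters = [[i] for i in range(len(dist))]
    while True:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Cluster distance = largest member-to-member distance.
                d = max(dist[i][j] for i in clusters[x] for j in clusters[y])
                if d <= cutoff and (best is None or d < best[0]):
                    best = (d, x, y)
        if best is None:
            return clusters
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected


def large_superclasses(clusters, min_fraction=0.01):
    """Keep only superclasses holding more than `min_fraction` of all classes."""
    total = sum(len(c) for c in clusters)
    return [sorted(c) for c in clusters if len(c) / total > min_fraction]
```

This naive loop is slow for thousands of centroids; SciPy's `scipy.cluster.hierarchy.linkage(..., method="complete")` followed by `fcluster(..., criterion="distance")` performs the same kind of clustering more efficiently. Repeating the procedure with decreasing cutoffs would yield the nested levels of a dendrogram.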

