Reconstruction of protein backbones from the BriX collection of canonical protein fragments.

Baeten L, Reumers J, Tur V, Stricher F, Lenaerts T, Serrano L, Rousseau F, Schymkowitz J - PLoS Comput. Biol. (2008)

Bottom Line: As expected, regular structures are well covered, but, interestingly, many loop regions that appear irregular at first glance are also found to form recurrent structural motifs, albeit with lower frequency of occurrence than regular secondary structures. Finally, we observed that a significant number of short sequences tend to display strong structural ambiguity between alpha helix and extended conformations. When the sequence length increases, this so-called sequence plasticity is no longer observed, illustrating the context dependency of polypeptide structures.

Affiliation: SWITCH Laboratory, Vrije Universiteit Brussels, Brussels, Belgium.

ABSTRACT
As modeling of changes in backbone conformation still lacks a computationally efficient solution, we developed a discretisation of the conformational states accessible to the protein backbone, similar to the successful rotamer approach for side chains. The BriX fragment database, consisting of fragments 4 to 14 residues long, was built by identifying recurrent backbone fragments in a non-redundant set of high-resolution protein structures. BriX contains an alphabet of more than 1,000 frequently observed conformations per peptide length for 6 different variation levels. Analysis of the performance of BriX revealed an average structural coverage of protein structures of more than 99% within a root mean square distance (RMSD) of 1 Angstrom. Overall, we are able to reconstruct protein structures with an average accuracy of 0.48 Angstrom RMSD. As expected, regular structures are well covered, but, interestingly, many loop regions that appear irregular at first glance are also found to form recurrent structural motifs, albeit with lower frequency of occurrence than regular secondary structures. Larger loop regions could be completely reconstructed from smaller recurrent elements between 4 and 8 residues long. Finally, we observed that a significant number of short sequences tend to display strong structural ambiguity between alpha helix and extended conformations. When the sequence length increases, this so-called sequence plasticity is no longer observed, illustrating the context dependency of polypeptide structures.
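The accuracy figures above (coverage within 1 Angstrom, reconstruction at 0.48 Angstrom) are RMSD values between fragments that have first been optimally superimposed. The following is a minimal sketch of such a comparison in Python with NumPy, assuming each fragment is an N×3 array of atomic coordinates (for example one Cα per residue; this section does not state which backbone atoms BriX compares) and using the standard Kabsch superposition. The function name `superposed_rmsd` is hypothetical and not part of BriX.

```python
import numpy as np


def superposed_rmsd(p, q) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition.

    Kabsch algorithm: centre both sets, take the SVD of their covariance
    matrix, and build the proper rotation that minimises the deviation.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation by centring each set on its centroid.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # 3x3 covariance matrix and its singular value decomposition.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Flip the axis of the smallest singular value if needed, so the
    # result is a proper rotation rather than a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate p onto q and measure what deviation remains.
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

Because translation and rotation are factored out, a fragment and a rigidly moved copy of it score close to 0 Angstrom, while its mirror image does not, which is the behavior a comparison of backbone conformations needs.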

Figure 2. BriX clustering statistics. (A) Effect of increasing the RMSD threshold. Shown are the number of BriX classes (circles) and the percentage of classified fragments (squares) as a function of the RMSD threshold (0.5–1.0 Angstrom) used to cluster fragments of 7 residues. As expected, higher thresholds yield fewer fragment classes and more identified recurrent fragment structures, because each class tolerates more internal variation and therefore contains more members. A threshold of 0.6 Angstrom is sufficient to classify more than half of all fragments of length 7. (B) Number of classes for varying fragment lengths. Shown is the number of classes as a function of fragment length, clustered either with a fixed RMSD threshold of 0.9 Angstrom (circles) or with an RMSD threshold proportional to fragment length (squares), increasing by 0.1 Angstrom per residue. In both series, the number of classes increases with length until a turning point is reached, after which it drops steeply. With the fixed threshold, this turning point clearly occurs at fragment length 11, at 2,740 classes. (C) Percentage of classified fragments for varying fragment lengths, clustered with the same fixed (circles) and proportional (squares) thresholds as in (B). In both series, the percentage of classified fragments decreases smoothly as fragment length increases. With the proportional threshold the decline is less steep, leaving more than 40% of fragments classified at length 14, compared with 26% under the fixed threshold.

By sliding a window of varying length (4–14 amino acids) over a non-redundant set of 1,261 high-quality protein structures retrieved from the WHAT IF software package [39], we obtained about 260,000 protein fragments of each length. Using a multi-step clustering approach (see Materials and Methods section), we clustered these fragments into more than 1,000 and up to approximately 2,000 structural classes for each length from 4 to 14 residues. Furthermore, we distinguished different degrees of variation within the classes by performing the clustering with 6 different distance thresholds. For instance, the RMSD thresholds considered for fragments of 7 residues were 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0 Angstrom. Clustering with varying RMSD thresholds provides degrees of structural diversity suited to a wide range of modeling requirements. Particularly in homology modeling, the threshold is a useful parameter for modeling structures with varying sequence identity: lower thresholds yield more accurate fits for high-identity regions, while larger thresholds enable the modeling of loops or other regions with lower sequence identity. Larger thresholds often result in fan-like shapes at the ends of the fragments and at local loop positions. Figure 1 illustrates the structural variation within equivalent classes of 7 residues constructed at different thresholds. Increasing the distance threshold used to cluster the fragments resulted in a decrease in the number of classes identified as regular structures. The number of classified fragments, i.e. fragments belonging to a fragment class, increased with larger clustering thresholds (see Figure 2A). A larger threshold implies a wider radius around the class centroids, and thus fewer, larger classes with more internal variation.
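The windowing and threshold-dependent clustering described above can be illustrated with a small sketch. The multi-step clustering itself is only referenced here (it is described in Materials and Methods), so the code substitutes a plain greedy "leader" clustering instead. The `distance` callable, the `min_size` cutoff that decides when a class counts as recurrent, and both function names are assumptions for illustration only; in practice `distance` would be the superposed RMSD between two fragments' coordinates.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sliding_windows(chain: Sequence[T], length: int) -> List[Sequence[T]]:
    """All contiguous fragments of `length` residues (len(chain) - length + 1 of them)."""
    return [chain[i:i + length] for i in range(len(chain) - length + 1)]


def leader_cluster(
    items: Sequence[T],
    distance: Callable[[T, T], float],
    threshold: float,
    min_size: int = 2,
) -> Tuple[List[List[int]], float]:
    """Greedy threshold clustering (a stand-in, not the BriX multi-step method).

    Each item joins the first class whose founding member (its leader) lies
    within `threshold`; otherwise it founds a new class. Classes smaller than
    `min_size` are discarded, so their items count as unclassified.
    Returns the kept classes (as index lists) and the fraction classified.
    """
    leaders: List[int] = []
    members: List[List[int]] = []
    for i, item in enumerate(items):
        for k, leader in enumerate(leaders):
            if distance(items[leader], item) <= threshold:
                members[k].append(i)
                break
        else:
            # No existing class is close enough: start a new one.
            leaders.append(i)
            members.append([i])
    kept = [m for m in members if len(m) >= min_size]
    classified = sum(len(m) for m in kept)
    fraction = classified / len(items) if items else 0.0
    return kept, fraction


if __name__ == "__main__":
    print(len(sliding_windows("ACDEFGHIK", 4)))  # 6
    # Toy one-dimensional "fragments" with absolute difference as distance.
    values = [0.0, 0.05, 0.3, 0.32, 0.9, 1.7]
    print(leader_cluster(values, lambda a, b: abs(a - b), 0.1))
    # ([[0, 1], [2, 3]], 0.6666666666666666)
    print(leader_cluster(values, lambda a, b: abs(a - b), 1.0))
    # ([[0, 1, 2, 3, 4]], 0.8333333333333334)
```

On the toy data, raising the threshold from 0.1 to 1.0 gives fewer classes and a larger classified fraction, the same qualitative trend reported for Figure 2A. Leader clustering is order-dependent and defines each class by a radius around its first member rather than a true centroid, so it is only a rough stand-in for a production method.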
Monitoring the number of classes as a function of fragment length at a fixed threshold (RMSD of 0.9 Angstrom, Figure 2B), we observed that the number of classes increased with length up to a fragment length of 11 residues, after which it dropped steeply (lengths 12–14). This turning point at length 11 was not observed when plotting the percentage of fragments classified at the 0.9 Angstrom threshold against fragment length (Figure 2C); instead, the percentage of classified fragments declined smoothly with increasing length. When the same analysis was performed on clustering results for RMSD thresholds proportional to fragment length (0.1 Angstrom per residue), thus allowing more variation for longer fragments, a similar pattern emerged. Although the decline was less steep, the percentage of classified fragments still decreased with increasing fragment length (Figure 2C). Again, the number of classes dropped after a turning point (Figure 2B), albeit at fragment length 9 rather than at length 11 as with the fixed threshold. These results indicate that increasing the clustering threshold for longer fragments did not suffice to construct the same number of fragment classes as at shorter lengths. Longer fragments span a larger conformational space with more variability between classes, which impairs their clustering into well-defined classes.
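The two threshold schedules compared in Figures 2B and 2C can be written down directly. The 0.9 Angstrom fixed cutoff and the slope of 0.1 Angstrom per residue come from the text; taking the proportional cutoff as exactly 0.1 Angstrom × fragment length (no offset) is an assumption, our literal reading of "proportional", since this section does not give the value at any specific length. The constant and function names are hypothetical.

```python
FIXED_RMSD = 0.9        # Angstrom; fixed cutoff used for Figure 2B/2C
RMSD_PER_RESIDUE = 0.1  # Angstrom added per residue in the proportional scheme


def fixed_threshold(length: int) -> float:
    """Same RMSD cutoff regardless of fragment length."""
    return FIXED_RMSD


def proportional_threshold(length: int) -> float:
    """RMSD cutoff growing linearly with fragment length.

    Assumption: the cutoff is exactly RMSD_PER_RESIDUE * length (no offset).
    """
    return RMSD_PER_RESIDUE * length


if __name__ == "__main__":
    # Fragment lengths covered by BriX: 4 to 14 residues.
    for n in range(4, 15):
        print(n, fixed_threshold(n), round(proportional_threshold(n), 2))
    print([round(proportional_threshold(n), 1) for n in (4, 7, 14)])  # [0.4, 0.7, 1.4]
```

Under this assumed form, the two schedules coincide at 9 residues, and the proportional cutoff is looser than the fixed one only for longer fragments.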

