Reconstruction of protein backbones from the BriX collection of canonical protein fragments.

Baeten L, Reumers J, Tur V, Stricher F, Lenaerts T, Serrano L, Rousseau F, Schymkowitz J - PLoS Comput. Biol. (2008)

Bottom Line: As expected, regular structures are well covered, but, interestingly, many loop regions that appear irregular at first glance are also found to form a recurrent structural motif, albeit with lower frequency of occurrence than regular secondary structures. Finally, we observed that a significant amount of short sequences tend to display strong structural ambiguity between alpha helix and extended conformations. When the sequence length increases, this so-called sequence plasticity is no longer observed, illustrating the context dependency of polypeptide structures.


Affiliation: SWITCH Laboratory, Vrije Universiteit Brussel, Brussels, Belgium.

ABSTRACT
As modeling of changes in backbone conformation still lacks a computationally efficient solution, we developed a discretisation of the conformational states accessible to the protein backbone, similar to the successful rotamer approach for side chains. The BriX fragment database, consisting of fragments from 4 to 14 residues long, was built by identifying recurrent backbone fragments in a non-redundant set of high-resolution protein structures. BriX contains an alphabet of more than 1,000 frequently observed conformations per peptide length for 6 different variation levels. Analysis of the performance of BriX revealed an average structural coverage of protein structures of more than 99% within a root mean square distance (RMSD) of 1 Angstrom. Globally, we are able to reconstruct protein structures with an average accuracy of 0.48 Angstrom RMSD. As expected, regular structures are well covered, but, interestingly, many loop regions that appear irregular at first glance are also found to form a recurrent structural motif, albeit with lower frequency of occurrence than regular secondary structures. Larger loop regions could be completely reconstructed from smaller recurrent elements, between 4 and 8 residues long. Finally, we observed that a significant number of short sequences tend to display strong structural ambiguity between alpha helix and extended conformations. When the sequence length increases, this so-called sequence plasticity is no longer observed, illustrating the context dependency of polypeptide structures.


pcbi-1000083-g001: Effect of varying RMSD on structural variation within a class. The plot shows the fragment content of equivalent BriX classes of length 7 created with fixed RMSD thresholds from 0.6 to 1 Angstrom. The increase in structural variation with higher RMSD thresholds is not uniformly distributed over all positions; there is a clear tendency towards the terminal positions (both carboxy- and amino-terminal), resulting in a fan-like arrangement.

Mentions: By sliding a window of varying length (4–14 amino acids) over a non-redundant set of 1,261 high-quality protein structures retrieved from the WHAT IF software package [39], about 260,000 protein fragments of each length were obtained. Using a multi-step clustering approach (see Materials and Methods section), these fragments were clustered into more than 1,000 and up to approximately 2,000 structural classes for each length from 4 to 14 residues. Furthermore, we distinguished different degrees of variation inside the classes by performing the clustering with 6 different distance thresholds. For instance, the RMSD thresholds considered for fragments of 7 residues were 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0 Angstrom. Clustering with varying RMSD thresholds was performed to provide degrees of structural diversity suited to a wide range of modeling requirements. Particularly for homology modeling, the threshold variation is a useful parameter for modeling structures with varying sequence identity. Lower thresholds yield more accurate fits for high-identity regions, while larger thresholds enable the modeling of loops or other regions with lower sequence identity. Larger thresholds often result in fan-like shapes at the ends of the fragments and at local loop positions. Figure 1 illustrates the structural variation within equivalent classes comprising 7 residues constructed at different RMSD thresholds. Increasing the distance threshold used to cluster the fragments resulted in a decrease in the number of classes being identified as regular structures. The number of classified fragments, i.e. fragments belonging to a fragment class, increased with larger thresholds applied in the clustering process (see Figure 2A). A larger threshold implies a wider radius around the class centroids, and thus larger and fewer classes with more internal variation.
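The windowing and threshold-based grouping described above can be sketched in a few lines of Python. This is only an illustration under several assumptions, not the BriX procedure itself: the paper uses a multi-step clustering described in its Materials and Methods, whereas the sketch uses a simple greedy "leader" clustering; it compares fragments with a distance-matrix RMSD (which needs no superposition) instead of a superposition RMSD; and the coordinates come from a made-up helix-plus-strand C-alpha trace. All function names and the `min_size` cutoff for calling a class recurrent are hypothetical.

```python
import math
from itertools import combinations

def sliding_fragments(coords, length):
    """Return every contiguous window of `length` residues from a C-alpha trace."""
    return [coords[i:i + length] for i in range(len(coords) - length + 1)]

def drmsd(a, b):
    """Distance-matrix RMSD between two equal-length fragments.
    Stand-in for superposition RMSD (assumption, not the paper's measure)."""
    pairs = list(combinations(range(len(a)), 2))
    sq = sum((math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2 for i, j in pairs)
    return math.sqrt(sq / len(pairs))

def leader_cluster(fragments, threshold):
    """Greedy clustering: a fragment joins the first class whose representative
    lies within `threshold`, otherwise it starts a new class."""
    classes = []  # list of (representative, members)
    for frag in fragments:
        for rep, members in classes:
            if drmsd(rep, frag) <= threshold:
                members.append(frag)
                break
        else:
            classes.append((frag, [frag]))
    return classes

def classified_fraction(classes, min_size=2):
    """Share of fragments that sit in a class with at least `min_size` members
    (hypothetical definition of a 'classified' fragment)."""
    total = sum(len(members) for _, members in classes)
    kept = sum(len(members) for _, members in classes if len(members) >= min_size)
    return kept / total

def toy_trace(n_helix=12, n_strand=12):
    """Made-up C-alpha trace: an ideal helix followed by a zig-zag strand."""
    helix = [(2.3 * math.cos(math.radians(100 * i)),
              2.3 * math.sin(math.radians(100 * i)),
              1.5 * i) for i in range(n_helix)]
    strand = [(10.0 + 3.3 * i, 0.9 * (i % 2), 30.0) for i in range(n_strand)]
    return helix + strand

fragments = sliding_fragments(toy_trace(), 7)
for threshold in (0.5, 0.7, 1.0):
    classes = leader_cluster(fragments, threshold)
    print(threshold, len(classes), round(classified_fraction(classes), 2))
```

Running the loop over several thresholds gives a rough feel for the trade-off described in the text: a wider radius around each representative tends to merge fragments into fewer, larger classes. The toy trace is far too small to reproduce the figures reported in the paper.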
Monitoring the number of classes as a function of fragment length at a fixed threshold (RMSD of 0.9 Angstrom, Figure 2B), we observed an increase in the number of classes with length up to a fragment length of 11 residues, after which the number of classes dropped steeply (lengths 12–14). This turning point at fragment length 11 was not observed when plotting the percentage of fragments classified at RMSD threshold 0.9 versus the fragment length (Figure 2C). Here, a smooth decline in the number of classified fragments with increasing fragment length was observed. When the same analysis was performed on clustering results for RMSD thresholds proportional to fragment length (0.1 Angstrom variation per residue), thus allowing more variation for larger fragment lengths, a similar pattern applied. Although less steep, we still observed a decrease in the percentage of classified fragments with increasing fragment length (Figure 2C). Again, a turning point after which the number of classes dropped was observed (Figure 2B), this time at fragment length 9 rather than at the length seen with a fixed threshold. These results indicate that increasing the clustering threshold for longer fragments did not suffice to construct the same number of fragment classes as at shorter lengths. Increasing fragment length results in a larger conformational space and greater variability within classes, impairing the clustering of fragments into well-defined classes.
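The two threshold schedules compared above (a fixed 0.9 Angstrom versus 0.1 Angstrom per residue) and the notion of a turning point can be written down as a small sketch. The function names are hypothetical, the turning point is simply taken as the length with the highest class count (which assumes a single peak), and the class counts used to exercise it are made-up placeholders, not values from the paper.

```python
def proportional_threshold(length, per_residue=0.1):
    """RMSD threshold (Angstrom) that grows linearly with fragment length."""
    return round(length * per_residue, 2)

def turning_point(class_counts):
    """Fragment length with the highest class count, i.e. the last length
    before the number of classes starts to drop (assumes a single peak)."""
    return max(class_counts, key=class_counts.get)

lengths = range(4, 15)  # BriX fragment lengths, 4 to 14 residues
fixed = {n: 0.9 for n in lengths}
proportional = {n: proportional_threshold(n) for n in lengths}

for n in lengths:
    print(n, fixed[n], proportional[n])

# Made-up numbers purely to exercise turning_point; NOT data from the paper.
placeholder_counts = {4: 10, 5: 14, 6: 17, 7: 15, 8: 9}
print(turning_point(placeholder_counts))
```

Under these two schedules the thresholds coincide only at length 9: the proportional schedule is stricter than 0.9 Angstrom for shorter fragments and more permissive for longer ones.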

