Reconstruction of protein backbones from the BriX collection of canonical protein fragments.

Baeten L, Reumers J, Tur V, Stricher F, Lenaerts T, Serrano L, Rousseau F, Schymkowitz J - PLoS Comput. Biol. (2008)

Bottom Line: As expected, regular structures are well covered, but, interestingly, many loop regions that appear irregular at first glance are also found to form recurrent structural motifs, albeit with a lower frequency of occurrence than regular secondary structures. Finally, we observed that a significant number of short sequences display strong structural ambiguity between alpha helix and extended conformations. When the sequence length increases, this so-called sequence plasticity is no longer observed, illustrating the context dependency of polypeptide structures.


Affiliation: SWITCH Laboratory, Vrije Universiteit Brussels, Brussels, Belgium.

ABSTRACT
As modeling of changes in backbone conformation still lacks a computationally efficient solution, we developed a discretisation of the conformational states accessible to the protein backbone, similar to the successful rotamer approach for side chains. The BriX fragment database, consisting of fragments 4 to 14 residues long, was built by identifying recurrent backbone fragments in a non-redundant set of high-resolution protein structures. BriX contains an alphabet of more than 1,000 frequently observed conformations per peptide length for 6 different variation levels. Analysis of the performance of BriX revealed an average structural coverage of more than 99% of protein structures within a root mean square distance (RMSD) of 1 Angstrom. Globally, we are able to reconstruct protein structures with an average accuracy of 0.48 Angstrom RMSD. As expected, regular structures are well covered, but, interestingly, many loop regions that appear irregular at first glance are also found to form recurrent structural motifs, albeit with a lower frequency of occurrence than regular secondary structures. Larger loop regions could be completely reconstructed from smaller recurrent elements between 4 and 8 residues long. Finally, we observed that a significant number of short sequences display strong structural ambiguity between alpha helix and extended conformations. When the sequence length increases, this so-called sequence plasticity is no longer observed, illustrating the context dependency of polypeptide structures.
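The coverage figure in the abstract rests on comparing each fragment of a structure against library fragments after optimal superposition. As a minimal sketch, assuming fragments are given as (N, 3) NumPy arrays with one representative atom per residue (a Cα-only representation is our assumption; the atom set BriX actually uses is not stated here), the RMSD after Kabsch superposition and the fraction of query fragments matched within 1 Angstrom could be computed as below. The function names `kabsch_rmsd` and `coverage` are hypothetical.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays after optimal rigid superposition (Kabsch)."""
    # Remove translation by centering both fragments on their centroids.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate p onto q and take the root mean square deviation over residues.
    diff = p @ rot.T - q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def coverage(queries, library, cutoff: float = 1.0) -> float:
    """Fraction of query fragments whose closest library fragment lies within `cutoff` RMSD.

    All fragments are assumed to have the same length (a hypothetical simplification).
    """
    if not queries:
        return 0.0
    covered = sum(
        1 for frag in queries
        if min(kabsch_rmsd(frag, ref) for ref in library) <= cutoff
    )
    return covered / len(queries)


# Toy usage: a rotated and translated copy of a fragment superimposes perfectly.
rng = np.random.default_rng(0)
frag = rng.normal(size=(5, 3))
angle = 0.7
rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle), np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
moved = frag @ rz.T + np.array([1.0, 2.0, 3.0])
print(kabsch_rmsd(frag, moved))  # close to 0: same shape, different pose
print(coverage([frag], [moved, 10.0 * frag]))  # 1.0
```

Taking the minimum over the whole library mirrors the idea of "covering" a fragment by its best-matching canonical conformation; a real search would restrict the library to fragments of the same length and use a faster lookup than this exhaustive loop.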

Figure 6. Presence of structural switches within groups of fragments containing identical residue sequences. (A) The effect of fragment length on structural variation. Shown is the percentage of identical-sequence pairs as a function of the structural distance between them for fragments of length 5 (red), 9 (blue), and 13 (green) in the Astral40 dataset. The main histogram clearly shows the tendency of shorter fragments to display large structural variation. The smaller plot (inset) shows the result of applying hierarchical agglomerative clustering to the nearly 40,000 sequences in which this variation was detected, using two distance thresholds: 1.5 Angstrom (red) and 2.0 Angstrom (blue) RMSD. For the vast majority of these sequences, two structural groups can be identified. (B) Example of structural differences for one amino acid sequence. The sequence AAVGL can adopt both a strand (left) and a helix (right) conformation. The strand conformation is present in the Antigen 85-C protein (structure 1DQZ) and starts at residue 119. The helical conformation is taken from the 2,2-dialkylglycine decarboxylase protein (1D7U) at residue 311. (C) Amino acid usage in plastic sequences. Shown is the frequency of amino acids in sequences that only allow small structural jumps, resulting in minor variations of a given conformation (red), and in sequences where these jumps are larger, resulting in drastic structural switches (blue). The green bars indicate the occurrence of the respective amino acids in fragments that were left unclassified in BriX classes because of their irregular character. Three groups can be distinguished: amino acids promoting (1) a single well-defined regular structure (such as tryptophan, tyrosine, phenylalanine, cysteine, asparagine, and methionine), (2) several regular structures or structural jumps (such as alanine, leucine, and valine), and (3) irregular structures (such as glycine and proline).
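Panel A amounts to pooling every pair of fragments that share a sequence and binning their structural distances. A minimal sketch of that bookkeeping follows, under stated assumptions: records are `(sequence, fragment)` tuples, `distance` is any callable returning the RMSD between two fragments (for example a superposition RMSD), and the 0.1 Angstrom bin width and 4 Angstrom upper edge are hypothetical defaults, not the settings used for the figure. The function names are illustrative only.

```python
from collections import defaultdict
from itertools import combinations
from typing import Any, Callable, Iterable, Iterator, Tuple

import numpy as np


def identical_sequence_pairs(
    records: Iterable[Tuple[str, Any]],
) -> Iterator[Tuple[str, Any, Any]]:
    """Yield (sequence, fragment_a, fragment_b) for every pair of fragments sharing a sequence."""
    groups = defaultdict(list)
    for seq, frag in records:
        groups[seq].append(frag)
    for seq, members in groups.items():
        # Sequences seen only once contribute no pairs.
        for a, b in combinations(members, 2):
            yield seq, a, b


def pair_distance_histogram(
    records: Iterable[Tuple[str, Any]],
    distance: Callable[[Any, Any], float],
    bin_width: float = 0.1,
    max_rmsd: float = 4.0,
) -> Tuple[np.ndarray, np.ndarray]:
    """Percentage of identical-sequence pairs per distance bin.

    Percentages are relative to all pairs, so pairs beyond `max_rmsd` make the total fall below 100.
    """
    dists = [distance(a, b) for _, a, b in identical_sequence_pairs(records)]
    # linspace avoids the floating-point edge drift that arange can show with fractional steps.
    n_bins = int(round(max_rmsd / bin_width))
    edges = np.linspace(0.0, max_rmsd, n_bins + 1)
    counts, _ = np.histogram(dists, bins=edges)
    percent = 100.0 * counts / max(len(dists), 1)
    return edges, percent


# Toy usage: scalar stand-ins for fragments, with absolute difference as the "RMSD".
toy = [("AAVGL", 0.0), ("AAVGL", 0.2), ("AAVGL", 1.8), ("GGG", 5.0)]
edges, percent = pair_distance_histogram(toy, lambda a, b: abs(a - b), bin_width=0.5)
```

In the toy data, one pair falls near 0.2 and two fall between 1.5 and 2.0, a miniature version of the two-peak shape described for short fragments.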

Mentions: We examined whether the BriX class of part of a protein structure could be predicted from sequence information. However, exceptions aside, overall sequence conservation within the classes was rather low, precluding sequence-to-structure prediction. This is expected given the large number of classes produced by the high-resolution clustering. In addition, we analyzed the magnitude of structural variation among the conformations that a single sequence can adopt. The experiment, whose results are shown in Figure 6, consisted of calculating the pairwise RMSD between fragments with identical sequences (see Materials and Methods section). Figure 6A shows the normalized distribution of the resulting RMSD values for three different fragment lengths. Two peaks were observed. The first peak, at 0.2 Angstrom, shows that the majority of fragments with an identical amino acid sequence adopt a similar conformation. A smaller yet significant second peak appeared at an RMSD of 1.6–2 Angstrom. This suggested that, particularly for short fragments (fewer than 7 residues), a drastic structural switch can occur. This was verified by a cluster analysis of the nearly 40,000 sequences in which this second peak was observed: for each sequence, hierarchical agglomerative clustering was applied to the fragments sharing that sequence. The resulting histogram, shown in the inset of Figure 6A, confirms that for the vast majority of these sequences two structural groups could be observed. As the sequence length increases, the second peak in Figure 6A gradually disappears, indicating that additional structural context is required to resolve structural ambiguities.
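The per-sequence cluster analysis can be sketched as a naive agglomerative clustering over a precomputed RMSD matrix that stops merging once the closest pair of clusters is farther apart than the threshold (the caption of Figure 6 names 1.5 and 2.0 Angstrom). The linkage criterion is not stated in this excerpt, so average linkage is assumed here; the function names and the toy matrix are hypothetical. Counting the resulting clusters per sequence gives the kind of distribution plotted in the smaller plot of Figure 6A.

```python
from collections import Counter
from typing import Dict, List, Sequence


def average_linkage_clusters(dist: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Agglomerative clustering on a square distance matrix (average linkage, assumed).

    Starts with one cluster per fragment and repeatedly merges the closest pair,
    stopping when the smallest inter-cluster distance exceeds `threshold`.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Mean pairwise distance between members of the two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best_d, best_x, best_y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if best_d > threshold:
            break
        # best_y > best_x, so deleting best_y leaves best_x's index unchanged.
        clusters[best_x] = clusters[best_x] + clusters[best_y]
        del clusters[best_y]
    return clusters


def group_count_distribution(
    matrices: Dict[str, Sequence[Sequence[float]]], threshold: float
) -> Counter:
    """Map 'number of structural groups' to 'number of sequences with that many groups'."""
    return Counter(len(average_linkage_clusters(m, threshold)) for m in matrices.values())


# Toy example: two tight pairs of fragments about 1.9 Angstrom apart, mimicking a helix/strand switch.
switch = [
    [0.0, 0.2, 2.0, 1.9],
    [0.2, 0.0, 1.8, 2.1],
    [2.0, 1.8, 0.0, 0.3],
    [1.9, 2.1, 0.3, 0.0],
]
print(len(average_linkage_clusters(switch, 1.5)))  # 2
print(len(average_linkage_clusters(switch, 2.0)))  # 1
```

The toy matrix also illustrates why the threshold matters: the two groups sit about 1.95 Angstrom apart on average, so they stay separate at 1.5 Angstrom but merge at 2.0 Angstrom. Single or complete linkage would shift that boundary, which is why the linkage choice here is flagged as an assumption.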

