Piecewise linear approximation of protein structures using the principle of minimum message length.

Konagurthu AS, Allison L, Stuckey PJ, Lesk AM - Bioinformatics (2011)

Bottom Line: Relying solely on standard secondary structure may result in a significant loss of structural information. Our method generates consistent and statistically robust piecewise linear explanations of protein coordinate data, resulting in a powerful and concise representation of the structure. Indeed, as is common with applications of the MML criterion, this method is free of parameters and thresholds, in striking contrast to the existing programs which are often beset by them.


Affiliation: Clayton School of Information Technology, Monash University, Clayton, VIC 3800, Australia. arun.konagurthu@monash.edu

ABSTRACT

Simple and concise representations of protein-folding patterns provide powerful abstractions for visualizing, comparing, classifying, searching and aligning structural data. Structures are often abstracted by replacing standard secondary structural features (that is, helices and strands of sheet) by vectors or linear segments. Relying solely on standard secondary structure may result in a significant loss of structural information. Further, traditional methods of simplification depend crucially on the consistency and accuracy of external methods that assign secondary structures to protein coordinate data. Although many methods exist to identify secondary structure automatically, the imprecision of the definitions, along with errors and inconsistencies in experimental structure data, drastically limits their applicability for generating reliable simplified representations, especially for structural comparison. This article introduces a mathematically rigorous algorithm to delineate protein structure using the statistical and inductive inference framework of minimum message length (MML). Our method generates consistent and statistically robust piecewise linear explanations of protein coordinate data, resulting in a powerful and concise representation of the structure. The delineation is completely independent of the hydrogen-bonding patterns and local substructural geometry that current methods rely on. Indeed, as is common with applications of the MML criterion, this method is free of parameters and thresholds, in striking contrast to the existing programs, which are often beset by them. The analysis of results over a large number of proteins suggests that the method produces a consistent delineation of structures that encompasses, among others, the segments corresponding to standard secondary structure.
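To make the idea of a piecewise linear explanation of coordinate data concrete, the following is a minimal sketch only, and not the authors' MML formulation. It splits a chain of 3D points into straight segments by dynamic programming over a toy two-part cost: a fixed charge per segment plus a squared-deviation term for the interior points. The names `segment`, `segment_cost` and `point_line_distance`, and the constants `segment_penalty` and `sigma`, are assumptions introduced for illustration; unlike the method described in the abstract, this toy version depends on those hand-set constants.

```python
import math


def point_line_distance(p, a, b):
    """Distance from point p to the straight line through a and b (3D tuples)."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    ab_len2 = sum(c * c for c in ab)
    if ab_len2 == 0.0:
        # Degenerate line: fall back to the distance from p to a.
        return math.sqrt(sum(c * c for c in ap))
    t = sum(ap[i] * ab[i] for i in range(3)) / ab_len2
    closest = [a[i] + t * ab[i] for i in range(3)]
    return math.sqrt(sum((p[i] - closest[i]) ** 2 for i in range(3)))


def segment_cost(points, i, j, segment_penalty, sigma):
    """Toy cost of explaining points i..j by the line from points[i] to points[j].

    Assumption: a fixed charge per segment plus a scaled sum of squared
    deviations of the interior points. This is a stand-in, not a message length.
    """
    deviation = sum(
        point_line_distance(points[k], points[i], points[j]) ** 2
        for k in range(i + 1, j)
    )
    return segment_penalty + deviation / (2.0 * sigma ** 2)


def segment(points, segment_penalty=3.0, sigma=1.0):
    """Return the indices of the break points of the cheapest segmentation."""
    n = len(points)
    best = [math.inf] * n   # best[j]: cheapest cost of explaining points 0..j
    back = [0] * n          # back[j]: start index of the last segment ending at j
    best[0] = 0.0
    for j in range(1, n):
        for i in range(j):
            cost = best[i] + segment_cost(points, i, j, segment_penalty, sigma)
            if cost < best[j]:
                best[j] = cost
                back[j] = i
    breaks = [n - 1]
    while breaks[-1] != 0:
        breaks.append(back[breaks[-1]])
    return breaks[::-1]


# An L-shaped path: four points along x, then three more along y.
l_shape = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0),
           (3.0, 1.0, 0.0), (3.0, 2.0, 0.0), (3.0, 3.0, 0.0)]
print(segment(l_shape))
```

On this synthetic path the cheapest explanation uses two segments that meet at the corner. Adjacent segments share their end point, which is a design choice of the sketch rather than something stated in the abstract.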

Availability: http://www.csse.monash.edu.au/~karun/pmml.


Figure 3: Wall-eye stereo image of 1.8 Å crystal structure of oxidized Clostridium beijerinckii flavodoxin. Each delineated segment produced by PMML is shown in a different color. The elements of secondary structures, of helices and strands of sheet, were derived from the wwPDB file, 5NLL, and are shown in this figure as thick ribbons. The labels of various secondary structures are also shown. The bound FMN co-factor is shown at the top of the structure as thin lines.

Mentions: Manually evaluating the delineation of a large number of structures, we notice that although PMML's delineation identifies the regions of helix and strand consistently, there remain small discrepancies in assigning the precise beginning and end residues of secondary structure elements as ascertained by an expert. To highlight these differences, consider the following example of the delineation produced by PMML. Figure 3 shows the structure of oxidized Clostridium beijerinckii flavodoxin. This protein binds a cofactor, flavin mononucleotide (FMN). Flavodoxin is a small α/β protein containing a 5-stranded parallel β-sheet (β1,…,β5), with two helices packed against each face of the sheet (αA, αE and αC, αD). There is also a short helix (αB) located near the N-terminus of the protein. Different segments produced by PMML are shown in different colors. The elements of secondary structure shown as thick ribbons are the secondary structure assignments taken from the structure's wwPDB file (5NLL). Table 3 gives the residue ranges (that is, start and end residues) for each secondary structural element (SSE) of the flavodoxin structure listed in its wwPDB file. The residue ranges of the corresponding segmentation produced by PMML are also presented in the table. Broadly, the program correctly assigns segments to the SSEs. However, minor differences can be observed in the locations of their start and end residues. In most cases, we notice an absolute difference of 1 or 2 residues in the N- or C-terminal regions of these SSEs. The segmentation in the regions around the SSEs αE, β2 and β5 shows some discrepancies. The residue range from wwPDB corresponding to αE was approximated by PMML using two segments instead of one. The first segment is composed of roughly one turn of the helix at αE's N-terminal end. This is understandable, as this turn is substantially skewed from the main helical axis and, indeed, there is an interruption in the hydrogen bonding. However, the second segment, composed of 11 residues in this region, is consistent with the assignment in the wwPDB file. In the case of β2, the start location identified by PMML precedes the start location identified in the wwPDB file by four residues. On inspecting the flavodoxin structure, there appears to be a backbone hydrogen bond between the carbonyl group of residue Asp29 and the nitrogen of Met1 (of strand β1), so the β2 strand may well start at residue Lys28 or Asp29. Similarly, for β5, the segment from PMML starts three residues before the location identified in the wwPDB file. Inspecting the structure, we note the β-bulge in strand β5 and hydrogen bonds between atoms 80O···109N and 82N···109O; assignment of the start of the strand β5 to residue 109 is not indefensible.
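The kind of comparison described above, start and end offsets between annotated SSE ranges and the segments that cover them, can be sketched as follows. This is an illustrative helper, not part of PMML. The function names, the `min_overlap` cutoff and the residue ranges in the example are all made up for demonstration; they are not the values from Table 3 or from the 5NLL file.

```python
def overlap(a, b):
    """Number of residues shared by two inclusive ranges given as (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def compare_boundaries(reference, segments, min_overlap=2):
    """Report how the covering segments differ from each reference range.

    reference: dict mapping an element name to an inclusive (start, end) range.
    segments:  list of inclusive (start, end) ranges from a segmentation.
    min_overlap is an assumed cutoff so that a segment touching a reference
    range by a single shared residue is not counted as covering it.
    """
    ordered = sorted(segments)
    report = {}
    for name, ref in reference.items():
        hits = [seg for seg in ordered if overlap(ref, seg) >= min_overlap]
        if not hits:
            report[name] = None
            continue
        report[name] = {
            "n_segments": len(hits),                # more than 1 means the element was split
            "start_offset": hits[0][0] - ref[0],    # negative: segment starts earlier
            "end_offset": hits[-1][1] - ref[1],     # positive: segment ends later
        }
    return report


# Hypothetical ranges for illustration only (not the flavodoxin data).
reference = {"helix_1": (10, 25), "strand_1": (30, 35)}
segments = [(1, 9), (9, 13), (13, 26), (26, 35), (35, 40)]

for name, entry in compare_boundaries(reference, segments).items():
    print(name, entry)
```

In this made-up example the helix is covered by two segments, loosely mirroring the αE case discussed above, and the strand's covering segment starts four residues before the annotated start.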

